Let us help you find the training program you are looking for.

If you can't find what you are looking for, contact us and we'll help you find it. We have over 800 training programs to choose from.


  • Course Skill Level:

    Foundational to Intermediate

  • Course Duration:

    3 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

Who should attend & recommended skills:

Those with basic IT and traditional database experience

  • This course is geared for those who want to design and administer fast, reliable enterprise messaging systems with Apache Kafka.
  • Skill-level: Foundation-level Apache Kafka skills for intermediate-skilled team members. This is not a basic class.
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Traditional databases: Basic (1-2 years’ experience) helpful
  • Large-scale Data Analysis and NoSQL tools not necessary

About this course

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to streams of records and process them in a fault-tolerant way as they occur. This course is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications and tackles common challenges such as how to use Kafka efficiently and handle high data volumes with ease. The course first takes you through the types of messaging systems and then provides a thorough introduction to Apache Kafka and its internals. The second part takes you through designing streaming applications using frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we take you through more advanced concepts in Apache Kafka such as capacity planning and security. By the end of this course, you will have all the information you need to be comfortable using Apache Kafka and to design efficient streaming data applications with it.
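The publish/subscribe model described above can be sketched in a few lines of plain Python. This is a toy in-memory topic, not the Kafka API; names such as `Topic`, `publish`, and `poll` are illustrative:

```python
class Topic:
    """Toy in-memory topic: an append-only log plus per-subscriber offsets."""
    def __init__(self):
        self.log = []          # append-only record log
        self.offsets = {}      # subscriber name -> next offset to read

    def publish(self, record):
        self.log.append(record)

    def subscribe(self, name):
        self.offsets.setdefault(name, 0)

    def poll(self, name):
        """Return all records this subscriber has not yet seen."""
        start = self.offsets[name]
        records = self.log[start:]
        self.offsets[name] = len(self.log)
        return records

orders = Topic()
orders.subscribe("billing")
orders.subscribe("shipping")
orders.publish({"id": 1, "item": "book"})
print(orders.poll("billing"))   # both subscribers see the same record
print(orders.poll("shipping"))
```

The key property, which Kafka shares, is that publishing is decoupled from consumption: each subscriber tracks its own read position in the same log.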

Skills acquired & topics covered

  • Building efficient real-time streaming applications in Apache Kafka to process streams of data
  • Mastering the core Kafka APIs to set up Apache Kafka clusters and start writing message producers and consumers
  • Getting a solid grasp of Apache Kafka concepts through practical examples
  • Learning the basics of Apache Kafka from scratch
  • Using the basic building blocks of a streaming application
  • Designing effective streaming applications with Kafka using Spark, Storm, and Heron
  • Understanding the importance of a low-latency, high-throughput, and fault-tolerant messaging system
  • Performing effective capacity planning when deploying your Kafka application
  • Understanding and implementing the best security practices

Course breakdown / modules

  • Understanding the principles of messaging systems
  • Understanding messaging systems
  • Peeking into a point-to-point messaging system
  • Publish-subscribe messaging system
  • Advanced Message Queuing Protocol (AMQP)
  • Using messaging systems in big data streaming applications
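The point-to-point model in this module differs from publish/subscribe in that each message is delivered to exactly one consumer. A minimal sketch of that behavior (a toy queue, not any real broker's API):

```python
from collections import deque

class PointToPointQueue:
    """Toy point-to-point queue: each message is handed out exactly once."""
    def __init__(self):
        self.messages = deque()

    def send(self, msg):
        self.messages.append(msg)

    def receive(self):
        """Remove and return the next message, or None if the queue is empty."""
        return self.messages.popleft() if self.messages else None

q = PointToPointQueue()
q.send("task-1")
q.send("task-2")
# Two competing consumers: each message goes to only one of them.
print(q.receive())  # task-1 (consumer A)
print(q.receive())  # task-2 (consumer B)
print(q.receive())  # None – queue drained
```

Contrast this with publish/subscribe, where every subscriber would see both messages.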

  • Kafka origins
  • Kafka's architecture
  • Message topics
  • Message partitions
  • Replication and replicated logs
  • Message producers
  • Message consumers
  • Role of Zookeeper
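Message partitions, covered in this module, are typically assigned by hashing the record key, so that all records with the same key land in the same partition and keep their relative order. A simplified sketch of the idea (Kafka's real default partitioner uses murmur2; `zlib.crc32` here is just for illustration):

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition; the same key always maps to the same partition."""
    return zlib.crc32(key) % num_partitions

# Same key -> same partition, which is what preserves per-key ordering.
p1 = pick_partition(b"user-42", 6)
p2 = pick_partition(b"user-42", 6)
print(p1 == p2)  # True
```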

  • Kafka producer internals
  • Kafka Producer APIs
  • Java Kafka producer example
  • Common messaging publishing patterns
  • Best practices
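One common publishing pattern in this module is batching: the producer buffers records and sends them in one request once a size threshold is reached, trading a little latency for much higher throughput. A toy sketch of the mechanism (not the Kafka producer API; `batch_size` and `sent_batches` are illustrative stand-ins):

```python
class BatchingProducer:
    """Toy producer that groups records into batches before 'sending' them."""
    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.buffer = []
        self.sent_batches = []   # stands in for network sends to the broker

    def send(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, even a partial batch."""
        if self.buffer:
            self.sent_batches.append(list(self.buffer))
            self.buffer.clear()

p = BatchingProducer(batch_size=2)
for r in ["a", "b", "c"]:
    p.send(r)
p.flush()  # push out the final partial batch
print(p.sent_batches)  # [['a', 'b'], ['c']]
```

The real producer adds a time bound (linger) as well, so a slow trickle of records does not sit in the buffer indefinitely.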

  • Kafka consumer internals
  • Kafka consumer APIs
  • Java Kafka consumer
  • Scala Kafka consumer
  • Common message consuming patterns
  • Best practices
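A key consuming pattern in this module is when to commit offsets: committing before processing gives at-most-once delivery, committing after processing gives at-least-once. A toy simulation of the two orderings, where a crash between the two steps is exactly what separates them (illustrative code, not the Kafka consumer API):

```python
def consume(records, commit_first, crash_between_at=None):
    """Process records until a simulated crash between commit and process.

    Returns (processed records, committed offset)."""
    processed, committed = [], 0
    for offset, rec in enumerate(records):
        if commit_first:
            committed = offset + 1         # commit first (at-most-once)
            if offset == crash_between_at:
                break                      # crash after commit: record is lost
            processed.append(rec)
        else:
            processed.append(rec)          # process first (at-least-once)
            if offset == crash_between_at:
                break                      # crash before commit: record repeats on restart
            committed = offset + 1
    return processed, committed

# Crash between the two steps while handling offset 1:
print(consume(["a", "b", "c"], commit_first=True,  crash_between_at=1))  # (['a'], 2) – "b" never processed
print(consume(["a", "b", "c"], commit_first=False, crash_between_at=1))  # (['a', 'b'], 1) – "b" redelivered
```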

  • Introduction to Spark
  • Spark Streaming
  • Use case: log processing – fraud IP detection
  • Producer

  • Introduction to Apache Storm
  • Introduction to Apache Heron
  • Integrating Apache Kafka with Apache Storm – Java
  • Integrating Apache Kafka with Apache Storm – Scala
  • Use case – log processing in Storm, Kafka, Hive

  • Introduction to Confluent Platform
  • Deep diving into Confluent architecture
  • Understanding Kafka Connect and Kafka Streams
  • Playing with Avro using Schema Registry
  • Moving Kafka data to HDFS
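Working with Avro via the Schema Registry starts from a schema definition like the following (an illustrative example schema, not one from the course materials):

```json
{
  "type": "record",
  "name": "PageView",
  "namespace": "com.example.events",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url",     "type": "string"},
    {"name": "ts",      "type": "long"}
  ]
}
```

The registry stores such schemas per topic and lets producers and consumers evolve them under compatibility rules, rather than embedding a full schema in every message.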

  • Considerations for using Kafka in ETL pipelines
  • Introducing Kafka Connect
  • Deep dive into Kafka Connect
  • Introductory examples of using Kafka Connect
  • Kafka Connect common use cases
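As an introductory example of the kind covered in this module, a source connector is defined declaratively and submitted to Connect's REST API. The sketch below uses the file source connector that ships with Kafka; the name, file path, and topic are illustrative:

```json
{
  "name": "file-source-demo",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "demo-topic"
  }
}
```

Connect then tails the file and publishes each line to the topic, with no producer code written by hand.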

  • Introduction to Kafka Streams
  • Kafka Streams architecture
  • Integrated framework advantages
  • Understanding tables and streams together
  • Use case example of Kafka Streams
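The canonical Kafka Streams example is a word count: a stream of lines is folded into a table of counts, illustrating the stream/table duality in this module. The core idea can be sketched without the framework (toy code, not the Streams DSL):

```python
from collections import Counter

def word_count(stream):
    """Fold a stream of text lines into a 'table' of word counts."""
    table = Counter()
    for line in stream:
        table.update(line.lower().split())
    return dict(table)

print(word_count(["hello kafka", "hello streams"]))
# {'hello': 2, 'kafka': 1, 'streams': 1}
```

Each input record updates the table, which is why a stream can be viewed as the changelog of a table, and a table as the latest state of a stream.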

  • Kafka cluster internals
  • Capacity planning
  • Single cluster deployment
  • Multicluster deployment
  • Decommissioning brokers
  • Data migration
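A widely used rule of thumb for the capacity-planning topic above: size the partition count from target throughput divided by measured per-partition throughput, on both the produce and consume sides. The numbers below are made up for illustration:

```python
import math

def estimate_partitions(target_mb_s, producer_mb_s_per_part, consumer_mb_s_per_part):
    """Partitions needed = max of the produce-side and consume-side requirements."""
    return max(math.ceil(target_mb_s / producer_mb_s_per_part),
               math.ceil(target_mb_s / consumer_mb_s_per_part))

# Target 100 MB/s; one partition sustains ~10 MB/s produce, ~20 MB/s consume:
print(estimate_partitions(100, 10, 20))  # 10
```

In practice you would benchmark per-partition throughput on your own hardware and leave headroom for growth, since increasing partitions later changes key-to-partition mapping.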

  • Managing high volumes in Kafka
  • Kafka message delivery semantics
  • Big data and Kafka common usage patterns
  • Kafka and data governance
  • Alerting and monitoring
  • Useful Kafka metrics

  • An overview of securing Kafka
  • Wire encryption using SSL
  • Kerberos SASL for authentication
  • Understanding ACL and authorization
  • Understanding Zookeeper authentication
  • Apache Ranger for authorization
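A client configured for Kerberos (SASL) over TLS, as covered in this module, uses client properties along these lines. The property names are standard Kafka client settings; the paths and password are placeholders:

```properties
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
ssl.truststore.location=/etc/kafka/client.truststore.jks
ssl.truststore.password=changeit
```

SSL handles wire encryption, SASL/GSSAPI handles authentication, and ACLs (or Ranger policies) then decide what the authenticated principal may do.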

  • Latency and throughput
  • Data and state persistence
  • Data sources
  • External data lookups
  • Data formats
  • Data serialization
  • Level of parallelism
  • Out-of-order events
  • Message processing semantics
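Out-of-order events, one of the design considerations above, are typically handled by buffering on event time and releasing only events older than a watermark. A minimal sketch of that idea (toy watermark logic, not any framework's API):

```python
import heapq

def reorder(events, max_lateness):
    """Re-emit (timestamp, value) events in timestamp order.

    An event is released once a timestamp more than max_lateness ahead of it
    has been seen (a simple watermark); the rest are flushed at end of stream."""
    buffer, out = [], []
    watermark = float("-inf")
    for ts, val in events:
        heapq.heappush(buffer, (ts, val))
        watermark = max(watermark, ts - max_lateness)
        while buffer and buffer[0][0] <= watermark:
            out.append(heapq.heappop(buffer))
    out.extend(heapq.heappop(buffer) for _ in range(len(buffer)))
    return out

events = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d")]
print(reorder(events, max_lateness=2))  # timestamps come back in order
```

The `max_lateness` bound is the usual trade-off: a larger value tolerates more disorder but delays results and grows the buffer.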