Let us help you find the training program you are looking for.

If you can't find what you are looking for, contact us and we'll help you find it. We have over 800 training programs to choose from.


Course Skill Level:

Foundational

Course Duration:

3 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    RTDPAAL21E09

Who should attend & recommended skills:

Beginners with basic Python experience

  • Beginners who want a practical guide to help tackle different real-time data processing and analytics problems using the best tools for each scenario.
  • Skill level: Foundation-level practical real-time data processing and analytics skills for intermediate-skilled team members.
  • This is not a basic class. Python: basic (1-2 years' experience).

About this course

With the rise of Big Data, there is an increasing need to process large amounts of data continuously and with a short turnaround time. Real-time data processing involves the continuous input, processing, and output of data, with the requirement that processing time be as short as possible. This course covers most of the existing and evolving open-source technology stack for real-time processing and analytics. You will learn about every aspect of a real-time solution, from the source through presentation to persistence. Through practical exercises, you'll gain a clear understanding of how to solve these challenges on your own. We'll cover topics such as setting up components, basic executions, integrations, advanced use cases, alerts, and monitoring. You'll be exposed to popular real-time processing tools such as Apache Spark, Apache Flink, and Apache Storm. Finally, you will put your knowledge to practical use by implementing these techniques in a real-world use case. By the end of this course, you will have a solid understanding of real-time data processing and analytics and will know how to deploy solutions in production environments.
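To make the "continuous input, processing, and output" cycle concrete, here is a minimal Python sketch that chains three generator stages so each event flows through the whole pipeline before the next one is read. It is only an illustration under stated assumptions: the sensor readings, the `ALERT_THRESHOLD` value, and the function names `source`, `process`, and `sink` are hypothetical and not taken from the course material. Real deployments would use a framework such as Storm, Spark, or Flink rather than plain generators.

```python
import time
from typing import Iterable, Iterator, List, Tuple

ALERT_THRESHOLD = 30.0  # hypothetical alert threshold, chosen for illustration only


def source(readings: Iterable[float]) -> Iterator[Tuple[float, float]]:
    # Hypothetical sensor source: emits one (arrival_time, value) event at a time.
    for value in readings:
        yield time.monotonic(), value


def process(events: Iterable[Tuple[float, float]]) -> Iterator[Tuple[float, float, bool]]:
    # Handles each event as soon as it arrives instead of waiting for a full batch.
    for arrived, value in events:
        yield arrived, value, value > ALERT_THRESHOLD


def sink(results: Iterable[Tuple[float, float, bool]]) -> List[Tuple[float, bool, float]]:
    # Collects each result as soon as it is produced and records how long
    # the event spent in the pipeline (its processing latency).
    emitted = []
    for arrived, value, alert in results:
        latency = time.monotonic() - arrived
        emitted.append((value, alert, latency))
    return emitted


if __name__ == "__main__":
    for value, alert, latency in sink(process(source([21.5, 34.0, 28.2]))):
        print(f"value={value} alert={alert} latency={latency:.6f}s")
```

Because generators are lazy, `sink` pulls one event at a time through `process` and `source`, which mirrors the record-at-a-time model of a streaming engine; a batch design would instead read every reading first and only then compute alerts, increasing per-event latency.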

Skills acquired & topics covered

  • Understanding the challenges in real-time data processing and using the right tools to overcome them
  • Using popular tools and frameworks such as Spark, Flink, and Apache Storm to solve distributed processing problems
  • Practical examples, tips, and tricks to help you perform efficient real-time Big Data processing
  • An introduction to the established real-time stack
  • The key integration of all the components
  • A thorough understanding of the basic building blocks for real-time solution designing
  • Adding search and visualization capabilities to your real-time solution
  • Getting conceptually and practically acquainted with real-time analytics
  • Being well equipped to apply the knowledge and create your own solutions

Course breakdown / modules

  • What is big data?
  • Big data infrastructure
  • Real-time analytics – the myth and the reality
  • Near real-time solution – an architecture that works
  • Lambda architecture – analytics possibilities
  • IoT – thoughts and possibilities
  • Cloud – considerations for NRT and IoT

  • The NRT system and its building blocks
  • NRT – high-level system view
  • NRT – technology view

  • Understanding data streams
  • Setting up infrastructure for data ingestion
  • Tapping data from the source to the processor – expectations and caveats
  • Comparing and choosing what works best for your use case
  • Do it yourself

  • Overview of Storm
  • Storm architecture and its components
  • Setting up and configuring Storm
  • Real-time processing job on Storm

  • Setting up and a quick execution of Spark
  • Setting up and a quick execution of Flink
  • Setting up and a quick execution of Apache Beam
  • Balancing in Apache Beam

  • RabbitMQ – messaging that works
  • RabbitMQ exchanges
  • RabbitMQ – integration with Storm
  • PubNub data stream publisher
  • Stringing together a Storm-RMQ-PubNub sensor data topology

  • Setting up and configuring Cassandra
  • Storm and Cassandra topology
  • Storm and IMDB integration for dimensional data
  • Integrating the presentation layer with Storm
  • Do It Yourself

  • State retention and the need for Trident
  • Basic Storm Trident topology
  • Trident internals
  • Trident operations
  • DRPC
  • Do It Yourself

  • Spark overview
  • Distinct advantages of Spark
  • Spark – use cases
  • Spark architecture – working inside the engine
  • Spark pragmatic concepts
  • Spark 2.x – the advent of DataFrames and Datasets

  • Working with Spark Operations
  • Spark – packaging and API
  • RDD pragmatic exploration
  • Shared variables – broadcast variables and accumulators

  • Spark Streaming – introduction and architecture
  • Packaging structure of Spark Streaming
  • Connecting Kafka to Spark Streaming

  • Flink architecture and execution engine
  • Flink basic components and processes
  • Integration of source stream to Flink
  • Flink processing and computation
  • Flink persistence
  • FlinkCEP
  • Pattern API
  • Gelly
  • DIY

  • Introduction
  • Data modeling
  • Tools and frameworks
  • Setting up the infrastructure
  • Implementing the case study
  • Running the case study