Real-time Data Processing and Analytics

  • Course Code: Data Science - Real-time Data Processing and Analytics
  • Course Dates: Contact us to schedule.
  • Course Category: Big Data & Data Science
  • Duration: 3 Days
  • Audience: This course is geared for those who want a practical guide to tackling different real-time data processing and analytics problems using the best tools for each scenario.

Course Snapshot 

  • Skill-level: Foundation-level practical real-time data processing and analytics skills for intermediate-skilled team members. This is not a basic class.
  • Targeted Audience: This course is geared for those who want a practical guide to tackling different real-time data processing and analytics problems using the best tools for each scenario.
  • Hands-on Learning: This course is approximately 50% hands-on lab and 50% lecture, combining engaging lecture, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required.
  • Delivery Format: This course is available for onsite private classroom presentation. 
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

With the rise of Big Data, there is an increasing need to process large amounts of data continuously, with a shorter turnaround time. Real-time data processing involves continuous input, processing and output of data, with the condition that the time required for processing is as short as possible. This course covers the majority of the existing and evolving open source technology stack for real-time processing and analytics. You will get to know every aspect of a real-time solution, from the source to the presentation to persistence. Through this practical course, you’ll be equipped with a clear understanding of how to solve challenges on your own. We’ll cover topics such as how to set up components, basic executions, integrations, advanced use cases, alerts, and monitoring. You’ll be exposed to the popular tools used in real-time processing today, such as Apache Spark, Apache Flink, and Apache Storm. Finally, you will put your knowledge to practical use by implementing all of the techniques in a practical, real-world use case. By the end of this course, you will have a solid understanding of all the aspects of real-time data processing and analytics, and will know how to deploy the solutions in production environments.

Working in a hands-on learning environment, led by our Real-time Data Processing and Analytics expert instructor, students will learn about and explore: 

  • The various challenges in real-time data processing and the right tools to overcome them
  • Popular tools and frameworks such as Spark, Flink, and Apache Storm for solving distributed processing problems
  • Practical examples, tips, and tricks for performing efficient Big Data processing in real time

Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below.

  • Get an introduction to the established real-time stack 
  • Understand the key integration of all the components 
  • Get a thorough understanding of the basic building blocks for real-time solution designing 
  • Add search and visualization capabilities to your real-time solution
  • Get conceptually and practically acquainted with real-time analytics 
  • Be well equipped to apply the knowledge and create your own solutions 

Audience & Pre-Requisites 

This course is designed for beginners who want a practical guide to tackling different real-time data processing and analytics problems using the best tools for each scenario.

Pre-Requisites: Students should be familiar with:

  • Basics of Python (knowledge of Python is assumed)

Course Agenda / Topics 

  1. Introducing Real-Time Analytics
  • What is big data?
  • Big data infrastructure
  • Real-time analytics – the myth and the reality
  • Near real-time solution – an architecture that works
  • Lambda architecture – analytics possibilities
  • IoT – thoughts and possibilities
  • Cloud – considerations for NRT and IoT
  2. Real-Time Applications – The Basic Ingredients
  • The NRT system and its building blocks
  • NRT – high-level system view
  • NRT – technology view
  3. Understanding and Tailing Data Streams
  • Understanding data streams
  • Setting up infrastructure for data ingestion
  • Tapping data from source to the processor – expectations and caveats
  • Comparing and choosing what works best for your use case
  • Do It Yourself
  4. Setting up the Infrastructure for Storm
  • Overview of Storm
  • Storm architecture and its components
  • Setting up and configuring Storm
  • Real-time processing job on Storm
  5. Configuring Apache Spark and Flink
  • Setting up and a quick execution of Spark
  • Setting up and a quick execution of Flink
  • Setting up and a quick execution of Apache Beam
  • Balancing in Apache Beam
  6. Integrating Storm with a Data Source
  • RabbitMQ – messaging that works
  • RabbitMQ exchanges
  • RabbitMQ – integration with Storm
  • PubNub data stream publisher
  • Stringing together the Storm-RMQ-PubNub sensor data topology
  7. From Storm to Sink
  • Setting up and configuring Cassandra
  • Storm and Cassandra topology
  • Storm and IMDB integration for dimensional data
  • Integrating the presentation layer with Storm
  • Do It Yourself
  8. Storm Trident
  • State retention and the need for Trident
  • Basic Storm Trident topology
  • Trident internals
  • Trident operations
  • DRPC
  • Do It Yourself
  9. Working with Spark
  • Spark overview
  • Distinct advantages of Spark
  • Spark – use cases
  • Spark architecture – working inside the engine
  • Spark pragmatic concepts
  • Spark 2.x – advent of data frames and datasets
  10. Working with Spark Operations
  • Spark – packaging and API
  • RDD pragmatic exploration
  • Shared variables – broadcast variables and accumulators
  11. Spark Streaming
  • Spark Streaming concepts
  • Spark Streaming – introduction and architecture
  • Packaging structure of Spark Streaming
  • Connecting Kafka to Spark Streaming
  12. Working with Apache Flink
  • Flink architecture and execution engine
  • Flink basic components and processes
  • Integration of source stream to Flink
  • Flink processing and computation
  • Flink persistence
  • FlinkCEP
  • Pattern API
  • Gelly
  • Do It Yourself
  13. Case Study
  • Introduction
  • Data modeling
  • Tools and frameworks
  • Setting up the infrastructure
  • Implementing the case study
  • Running the case study
