

  • Course Skill Level:

    Foundational to Intermediate

  • Course Duration:

    2 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:


Who should attend & recommended skills:

Those with basic IT and big data experience

  • Those who want to build a solid foundation in Storm essentials and use Apache Storm for the real-world tasks associated with processing and analyzing real-time data streams.
  • Skill level: foundation-level Storm Applied skills for intermediate team members; this is not a basic class.
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Big data and real-time systems: Basic (1-2 years’ experience)
  • Storm: experience not required

About this course

Storm Applied is an example-driven guide to processing and analyzing real-time data streams. This immediately useful course starts by teaching you how to design Storm solutions the right way. Then, it quickly dives into real-world case studies that show you how to scale a high-throughput stream processor, ensure smooth operation within a production cluster, and more. Along the way, you’ll learn to use Trident for stateful stream processing, along with other tools from the Storm ecosystem.

Skills acquired & topics covered

  • How to think about designing Storm solutions the right way from day one.
  • Real-world case studies that will bring the novice up to speed with productionizing Storm.
  • Mapping real problems to Storm components
  • Performance tuning and scaling
  • Practical troubleshooting and debugging
  • Exactly-once processing with Trident

Course breakdown / modules

  • What is big data?
  • How Storm fits into the big data picture
  • Why you’d want to use Storm

  • Problem definition: GitHub commit count dashboard
  • Basic Storm concepts
  • Implementing a GitHub commit count dashboard in Storm
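The commit-count module wires a spout that emits commit log lines to bolts that extract the author email and keep a running count. Storm itself is a Java framework; the following is only a stdlib Python sketch of that same spout-to-bolt dataflow (the sample data and function names are illustrative, not Storm APIs):

```python
from collections import Counter

# Hypothetical sample data standing in for a spout reading GitHub commit logs.
COMMIT_LINES = [
    "b20ea50 nathan@example.com",
    "064874b andy@example.com",
    "28e4f8e andy@example.com",
]

def commit_feed_spout(lines):
    """Spout stand-in: emits one tuple (a raw commit line) at a time."""
    for line in lines:
        yield line

def extract_email_bolt(commit_line):
    """Bolt stand-in: splits the line and emits the author email."""
    _sha, email = commit_line.split(" ", 1)
    return email

def count_bolt(emails):
    """Bolt stand-in: keeps a running count per email."""
    return Counter(emails)

counts = count_bolt(extract_email_bolt(t) for t in commit_feed_spout(COMMIT_LINES))
print(counts["andy@example.com"])  # 2
```

In Storm proper, each of these stages would be a spout or bolt class wired together with a `TopologyBuilder`, and the counts would update continuously rather than over a fixed list.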

  • Approaching topology design
  • Problem definition: a social heat map
  • Precepts for mapping the solution to Storm
  • Initial implementation of the design
  • Scaling the topology
  • Topology design paradigms
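Scaling a topology hinges on stream groupings: shuffle grouping spreads tuples randomly across a bolt's parallel tasks, while fields grouping hashes on a chosen field so the same value always reaches the same task, which is what keeps per-key state (such as the heat map's per-location counts) consistent. A stdlib Python sketch of the routing idea, using a stand-in hash rather than Storm's internal one:

```python
NUM_TASKS = 4  # parallelism hint for the downstream bolt (illustrative)

def fields_grouping(key, num_tasks):
    """Route a tuple to a task index based on a grouped field, so every
    tuple with the same key lands on the same task."""
    # Deterministic stand-in for Storm's internal hashing of the fields.
    return sum(key.encode()) % num_tasks

a = fields_grouping("geocell-42", NUM_TASKS)
b = fields_grouping("geocell-42", NUM_TASKS)
assert a == b  # same key -> same task, so per-key counts stay consistent
```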

  • Requirements for reliability
  • Problem definition: a credit card authorization system
  • Basic implementation of the bolts
  • Guaranteed message processing
  • Replay semantics
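Storm's guaranteed message processing is at-least-once: the spout keeps each emitted tuple pending until every bolt in its tuple tree acks it, and a fail (or timeout) causes the spout to replay the tuple. A stdlib Python sketch of that replay loop, with an assumed transient failure forced on one tuple:

```python
# Simulate at-least-once delivery: a pending queue, ack on success,
# replay from the spout on failure.
from collections import deque

def process(tx_id, attempts):
    """Bolt stand-in: fails the first attempt for tuple 'b' to force a replay."""
    attempts[tx_id] = attempts.get(tx_id, 0) + 1
    if tx_id == "b" and attempts[tx_id] == 1:
        return False  # bolt fails the tuple -> spout will replay it
    return True       # bolt acks the tuple

pending = deque(["a", "b", "c"])   # tuples emitted by the spout, awaiting acks
attempts, acked = {}, []

while pending:
    tx = pending.popleft()
    if process(tx, attempts):
        acked.append(tx)           # fully acked: spout forgets the tuple
    else:
        pending.append(tx)         # failed: spout re-emits (replay)

print(acked)           # ['a', 'c', 'b'] -- every tuple processed at least once
print(attempts["b"])   # 2
```

Note the consequence the course explores: replays mean a tuple can be processed more than once, so bolts with side effects (like the credit card authorization example) must be designed with the replay semantics in mind.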

  • The Storm cluster
  • Fail-fast philosophy for fault tolerance within a Storm cluster
  • Installing a Storm cluster
  • Getting your topology to run on a Storm cluster
  • The Storm UI and its role in the Storm cluster

  • Problem definition: Daily Deals! reborn
  • Initial implementation
  • Tuning: I wanna go fast
  • Latency: when external systems take their time
  • Storm’s metrics-collecting API

  • Changing the number of worker processes running on a worker node
  • Changing the amount of memory allocated to worker processes (JVMs)
  • Figuring out which worker nodes/processes a topology is executing on
  • Contention for worker processes in a Storm cluster
  • Memory contention within a worker process (JVM)
  • Memory contention on a worker node
  • Worker node CPU contention
  • Worker node I/O contention
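Several of these knobs live in cluster configuration: each port listed under `supervisor.slots.ports` allows one worker process (JVM) on that node, and `worker.childopts` sets the JVM options, including heap size, for each worker. A `storm.yaml` fragment of the kind this module works through (ports and heap size are illustrative values, not recommendations):

```yaml
# storm.yaml (per worker node) -- illustrative values only
supervisor.slots.ports:        # one worker process (JVM) per listed port
  - 6700
  - 6701
  - 6702
  - 6703
worker.childopts: "-Xmx1024m"  # heap allocated to each worker JVM
```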

  • The commit count topology revisited
  • Diving into the details of an executor
  • Routing and tasks
  • Knowing when Storm’s internal queues overflow
  • Addressing internal Storm buffers overflowing
  • Tweaking buffer sizes for performance gain
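The internal queues in question are per-executor receive/send buffers plus the per-worker transfer buffer, and their sizes are configuration values that must be powers of two. An illustrative `storm.yaml` fragment (the specific sizes are examples, not tuning advice):

```yaml
# Illustrative storm.yaml overrides; buffer sizes must be powers of two.
topology.executor.receive.buffer.size: 16384
topology.executor.send.buffer.size: 16384
topology.transfer.buffer.size: 32
```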

  • What is Trident?
  • Kafka and its role with Trident
  • Problem definition: Internet radio
  • Implementing the internet radio design as a Trident topology
  • Accessing the persisted counts through DRPC
  • Mapping Trident operations to Storm primitives
  • Scaling a Trident topology
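Trident gets exactly-once counting by processing tuples in ordered batches and storing the transaction id alongside each persisted count: when a replayed batch arrives with a txid the state has already applied, the update is skipped. A stdlib Python sketch of that idempotent-update rule for the internet radio play counts (the names are illustrative, not Trident's API):

```python
# Transactional-state stand-in: store (last_txid, count) per key and
# skip a batch whose txid was already applied to that key.
state = {}  # key -> (last_txid, count)

def apply_batch(txid, increments):
    for key, delta in increments.items():
        last_txid, count = state.get(key, (None, 0))
        if last_txid == txid:
            continue                 # replayed batch: already applied, skip
        state[key] = (txid, count + delta)

apply_batch(1, {"artist:beatles": 3})
apply_batch(2, {"artist:beatles": 2})
apply_batch(2, {"artist:beatles": 2})  # replay of batch 2 -> no double count

print(state["artist:beatles"][1])  # 5
```

This is the core idea behind Trident's transactional state backends: replay plus an idempotent state update yields exactly-once semantics on top of Storm's at-least-once delivery.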