Course Skill Level:

Intermediate

Course Duration:

3 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    HADINPL21E09

Who should attend & recommended skills:

Those with basic IT, programming language, & Hadoop skills

  • This course is geared for attendees who want to conquer big data using Hadoop. This revised edition covers changes and new features in the Hadoop core architecture, including MapReduce 2. It also covers YARN and integrating Kafka, Impala, and Spark SQL with Hadoop. You’ll also get new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently.
  • Skill level: Foundation-level Hadoop skills for intermediate-skilled team members. This is not a basic class.
  • A programming language such as Java: Basic to Intermediate (1-5 years’ experience) required
  • Hadoop: Basic (1-2 years’ experience) required
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Attendees without a programming background (for example, in Python) may treat the labs as follow-along exercises or team up with others to complete them

About this course

It’s always a good time to upgrade your Hadoop skills! This edition of Hadoop in Practice provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, machine learning, managing large-scale clusters, and taming big data using Hadoop. This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. You’ll pick up hands-on best practices for integrating Spark, Kafka, and Impala with Hadoop, and get new and updated techniques for the latest versions of Flume, Sqoop, and Mahout. In short, this is the most practical, up-to-date coverage of Hadoop available.

Skills acquired & topics covered

  • Working in a hands-on learning environment, led by our Hadoop expert instructor, students will learn about and explore:
  • Over 100 tested, instantly useful techniques that will help you conquer big data using Hadoop
  • Changes and new features in the Hadoop core architecture, including MapReduce 2
  • New lessons covering YARN and integrating Kafka, Impala, and Spark SQL with Hadoop
  • New and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently
  • The most practical, up-to-date coverage of Hadoop available anywhere
  • How to write YARN applications
  • How to integrate real-time technologies such as Storm, Impala, and Spark
  • Predictive analytics using Mahout and R

Course breakdown / modules

  • What is Hadoop?
  • Getting your hands dirty with MapReduce

  • YARN overview
  • YARN and MapReduce
  • YARN applications

  • Understanding inputs and outputs in MapReduce
  • Processing common serialization formats
  • Big data serialization formats
  • Columnar storage
  • Custom file formats

  • Data organization
  • Efficient storage with compression

  • Key elements of data movement
  • Moving data into Hadoop
  • Moving data out of Hadoop

  • Joining
  • Sorting
  • Sampling

  • Modeling data and solving problems with graphs
  • Bloom filters
  • HyperLogLog

  • Measure, measure, measure
  • Tuning MapReduce
  • Debugging
  • Testing MapReduce jobs

  • Hive
  • Impala
  • Spark SQL

  • Fundamentals of building a YARN application
  • Building a YARN application to collect cluster statistics
  • Additional YARN application capabilities
  • YARN programming abstraction