Hadoop in Practice

Course Skill Level:

Intermediate

Course Duration:

3 days

Course Delivery Format:

Live, instructor-led.

Course Category:

Big Data & Data Science

Course Code:

HADINPL21E09

Who should attend & recommended skills:

Those with basic IT, programming language, & Hadoop skills

Who should attend & recommended skills

This course is geared for attendees who want to conquer big data using Hadoop. This revised edition covers changes and new features in the Hadoop core architecture, including MapReduce 2 and YARN, and shows how to integrate Kafka, Impala, and Spark SQL with Hadoop. You’ll also get new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently.
Skill level: foundation-level Hadoop skills for intermediate-skilled team members. This is not a basic class.
Programming language such as Java: Basic to Intermediate (1-5 years’ experience) required
Hadoop: Basic (1-2 years’ experience) required
IT skills: Basic to Intermediate (1-5 years’ experience)
Attendees without a background in a programming language such as Python may treat the labs as follow-along exercises or team up with others to complete them

About this course

It’s always a good time to upgrade your Hadoop skills! This edition of Hadoop in Practice provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, applying machine learning, managing large-scale clusters, and taming big data using Hadoop. This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. You’ll pick up hands-on best practices for integrating Spark, Kafka, and Impala with Hadoop, and get new and updated techniques for the latest versions of Flume, Sqoop, and Mahout. In short, this is the most practical, up-to-date coverage of Hadoop available.

Skills acquired & topics covered

Working in a hands-on learning environment, led by our Hadoop expert instructor, students will learn about and explore:
Over 100 tested, instantly useful techniques that will help you conquer big data using Hadoop
Changes and new features in the Hadoop core architecture, including MapReduce 2
YARN and integrating Kafka, Impala, and Spark SQL with Hadoop
New and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently
The most practical, up-to-date coverage of Hadoop available anywhere
How to write YARN applications
How to integrate real-time technologies like Storm, Impala, and Spark
Predictive analytics using Mahout and R

Course breakdown / modules

Hadoop in a heartbeat
  What is Hadoop?
  Getting your hands dirty with MapReduce

Introduction to YARN
  YARN overview
  YARN and MapReduce
  YARN applications

Data serialization: working with text and beyond
  Understanding inputs and outputs in MapReduce
  Processing common serialization formats
  Big data serialization formats
  Columnar storage
  Custom file formats

Organizing and optimizing data in HDFS
  Data organization
  Efficient storage with compression

Moving data into and out of Hadoop
  Key elements of data movement
  Moving data into Hadoop
  Moving data out of Hadoop

Applying MapReduce patterns to big data
  Joining
  Sorting
  Sampling

Utilizing data structures and algorithms at scale
  Modeling data and solving problems with graphs
  Bloom filters
  HyperLogLog

Tuning, debugging, and testing
  Measure, measure, measure
  Tuning MapReduce
  Debugging
  Testing MapReduce jobs

SQL on Hadoop
  Hive
  Impala
  Spark SQL

Writing a YARN application
  Fundamentals of building a YARN application
  Building a YARN application to collect cluster statistics
  Additional YARN application capabilities
  YARN programming abstraction