

  • Course Skill Level:

    Foundational to Intermediate

  • Course Duration:

    2 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    APAHA3L21E09

Who should attend & recommended skills

Those with basic IT & programming skills

  • This course is geared toward those who want a fast-paced introduction to Apache Hadoop 3 and its ecosystem.
  • Skill level: Foundation-level Apache Hadoop skills for intermediate-level team members. This is not a basic class.
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Programming: Basic (1-2 years’ experience). Attendees without a background in a programming language such as Python may follow the labs as demonstrations or team up with others to complete them.

About this course

Apache Hadoop is a widely used distributed data platform: it stores and processes large datasets across a cluster of machines rather than relying on one large computer. This course will get you started with the Hadoop ecosystem and introduce the main technical topics, including MapReduce, YARN, and HDFS. The course begins with an overview of big data and Apache Hadoop. Then, you will set up a pseudo-distributed Hadoop development environment and a multi-node enterprise Hadoop cluster. You will see how a parallel programming paradigm such as MapReduce can solve many complex data processing problems. The course also covers the important aspects of the big data software development lifecycle, including quality assurance and control, performance, administration, and monitoring. You will then learn about the Hadoop ecosystem and tools such as Kafka, Sqoop, Flume, Pig, Hive, and HBase. Finally, you will look at advanced topics, including real-time streaming using Apache Storm and data analytics using Apache Spark. By the end of the course, you will be well versed in the different configurations of a Hadoop 3 cluster.
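
To give a flavor of the MapReduce programming model the course introduces, here is a minimal word-count sketch against Hadoop's Java MapReduce API. The class names and tokenization logic are illustrative assumptions, not course materials.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: emit (word, 1) for every word in each input line.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reduce phase: sum the counts emitted for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```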

Skills acquired & topics covered

Working in a hands-on learning environment led by our Apache Hadoop 3 expert instructor, participants will learn about and explore:
  • Setting up, configuring, and getting started with Hadoop to get useful insights from large datasets
  • Working with the different components of Hadoop, such as MapReduce, HDFS, and YARN
  • Learning about the new features introduced in Hadoop 3
  • Storing and analyzing data at scale using HDFS, MapReduce, and YARN
  • Installing and configuring Hadoop 3 in different modes
  • Using YARN effectively to run different applications on a Hadoop-based platform
  • Understanding and monitoring how a Hadoop cluster is managed
  • Consuming streaming data using Storm, and then analyzing it using Spark
  • Exploring Apache Hadoop ecosystem components, such as Flume, Sqoop, HBase, Hive, and Kafka

Course breakdown / modules

  • Hadoop 3.0 – Background and Introduction
  • How it all started
  • What Hadoop is and why it is important
  • How Apache Hadoop works
  • Hadoop 3.0 releases and new features
  • Choosing the right Hadoop distribution

  • Planning and Setting Up Hadoop Clusters
  • Technical requirements
  • Prerequisites for Hadoop setup
  • Running Hadoop in standalone mode
  • Setting up a pseudo Hadoop cluster
  • Planning and sizing clusters
  • Setting up Hadoop in cluster mode
  • Diagnosing the Hadoop cluster
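
As a small taste of this module, the sketch below builds a client-side Hadoop Configuration for a single-node, pseudo-distributed cluster. The host, port, and replication values are illustrative assumptions; real clusters take these from core-site.xml and hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;

public class PseudoClusterConfig {
    public static Configuration create() {
        Configuration conf = new Configuration();
        // Assumed default for a pseudo-distributed setup: NameNode on
        // localhost:9000 (mirrors fs.defaultFS in core-site.xml).
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        // One replica is enough when every daemon runs on the same host
        // (mirrors dfs.replication in hdfs-site.xml).
        conf.set("dfs.replication", "1");
        return conf;
    }

    public static void main(String[] args) {
        Configuration conf = create();
        System.out.println("Default filesystem: " + conf.get("fs.defaultFS"));
    }
}
```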

  • Deep Dive into the Hadoop Distributed File System
  • Technical requirements
  • How HDFS works
  • Key features of HDFS
  • Data flow patterns of HDFS
  • HDFS configuration files
  • Hadoop filesystem CLIs
  • Working with data structures in HDFS
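
The HDFS concepts above can also be exercised from Java. Here is a minimal round-trip sketch using Hadoop's FileSystem API; it assumes fs.defaultFS already points at a running NameNode, and the path and payload are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS resolves to a running HDFS NameNode.
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/hello.txt"); // illustrative path

        // Write a small file into HDFS (overwrite if it exists).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read the same file back.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[(int) fs.getFileStatus(file).getLen()];
            in.readFully(buf);
            System.out.println(new String(buf, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```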

  • Developing MapReduce Applications
  • Technical requirements
  • How MapReduce works
  • Configuring a MapReduce environment
  • Understanding Hadoop APIs and packages
  • Setting up a MapReduce project
  • Deep diving into MapReduce APIs
  • Compiling and running MapReduce jobs
  • Streaming in MapReduce programming
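
Building on the mapper and reducer sketched earlier, a driver like the following wires them into a runnable job. This is a minimal sketch assuming those illustrative WordCountMapper and WordCountReducer classes; input and output paths come from the command line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver that wires the earlier mapper and reducer sketches into a job.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class); // local pre-aggregation
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```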

  • Building Rich YARN Applications
  • Technical requirements
  • Understanding YARN architecture
  • Key features of YARN
  • Configuring the YARN environment in a cluster
  • Working with YARN distributed CLI
  • Deep dive into the YARN application framework
  • Building and monitoring a YARN application on a cluster
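
For a sense of what a YARN client looks like in code, the sketch below lists the applications the ResourceManager knows about, using the YarnClient API. It assumes a yarn-site.xml on the classpath pointing at a running ResourceManager.

```java
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// List the applications currently known to the ResourceManager.
public class YarnAppLister {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();
        try {
            for (ApplicationReport app : client.getApplications()) {
                System.out.printf("%s  %s  %s%n",
                        app.getApplicationId(),
                        app.getName(),
                        app.getYarnApplicationState());
            }
        } finally {
            client.stop();
        }
    }
}
```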

  • Monitoring and Administration of a Hadoop Cluster
  • Roles and responsibilities of Hadoop administrators
  • Planning your distributed cluster
  • Resource management in Hadoop
  • High availability of Hadoop
  • Securing Hadoop clusters
  • Performing routine tasks

  • Demystifying Hadoop Ecosystem Components
  • Technical requirements
  • Understanding Hadoop’s Ecosystem
  • Working with Apache Kafka
  • Writing Apache Pig scripts
  • Transferring data with Sqoop
  • Writing Flume jobs
  • Understanding Hive
  • Using HBase for NoSQL storage
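
To hint at the kind of ecosystem code this module works with, here is a minimal Kafka producer sketch in Java. The broker address, topic name, and record contents are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address; adjust for your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer, flushing pending records.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Send a single record to a hypothetical "events" topic.
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
        }
    }
}
```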

  • Advanced Topics in Apache Hadoop
  • Technical requirements
  • Hadoop use cases in industries
  • Advanced Hadoop data storage file formats
  • Real-time streaming with Apache Storm
  • Data analytics with Apache Spark
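
As a preview of the data analytics topic, the following is a minimal Spark word count using the Java RDD API. Running in local[*] mode is an assumption suitable only for experimentation; the input path comes from the command line.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Count word occurrences in a text file with Spark's Java RDD API.
public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("word-count")
                .setMaster("local[*]"); // local mode for experimentation
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            sc.textFile(args[0])                                           // one element per line
              .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
              .mapToPair(word -> new Tuple2<>(word, 1))
              .reduceByKey(Integer::sum)
              .collect()
              .forEach(pair -> System.out.println(pair._1() + "\t" + pair._2()));
        }
    }
}
```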