

  • Course Skill Level:

    Foundational to Intermediate

  • Course Duration:

    2 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    APAHA3L21E09

Who should attend & recommended skills

Those with basic IT & programming skills

  • This course is geared toward those who want a fast-paced introduction to Apache Hadoop 3 and its ecosystem.
  • Skill level: Foundation-level Apache Hadoop skills for intermediate-level team members. This is not a basic class.
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Programming: Basic (1-2 years’ experience). Attendees without a background in a programming language such as Python may follow the labs as demonstrations or team up with others to complete them.

About this course

Apache Hadoop is a widely used distributed data platform: it stores and processes large datasets across a cluster of machines rather than relying on one large computer. This course will get you started with the Hadoop ecosystem and introduce the main technical topics, including MapReduce, YARN, and HDFS. The course begins with an overview of big data and Apache Hadoop. Then, you will set up a pseudo-distributed Hadoop development environment and a multi-node enterprise Hadoop cluster. You will see how a parallel programming paradigm such as MapReduce can solve many complex data processing problems. The course also covers the important aspects of the big data software development lifecycle, including quality assurance and control, performance, administration, and monitoring. You will then learn about the Hadoop ecosystem and tools such as Kafka, Sqoop, Flume, Pig, Hive, and HBase. Finally, you will look at advanced topics, including real-time streaming using Apache Storm and data analytics using Apache Spark. By the end of the course, you will be well versed in the different configurations of a Hadoop 3 cluster.
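
To give a flavor of the MapReduce programming model the course introduces, here is a minimal word-count sketch against Hadoop's Java MapReduce API. The class names and tokenization logic are illustrative assumptions, not course materials.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: emit (word, 1) for every word in each input line.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reduce phase: sum the counts emitted for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```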

Skills acquired & topics covered

Working in a hands-on learning environment led by our Apache Hadoop 3 expert instructor, participants will learn about and explore:
  • Setting up, configuring, and getting started with Hadoop to get useful insights from large datasets
  • Working with the different components of Hadoop, such as MapReduce, HDFS, and YARN
  • Learning about the new features introduced in Hadoop 3
  • Storing and analyzing data at scale using HDFS, MapReduce, and YARN
  • Installing and configuring Hadoop 3 in different modes
  • Using YARN effectively to run different applications on a Hadoop-based platform
  • Understanding and monitoring how a Hadoop cluster is managed
  • Consuming streaming data using Storm, and then analyzing it using Spark
  • Exploring Apache Hadoop ecosystem components, such as Flume, Sqoop, HBase, Hive, and Kafka

Course breakdown / modules

  • Hadoop 3.0 – Background and Introduction
  • How it all started
  • What Hadoop is and why it is important
  • How Apache Hadoop works
  • Hadoop 3.0 releases and new features
  • Choosing the right Hadoop distribution

  • Planning and Setting Up Hadoop Clusters
  • Technical requirements
  • Prerequisites for Hadoop setup
  • Running Hadoop in standalone mode
  • Setting up a pseudo Hadoop cluster
  • Planning and sizing clusters
  • Setting up Hadoop in cluster mode
  • Diagnosing the Hadoop cluster
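
As a small taste of this module, the sketch below builds a client-side Hadoop Configuration for a single-node, pseudo-distributed cluster. The host, port, and replication values are illustrative assumptions; real clusters take these from core-site.xml and hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;

public class PseudoClusterConfig {
    public static Configuration create() {
        Configuration conf = new Configuration();
        // Assumed default for a pseudo-distributed setup: NameNode on
        // localhost:9000 (mirrors fs.defaultFS in core-site.xml).
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        // One replica is enough when every daemon runs on the same host
        // (mirrors dfs.replication in hdfs-site.xml).
        conf.set("dfs.replication", "1");
        return conf;
    }

    public static void main(String[] args) {
        Configuration conf = create();
        System.out.println("Default filesystem: " + conf.get("fs.defaultFS"));
    }
}
```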

  • Deep Dive into the Hadoop Distributed File System
  • Technical requirements
  • How HDFS works
  • Key features of HDFS
  • Data flow patterns of HDFS
  • HDFS configuration files
  • Hadoop filesystem CLIs
  • Working with data structures in HDFS
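
The HDFS concepts above can also be exercised from Java. Here is a minimal round-trip sketch using Hadoop's FileSystem API; it assumes fs.defaultFS already points at a running NameNode, and the path and payload are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS resolves to a running HDFS NameNode.
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/hello.txt"); // illustrative path

        // Write a small file into HDFS (overwrite if it exists).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read the same file back.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[(int) fs.getFileStatus(file).getLen()];
            in.readFully(buf);
            System.out.println(new String(buf, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```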

  • Developing MapReduce Applications
  • Technical requirements
  • How MapReduce works
  • Configuring a MapReduce environment
  • Understanding Hadoop APIs and packages
  • Setting up a MapReduce project
  • Deep diving into MapReduce APIs
  • Compiling and running MapReduce jobs
  • Streaming in MapReduce programming
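
Building on the mapper and reducer sketched earlier, a driver like the following wires them into a runnable job. This is a minimal sketch assuming those illustrative WordCountMapper and WordCountReducer classes; input and output paths come from the command line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver that wires the earlier mapper and reducer sketches into a job.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class); // local pre-aggregation
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```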

  • Building Rich YARN Applications
  • Technical requirements
  • Understanding YARN architecture
  • Key features of YARN
  • Configuring the YARN environment in a cluster
  • Working with YARN distributed CLI
  • Deep dive into the YARN application framework
  • Building and monitoring a YARN application on a cluster
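
For a sense of what a YARN client looks like in code, the sketch below lists the applications the ResourceManager knows about, using the YarnClient API. It assumes a yarn-site.xml on the classpath pointing at a running ResourceManager.

```java
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// List the applications currently known to the ResourceManager.
public class YarnAppLister {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();
        try {
            for (ApplicationReport app : client.getApplications()) {
                System.out.printf("%s  %s  %s%n",
                        app.getApplicationId(),
                        app.getName(),
                        app.getYarnApplicationState());
            }
        } finally {
            client.stop();
        }
    }
}
```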

  • Monitoring and Administration of a Hadoop Cluster
  • Roles and responsibilities of Hadoop administrators
  • Planning your distributed cluster
  • Resource management in Hadoop
  • High availability of Hadoop
  • Securing Hadoop clusters
  • Performing routine tasks

  • Demystifying Hadoop Ecosystem Components
  • Technical requirements
  • Understanding Hadoop’s Ecosystem
  • Working with Apache Kafka
  • Writing Apache Pig scripts
  • Transferring data with Sqoop
  • Writing Flume jobs
  • Understanding Hive
  • Using HBase for NoSQL storage
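
To hint at the kind of ecosystem code this module works with, here is a minimal Kafka producer sketch in Java. The broker address, topic name, and record contents are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address; adjust for your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer, flushing pending records.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Send a single record to a hypothetical "events" topic.
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
        }
    }
}
```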

  • Advanced Topics in Apache Hadoop
  • Technical requirements
  • Hadoop use cases in industries
  • Advanced Hadoop data storage file formats
  • Real-time streaming with Apache Storm
  • Data analytics with Apache Spark
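
As a preview of the data analytics topic, the following is a minimal Spark word count using the Java RDD API. Running in local[*] mode is an assumption suitable only for experimentation; the input path comes from the command line.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Count word occurrences in a text file with Spark's Java RDD API.
public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("word-count")
                .setMaster("local[*]"); // local mode for experimentation
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            sc.textFile(args[0])                                           // one element per line
              .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
              .mapToPair(word -> new Tuple2<>(word, 1))
              .reduceByKey(Integer::sum)
              .collect()
              .forEach(pair -> System.out.println(pair._1() + "\t" + pair._2()));
        }
    }
}
```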