Apache Hadoop 3

  • Course Code: Big Data - Apache Hadoop 3
  • Course Dates: Contact us to schedule.
  • Course Category: Big Data & Data Science
  • Duration: 2 Days
  • Audience: This course is geared for those who want a fast-paced guide to Apache Hadoop 3 and its ecosystem.

Course Snapshot 

  • Duration: 2 days 
  • Skill-level: Foundation-level Apache Hadoop skills for intermediate-skilled team members. This is not a basic class.
  • Targeted Audience: This course is geared for those who want a fast-paced guide to Apache Hadoop 3 and its ecosystem.
  • Hands-on Learning: This course is approximately 50% hands-on lab and 50% lecture, combining engaging lecture, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required.
  • Delivery Format: This course is available for onsite private classroom presentation. 
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

Apache Hadoop is a widely used distributed data platform. It enables large datasets to be processed efficiently across a cluster of machines, instead of relying on one large computer to store and process the data. This course will get you started with the Hadoop ecosystem and introduce you to the main technical topics, including MapReduce, YARN, and HDFS. The course begins with an overview of big data and Apache Hadoop. Then, you will set up a pseudo-distributed Hadoop development environment and a multi-node enterprise Hadoop cluster. You will see how a parallel programming paradigm such as MapReduce can solve many complex data processing problems. The course also covers the important aspects of the big data software development lifecycle, including quality assurance and control, performance, administration, and monitoring. You will then learn about the Hadoop ecosystem and tools such as Kafka, Sqoop, Flume, Pig, Hive, and HBase. Finally, you will look at advanced topics, including real-time streaming using Apache Storm and data analytics using Apache Spark. By the end of the course, you will be well versed in the different configurations of a Hadoop 3 cluster.
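
As a rough illustration of the MapReduce paradigm mentioned above, the following plain-Python sketch counts words in three phases: map, shuffle, and reduce. It is not taken from the course materials and does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to show the shape of the idea on a single machine.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: collapse all counts for one word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["Hadoop stores data in HDFS", "YARN schedules work on Hadoop"]
    print(word_count(sample))
```

On a real cluster, the map and reduce steps would run in parallel on different nodes and the framework would handle the shuffle; here everything runs in one process so the data flow is easy to follow.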

Working in a hands-on learning environment, led by our Apache Hadoop 3 expert instructor, students will:

  • Set up, configure and get started with Hadoop to get useful insights from large data sets 
  • Work with the different components of Hadoop such as MapReduce, HDFS and YARN 
  • Learn about the new features introduced in Hadoop 3 

Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below.

  • Store and analyze data at scale using HDFS, MapReduce and YARN 
  • Install and configure Hadoop 3 in different modes 
  • Use YARN effectively to run different applications on a Hadoop-based platform
  • Understand how a Hadoop cluster is managed and monitored
  • Consume streaming data using Storm, and then analyze it using Spark 
  • Explore Apache Hadoop ecosystem components, such as Flume, Sqoop, HBase, Hive, and Kafka 

Audience & Pre-Requisites 

This course is geared for attendees who want a fast-paced guide to Apache Hadoop 3 and its ecosystem.

Pre-Requisites:  Students should have  

  • Basic to intermediate IT skills. Attendees without a programming background (in a language such as Python) may treat the labs as follow-along exercises or team up with others to complete them.
  • Good foundational mathematics or logic skills 

Course Agenda / Topics 

  1. Hadoop 3.0 – Background and Introduction
  • How it all started
  • What Hadoop is and why it is important
  • How Apache Hadoop works
  • Hadoop 3.0 releases and new features
  • Choosing the right Hadoop distribution
  2. Planning and Setting Up Hadoop Clusters
  • Technical requirements
  • Prerequisites for Hadoop setup
  • Running Hadoop in standalone mode
  • Setting up a pseudo Hadoop cluster
  • Planning and sizing clusters
  • Setting up Hadoop in cluster mode
  • Diagnosing the Hadoop cluster
  3. Deep Dive into the Hadoop Distributed File System
  • Technical requirements
  • How HDFS works
  • Key features of HDFS
  • Data flow patterns of HDFS
  • HDFS configuration files
  • Hadoop filesystem CLIs
  • Working with data structures in HDFS
  4. Developing MapReduce Applications
  • Technical requirements
  • How MapReduce works
  • Configuring a MapReduce environment
  • Understanding Hadoop APIs and packages
  • Setting up a MapReduce project
  • Deep diving into MapReduce APIs
  • Compiling and running MapReduce jobs
  • Streaming in MapReduce programming
  5. Building Rich YARN Applications
  • Technical requirements
  • Understanding YARN architecture
  • Key features of YARN
  • Configuring the YARN environment in a cluster
  • Working with YARN distributed CLI
  • Deep dive with YARN application framework
  • Building and monitoring a YARN application on a cluster
  6. Monitoring and Administration of a Hadoop Cluster
  • Roles and responsibilities of Hadoop administrators
  • Planning your distributed cluster
  • Resource management in Hadoop
  • High availability of Hadoop
  • Securing Hadoop clusters
  • Performing routine tasks
  7. Demystifying Hadoop Ecosystem Components
  • Technical requirements
  • Understanding Hadoop’s Ecosystem
  • Working with Apache Kafka
  • Writing Apache Pig scripts
  • Transferring data with Sqoop
  • Writing Flume jobs
  • Understanding Hive
  • Using HBase for NoSQL storage
  8. Advanced Topics in Apache Hadoop
  • Technical requirements
  • Hadoop use cases in industries
  • Advanced Hadoop data storage file formats
  • Real-time streaming with Apache Storm
  • Data analytics with Apache Spark