- Duration: 3 days
- Skill level: Foundation-level Hadoop skills for intermediate-level team members. This is not a basic class.
- Targeted Audience: This course is geared toward those who want to conquer big data using Hadoop. This revised edition covers changes and new features in the Hadoop core architecture, including MapReduce 2.
- Hands-on Learning: This course is approximately 50% hands-on lab and 50% lecture, combining engaging lecture, demos, group activities, and discussions with machine-based student labs and exercises. Student machines are required.
- Delivery Format: This course is available for onsite private classroom presentation.
- Customizable: This course may be tailored to your specific training objectives, tools of choice, and learning goals.
It’s always a good time to upgrade your Hadoop skills! This edition of Hadoop in Practice provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, applying machine learning, managing large-scale clusters, and taming big data using Hadoop. This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. You’ll pick up hands-on best practices for integrating Spark, Kafka, and Impala with Hadoop, and get new and updated techniques for the latest versions of Flume, Sqoop, and Mahout. In short, this is the most practical, up-to-date coverage of Hadoop available.
Attendees should know a programming language such as Java and have basic familiarity with Hadoop.
Working in a hands-on learning environment, led by our Hadoop expert instructor, students will learn about and explore:
- Over 100 tested, instantly useful techniques for conquering big data with Hadoop
- Changes and new features in the Hadoop core architecture, including MapReduce 2
- YARN, and integrating Kafka, Impala, and Spark SQL with Hadoop
- New and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently
- Practical, up-to-date coverage of Hadoop throughout
Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below.
- Thoroughly updated for Hadoop 2
- How to write YARN applications
- Integrate real-time technologies like Storm, Impala, and Spark
- Predictive analytics using Mahout and R
Audience & Pre-Requisites
This course is geared toward attendees who want to learn YARN and how to integrate Kafka, Impala, and Spark SQL with Hadoop. It also covers new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently.
Pre-Requisites: Students should have:
- Basic to intermediate IT skills. Attendees without a programming background in a language such as Python may treat the labs as follow-along exercises or team up with others to complete them.
- Good foundational mathematics or logic skills
- Some prior experience exploring Hadoop and an interest in concrete advice on using it in production
Course Agenda / Topics
- Hadoop in a heartbeat
  - What is Hadoop?
  - Getting your hands dirty with MapReduce
- Introduction to YARN
  - YARN overview
  - YARN and MapReduce
  - YARN applications
- Data serialization: working with text and beyond
  - Understanding inputs and outputs in MapReduce
  - Processing common serialization formats
  - Big data serialization formats
  - Columnar storage
  - Custom file formats
- Organizing and optimizing data in HDFS
  - Data organization
  - Efficient storage with compression
- Moving data into and out of Hadoop
  - Key elements of data movement
  - Moving data into Hadoop
  - Moving data out of Hadoop
- Applying MapReduce patterns to big data
- Utilizing data structures and algorithms at scale
  - Modeling data and solving problems with graphs
  - Bloom filters
- Tuning, debugging, and testing
  - Measure, measure, measure
  - Tuning MapReduce
  - Testing MapReduce jobs
- SQL on Hadoop
- Writing a YARN application
  - Fundamentals of building a YARN application
  - Building a YARN application to collect cluster statistics
  - Additional YARN application capabilities
  - YARN programming abstraction