

  • Course Skill Level:

  • Course Duration:

    3 days

  • Course Delivery Format:

    Live, instructor-led

  • Course Category:

    Big Data & Data Science

  • Course Code:

Who should attend & recommended skills

  • This course is designed for developers who want a simple approach to harnessing their data with Hadoop.
  • Skill level: foundation-level Hadoop skills for intermediate team members. This is not a basic class.
  • Python: basic (1-2 years' experience)

About this course

Today, valuable data is scattered across numerous databases within and across companies. The challenge is bringing that data together. Integrating Hadoop shows how Hadoop is used to collect and load data, both on physical devices and in the cloud. The course begins with an introduction to Hadoop and the types of data suited to it. Next, it focuses on assembling the integration team and gives an overview of Hadoop workloads in the organization. You will identify data sources for Hadoop, such as NoSQL databases and legacy/relational databases, distinguish between ETL and ELT, and learn how to load and unload data in Hadoop. You will also practice managing big data with techniques such as upserts and HBase, and discover the advantages of real-time computing and the basic structure of a streaming data architecture. Finally, you will work with an organization's master data and learn the top 10 mistakes people make when integrating Hadoop data, and how to avoid them.
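The upsert pattern mentioned above can be sketched in plain Python. This is a simplified, HBase-free illustration of the semantics only (rows addressed by a row key, with repeated writes merged rather than duplicated); the table and column names are hypothetical, not taken from the course.

```python
# HBase-style upsert sketch: a table is a dict of row-key -> row (dict of columns).
# Writing an existing key merges the new column values into the stored row.

def upsert(table: dict, row_key: str, columns: dict) -> None:
    """Insert the row if the key is new; otherwise merge the new columns in."""
    table.setdefault(row_key, {}).update(columns)

users_table = {}
upsert(users_table, "user#42", {"name": "Ada", "city": "London"})
upsert(users_table, "user#42", {"city": "Paris"})   # update in place, no duplicate row
upsert(users_table, "user#7", {"name": "Alan"})     # plain insert

print(users_table["user#42"])   # {'name': 'Ada', 'city': 'Paris'}
print(len(users_table))         # 2
```

In HBase itself the same effect comes from issuing a `Put` against an existing row key; the sketch only mirrors that merge-on-write behaviour.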

Skills acquired & topics covered

Working in a hands-on learning environment, led by our Hadoop expert instructor, students will learn about and explore:
  • Organizing a successful Hadoop rollout
  • Loading, unloading, and managing data in Hadoop
  • Integrating Hadoop with the existing information infrastructure
  • The different roles and responsibilities of the integration team
  • Moving data from one place to another with ETL and ELT
  • Loading data into Hadoop using the original method, batch loading
  • How and where to use the real-time computing framework Spark
  • Apache Kafka and its role in streaming data processing
  • Avoiding common mistakes of integrating Hadoop data
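The ETL-versus-ELT distinction in the list above can be shown with a toy sketch: plain Python lists stand in for a source system and a Hadoop target, and all names here are illustrative rather than course material.

```python
# ETL vs ELT, contrasted with plain Python collections.

def transform(record: dict) -> dict:
    # Example transformation: normalize the name field.
    return {**record, "name": record["name"].strip().title()}

def etl(source: list, target: list) -> None:
    """ETL: transform records *before* they land in the target."""
    target.extend(transform(r) for r in source)

def elt(source: list, raw_zone: list) -> list:
    """ELT: load raw records first, then transform inside the target.
    With Hadoop, this lets cheap storage and parallel compute do the
    transformation work, and keeps the untouched raw copy around."""
    raw_zone.extend(source)                   # load as-is
    return [transform(r) for r in raw_zone]   # transform later, in the target

source = [{"id": 1, "name": "  ada lovelace "}]
warehouse, raw = [], []
etl(source, warehouse)
curated = elt(source, raw)
print(warehouse[0]["name"])   # Ada Lovelace
print(raw[0]["name"])         # '  ada lovelace '  (raw copy preserved)
```

The practical difference the course draws out is visible even here: after ELT the raw zone still holds the original record, so transformations can be redone; after ETL only the transformed version exists in the target.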

Course breakdown / modules

  • Introducing Hadoop
  • Hadoop Distributions

  • Assembling the Integration Team
  • Overview of Workloads for Hadoop in the Organization
  • Identifying Data Sources for Hadoop
  • Data Profiling
  • Analyzing and Profiling Source Systems and Data

  • Continued Need for More Speed
  • Preference with Hadoop
  • Is ETL Dead?

  • Advantages of Data Integration Tools
  • Methods of Data Loading
  • Path to Production
  • How-To with Talend Big Data

  • Big Data ELT
  • Importance of Data Quality in Hadoop
  • Stewardship of Big Data

  • Hadoop Extracts
  • Hadoop and SOA

  • Advantages of Real-Time Computing
  • How and Where to Use Spark

  • Streaming Data Technology Distinctions

  • Hadoop and Master Data Management
  • Integrating with Master Data
  • Data Virtualization
  • MDM and Hadoop Disconnects

  • 1. Integrating Data Without a Business Purpose
  • 2. Integrating Data into Hadoop for an Enterprise Data Repository
  • 3. Overemphasis on Data Integration Performance to the Detriment of Query Performance for Data Usage
  • 4. Not Refining Data to the Point of Usefulness
  • 5. Improper Node Specification
  • 6. Over-Reliance on Open Source Hadoop
  • 7. ETL instead of ELT
  • 8. Using MapReduce to Load Hadoop
  • 9. Using Spark through Hive to Load Hadoop
  • 10. Ignoring the Quality of the Data Being Loaded

  • Case Studies in Big Data Integration
  • Trends in Hadoop and Summary of Ideas