  • Course Skill Level:

    Foundational

  • Course Duration:

    5 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    HAHISPL21E09

Who should attend & recommended skills

  • Business analysts, Software developers, Managers
  • SQL: Basic (1-2 years’ experience)
  • Software Design: Basic (1-2 years’ experience)
  • Python: Basic (1-2 years’ experience)

About this course

Hadoop is a mature Big Data environment, and Hive is the de facto standard for its SQL interface. Today, computation in Hadoop is usually done with Spark, which offers an optimized engine covering batch processing, real-time streaming, and machine learning.
This course covers Hadoop 3, Hive 3, and Spark 3.

Skills acquired & topics covered

  • Why Hadoop?
  • The Hadoop platform
  • Hive Basics
  • New in Hive 3
  • HBase
  • Sqoop
  • The big picture
  • Spark Introduction
  • First Look at Spark
  • Spark Data structures
  • Caching
  • DataFrames and Datasets
  • Spark SQL
  • Spark and Hadoop
  • Spark API
  • Spark ML Overview
  • GraphX
  • Spark Streaming

Course breakdown / modules

Why Hadoop?

  • The motivation for Hadoop
  • Use cases and case studies about Hadoop

The Hadoop platform

  • MapReduce, HDFS, YARN
  • New in Hadoop 3
  • Erasure Coding vs 3x replication
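The storage trade-off behind this comparison comes down to simple arithmetic. A small sketch, assuming the RS(6,3) Reed-Solomon policy commonly cited as the Hadoop 3 default (the helper names here are invented for illustration):

```python
# Storage overhead: HDFS 3x replication vs. erasure coding.
# With 3x replication, every block is stored three times: 200% extra storage.
# With RS(6,3), 6 data blocks carry 3 parity blocks: only 50% extra storage,
# while still tolerating the loss of any 3 blocks in the group.

def replication_overhead(copies: int) -> float:
    """Extra storage as a fraction of the raw data size."""
    return float(copies - 1)

def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Extra storage as a fraction of the raw data size."""
    return parity_blocks / data_blocks

print(replication_overhead(3))   # 2.0 -> 200% extra storage
print(erasure_overhead(6, 3))    # 0.5 ->  50% extra storage
```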

Hive Basics

  • Defining Hive Tables
  • SQL Queries over Structured Data
  • Filtering / Search
  • Aggregations / Ordering
  • Partitions
  • Joins
  • Text Analytics (Semi-Structured Data)
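To preview the kinds of queries this module covers, here is an illustrative sketch using SQLite as a stand-in for Hive: the basic SELECT / WHERE / GROUP BY / JOIN / ORDER BY syntax shown is the same in HiveQL, though Hive runs these queries as distributed jobs over data in HDFS. Table and column names are invented for the example.

```python
import sqlite3

# In-memory SQLite tables standing in for Hive tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
cur.execute("CREATE TABLE customers (name TEXT, city TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "alice", 20.0), (2, "bob", 50.0), (3, "alice", 15.0)])
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [("alice", "Boston"), ("bob", "Austin")])

# Filtering / search
big = cur.execute(
    "SELECT id FROM orders WHERE amount > 18 ORDER BY id").fetchall()
print(big)      # [(1,), (2,)]

# Aggregation with ordering
totals = cur.execute(
    """SELECT customer, SUM(amount) AS total
       FROM orders GROUP BY customer ORDER BY total DESC""").fetchall()
print(totals)   # [('bob', 50.0), ('alice', 35.0)]

# Join
joined = cur.execute(
    """SELECT o.id, c.city FROM orders o
       JOIN customers c ON o.customer = c.name ORDER BY o.id""").fetchall()
print(joined)   # [(1, 'Boston'), (2, 'Austin'), (3, 'Boston')]
```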

New in Hive 3

  • ACID tables
  • Hive Query Language (HQL)
  • How to write a good query
  • How to troubleshoot queries

HBase

  • Basics
  • HBase tables – design and use
  • Phoenix driver for HBase tables

Sqoop

  • The Sqoop tool
  • Architecture
  • Usage

The big picture

  • How Hadoop fits into your architecture
  • Hive vs HBase with Phoenix vs Excel

Spark Introduction

  • Big Data, Hadoop, and Spark
  • Spark concepts and architecture
  • Spark components overview
  • Labs: Installing and running Spark

First Look at Spark

  • Spark shell
  • Spark web UIs
  • Analyzing a dataset – part 1
  • Labs: Spark shell exploration

Spark Data Structures

  • Partitions
  • Distributed execution
  • Operations: transformations and actions
  • Labs: Unstructured data analytics using RDDs
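The transformation/action distinction can be sketched without Spark at all: transformations lazily describe a pipeline, and only an action forces evaluation. A minimal word-count analogue, with the corresponding PySpark RDD calls noted in comments:

```python
# A Spark-free sketch of the RDD model. Generators are lazy, like RDD
# transformations; nothing runs until the final loop (the "action") consumes
# the pipeline.
lines = ["spark makes big data simple", "big data needs spark"]

words = (w for line in lines for w in line.split())   # like rdd.flatMap(...)
pairs = ((w, 1) for w in words)                       # like .map(lambda w: (w, 1))

counts = {}                                           # like .reduceByKey(add)
for word, n in pairs:                                 # iterating is the "action"
    counts[word] = counts.get(word, 0) + n

print(counts["spark"], counts["big"], counts["data"])   # 2 2 2
```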

Caching

  • Caching overview
  • Caching mechanisms available in Spark
  • In-memory file systems
  • Caching use cases and best practices
  • Labs: Benchmarking caching performance
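The payoff of Spark's `cache()`/`persist()` is that later actions reuse a materialized dataset instead of re-running the whole lineage. The same idea in miniature, counting how often an expensive step executes with and without a stored result (a single-machine analogue, not Spark code):

```python
# Count recomputations of an "expensive" transformation.
calls = {"n": 0}

def expensive_transform(x: int) -> int:
    calls["n"] += 1          # track how often this step actually runs
    return x * x

data = range(5)

# No caching: each pass over the pipeline recomputes every element,
# like re-running an uncached RDD lineage per action.
for _ in range(2):
    result = [expensive_transform(x) for x in data]
uncached_calls = calls["n"]
print(uncached_calls)        # 10 (5 elements x 2 passes)

# "Cached": materialize once, then later passes reuse the stored values,
# like rdd.cache() followed by several actions.
calls["n"] = 0
cached = [expensive_transform(x) for x in data]
for _ in range(2):
    result = list(cached)
cached_calls = calls["n"]
print(cached_calls)          # 5
```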

DataFrames and Datasets

  • DataFrames intro
  • Loading structured data (JSON, CSV) using DataFrames
  • Using schemas
  • Specifying a schema for DataFrames
  • Labs: DataFrames, Datasets, schemas
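Why specify a schema? Declaring column types up front avoids inference and lets malformed records be set aside. A toy stdlib analogue of schema-checked JSON loading (the schema and records are invented for the example; Spark's own reader tracks bad rows in a `_corrupt_record` column):

```python
import json

# Declared schema: column name -> expected Python type.
schema = {"name": str, "age": int}

raw = '[{"name": "alice", "age": 34}, {"name": "bob", "age": "oops"}]'

rows, corrupt = [], []
for rec in json.loads(raw):
    # Keep the record only if every column exists with the declared type.
    if all(isinstance(rec.get(col), typ) for col, typ in schema.items()):
        rows.append(rec)
    else:
        corrupt.append(rec)   # set aside malformed records

print(len(rows), len(corrupt))   # 1 1
```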

Spark SQL

  • Spark SQL concepts and overview
  • Defining tables and importing datasets
  • Querying data using SQL
  • Handling various storage formats: JSON / Parquet / ORC
  • Labs: Querying structured data using SQL; evaluating data formats

Spark and Hadoop

  • Hadoop + Spark architecture
  • Running Spark on Hadoop YARN
  • Processing HDFS files using Spark
  • Spark and Hive

Spark API

  • Overview of the Spark APIs in Scala / Python
  • Life cycle of a Spark application
  • Spark APIs
  • Deploying Spark applications on YARN
  • Labs: Developing and deploying a Spark application

Spark ML Overview

  • Machine Learning primer
  • Machine Learning in Spark: MLlib / ML
  • Spark ML overview (the newer, Spark 2 API)
  • Algorithms overview: Clustering, Classification, Recommendations
  • Labs: Writing ML applications in Spark
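To give a feel for what a clustering algorithm like MLlib's KMeans does, here is a deliberately tiny single-machine 1-D version of its assign/recompute loop. Illustrative only: Spark runs this iteration distributed over feature vectors, and this sketch assumes both clusters stay non-empty for the chosen inputs.

```python
# Minimal 1-D k-means with k=2: repeatedly assign each point to its nearest
# centroid, then recompute each centroid as its cluster mean.
def kmeans_1d(points, c1, c2, iters=10):
    for _ in range(iters):
        a = [p for p in points if abs(p - c1) <= abs(p - c2)]
        b = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(a) / len(a)   # assumes cluster a is non-empty
        c2 = sum(b) / len(b)   # assumes cluster b is non-empty
    return sorted([c1, c2])

# Two obvious clusters around 2 and 11:
print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], c1=0.0, c2=5.0))
# [2.0, 11.0]
```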

GraphX

  • GraphX library overview
  • GraphX APIs
  • Creating a graph and navigating it
  • Shortest distance
  • Pregel API
  • Labs: Processing graph data using Spark
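For an unweighted graph, the hop distances that GraphX computes with iterative Pregel-style message passing are the same ones a plain breadth-first search produces. A stand-alone sketch with an invented edge list:

```python
from collections import deque

# Directed edge list and adjacency map.
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e")]
adj = {}
for src, dst in edges:
    adj.setdefault(src, []).append(dst)

def shortest_distance(start: str) -> dict:
    """Hop count from start to every reachable vertex (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, []):
            if nbr not in dist:            # first visit = shortest hop count
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

print(shortest_distance("a"))   # {'a': 0, 'b': 1, 'd': 1, 'c': 2, 'e': 3}
```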

Spark Streaming

  • Streaming concepts
  • Evaluating streaming platforms
  • Spark Streaming library overview
  • Streaming operations
  • Sliding window operations
  • Structured Streaming
  • Continuous streaming
  • Spark Kafka streaming
  • Labs: Writing Spark Streaming applications
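The sliding-window operations above take a window length and a slide interval and recompute an aggregate over the most recent micro-batches. A stand-alone sketch of that idea (not Spark code; the helper name and batch data are invented):

```python
from collections import deque

def windowed_sums(batches, window=3, slide=1):
    """Sum over the last `window` micro-batches, emitted every `slide` batches."""
    buf, out = deque(maxlen=window), []   # deque drops batches older than the window
    for i, batch in enumerate(batches):
        buf.append(sum(batch))            # per-batch aggregate
        if (i + 1) % slide == 0:
            out.append(sum(buf))          # aggregate over the current window
    return out

# Each inner list is one micro-batch of event counts.
print(windowed_sums([[1, 2], [3], [4, 5], [6]]))   # [3, 6, 15, 18]
```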

Workshops

  • These are group workshops
  • Attendees work on solving real-world data analysis problems using Spark