We are living in an era of ‘big data’, and the ability to analyze and process that data is vital for enterprises. Apache Spark is a popular platform for big data analytics. This course introduces students to Apache Spark; the class is taught in Python using the Jupyter notebook environment.
- Spark ecosystem
- Spark Shell
- Spark data structures (RDD / DataFrame / Dataset)
- Spark SQL
- Modern data formats and Spark
- Spark API
- Spark & Hadoop & Hive
- Spark ML overview
- GraphX
- Spark Streaming
Audience: developers, data analysts, and business analysts
Skill level: introductory to intermediate
- Basic knowledge of the Python language and Jupyter notebooks is preferred but not mandatory.
Even if you haven’t done any Python programming, Python is an easy language to pick up quickly; we will provide Python learning resources.
- A cloud-based lab environment will be provided to students; there is no need to install anything on your laptop.
- Big Data, Hadoop, and Spark
- Spark concepts and architecture
- Spark components overview
- Labs: Installing and running Spark
- Spark shell
- Spark web UIs
- Analyzing a dataset – part 1
- Labs: Spark shell exploration (a short PySpark sketch follows below)
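As a preview of the shell labs, here is a minimal PySpark sketch of the kind of exploration done in this module. The file path is a placeholder, and the SparkSession is created explicitly so the snippet also runs as a script (the pyspark shell and the lab notebooks normally provide `spark` already).

```python
from pyspark.sql import SparkSession

# In the pyspark shell / lab notebooks, `spark` is usually pre-created;
# creating it explicitly makes this sketch runnable as a standalone script too.
spark = SparkSession.builder.appName("shell-exploration").getOrCreate()

# Hypothetical sample file -- replace with any text file from the labs.
lines = spark.read.text("data/sample.txt")

lines.printSchema()            # a single string column named "value"
print(lines.count())           # number of lines in the file
lines.show(5, truncate=False)  # peek at the first few lines

spark.stop()
```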
- Partitions
- Distributed execution
- Operations: transformations and actions
- Labs: Unstructured data analytics using RDDs (see the word-count sketch below)
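A minimal RDD sketch of the transformation/action distinction covered in this module, assuming a placeholder text file: `flatMap`, `map`, and `reduceByKey` are lazy transformations, while `take` is the action that actually triggers execution.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-wordcount").getOrCreate()
sc = spark.sparkContext

# Transformations (lazy) -- nothing runs until an action is called.
words = (sc.textFile("data/sample.txt")
           .flatMap(lambda line: line.split())
           .map(lambda word: (word, 1))
           .reduceByKey(lambda a, b: a + b))

# Action -- triggers the distributed computation.
for word, count in words.take(10):
    print(word, count)

print("partitions:", words.getNumPartitions())
spark.stop()
```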
- Caching overview
- Various caching mechanisms available in Spark
- In-memory file systems
- Caching use cases and best practices
- Labs: Benchmarking caching performance (a timing sketch follows below)
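A small sketch of the caching benchmark idea, assuming a hypothetical CSV file: the DataFrame is counted once cold, then cached and counted again, with rough wall-clock timings printed for comparison. `StorageLevel` shows how an alternative persistence level can be requested.

```python
import time
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("cache-benchmark").getOrCreate()

# Hypothetical input file for the benchmark.
df = spark.read.csv("data/transactions.csv", header=True, inferSchema=True)

def timed_count(frame, label):
    start = time.time()
    n = frame.count()
    print(f"{label}: {n} rows in {time.time() - start:.2f}s")

timed_count(df, "cold (no cache)")

df.cache()  # DataFrames default to MEMORY_AND_DISK
timed_count(df, "first count (populates the cache)")
timed_count(df, "second count (served from the cache)")

df.unpersist()
# Other storage levels can be chosen explicitly, e.g. memory only:
df.persist(StorageLevel.MEMORY_ONLY)

spark.stop()
```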
- DataFrames intro
- Loading structured data (JSON, CSV) using DataFrames
- Using schemas
- Specifying a schema for DataFrames
- Labs: DataFrames, Datasets, and schemas (a schema sketch follows below)
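A short DataFrame sketch contrasting schema inference with an explicit schema, as discussed above; the people.json / people.csv files and their columns are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("dataframes-schema").getOrCreate()

# Schema inference: convenient, but costs an extra pass / sampling of the data.
people_json = spark.read.json("data/people.json")
people_json.printSchema()

# Explicit schema: faster and safer for production pipelines.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])
people_csv = spark.read.csv("data/people.csv", header=True, schema=schema)
people_csv.select("name").where(people_csv.age > 21).show()

spark.stop()
```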
- Spark SQL concepts and overview
- Defining tables and importing datasets
- Querying data using SQL
- Handling various storage formats: JSON / Parquet / ORC
- Labs: Querying structured data using SQL; evaluating data formats (see the sketch below)
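A minimal Spark SQL sketch reusing the hypothetical people.json file: the DataFrame is registered as a temporary view, queried with SQL, and the result is written back out in several formats so they can be compared in the lab.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql").getOrCreate()

people = spark.read.json("data/people.json")
people.createOrReplaceTempView("people")

adults = spark.sql("""
    SELECT name, age
    FROM people
    WHERE age >= 18
    ORDER BY age DESC
""")
adults.show()

# Write the same result in different formats to compare size and query speed.
adults.write.mode("overwrite").parquet("out/adults.parquet")
adults.write.mode("overwrite").orc("out/adults.orc")
adults.write.mode("overwrite").json("out/adults.json")

spark.stop()
```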
- Hadoop primer: HDFS / YARN
- Hadoop + Spark architecture
- Running Spark on Hadoop YARN
- Processing HDFS files using Spark
- Spark & Hive (a short HDFS and Hive access sketch follows below)
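A hedged sketch of what Spark-on-Hadoop access looks like from code: the HDFS path and Hive table name are placeholders, `enableHiveSupport()` assumes a reachable Hive metastore, and the YARN master is normally supplied by spark-submit rather than hard-coded.

```python
from pyspark.sql import SparkSession

# Typically submitted with: spark-submit --master yarn --deploy-mode cluster app.py
spark = (SparkSession.builder
         .appName("spark-on-hadoop")
         .enableHiveSupport()   # requires a configured Hive metastore
         .getOrCreate())

# Reading a file directly from HDFS (placeholder path).
logs = spark.read.text("hdfs:///data/weblogs/access.log")
print(logs.count())

# Querying an existing Hive table (placeholder database/table name).
spark.sql("SELECT COUNT(*) FROM sales.transactions").show()

spark.stop()
```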
- Overview of Spark APIs in Scala / Python
- Life cycle of a Spark application
- Spark APIs
- Deploying Spark applications on YARN
- Labs: Developing and deploying a Spark application (an application skeleton follows below)
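A skeleton of the kind of standalone application developed and deployed in this module's lab; the script name, input path, column name, and spark-submit options are illustrative only.

```python
# app.py -- a minimal standalone Spark application (names and paths are illustrative).
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("my-spark-app").getOrCreate()
    try:
        df = spark.read.csv("data/input.csv", header=True, inferSchema=True)
        df.groupBy("category").count().show()
    finally:
        spark.stop()  # ends the application's life cycle cleanly

if __name__ == "__main__":
    main()

# Deployed on YARN with something like:
#   spark-submit --master yarn --deploy-mode cluster app.py
```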
- Machine Learning primer
- Machine learning in Spark: MLlib / ML
- Spark ML overview (the newer, DataFrame-based API in Spark 2)
- Algorithms overview: clustering, classification, recommendations
- Labs: Writing ML applications in Spark (a pipeline sketch follows below)
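A small spark.ml sketch (the DataFrame-based API referred to above) that clusters a hypothetical CSV of two numeric columns with KMeans; the file path and column names are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("ml-kmeans").getOrCreate()

# Hypothetical dataset with two numeric feature columns, "x" and "y".
data = spark.read.csv("data/points.csv", header=True, inferSchema=True)

assembler = VectorAssembler(inputCols=["x", "y"], outputCol="features")
kmeans = KMeans(k=3, seed=42, featuresCol="features")

model = Pipeline(stages=[assembler, kmeans]).fit(data)
model.transform(data).select("x", "y", "prediction").show(10)

spark.stop()
```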
- GraphX library overview
- GraphX APIs
- Creating a graph and navigating it
- Shortest distance
- Pregel API
- Labs: Processing graph data using Spark (see the sketch below)
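GraphX itself exposes a Scala/JVM API; from Python, graph processing is commonly done through the separate GraphFrames package, which wraps GraphX algorithms. The sketch below assumes that package is installed and uses made-up vertex and edge data.

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame  # separate package, not bundled with Spark

spark = SparkSession.builder.appName("graph-demo").getOrCreate()

# Toy graph: vertices need an "id" column, edges need "src" and "dst".
vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"])
edges = spark.createDataFrame(
    [("a", "b", "follows"), ("b", "c", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()

# Shortest distances from every vertex to the landmark vertex "a"
# (GraphFrames runs this on top of GraphX).
g.shortestPaths(landmarks=["a"]).show()

spark.stop()
```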
- Streaming concepts
- Evaluating Streaming platforms
- Spark streaming library overview
- Streaming operations
- Sliding window operations
- Structured Streaming
- Continuous streaming
- Spark & Kafka streaming
- Labs: Writing Spark Streaming applications (a Structured Streaming sketch follows below)
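A Structured Streaming sketch combining the Kafka source with a sliding-window aggregation from this module; the broker address and topic are placeholders, and the spark-sql-kafka connector package has to be supplied to spark-submit.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Kafka source -- broker and topic are placeholders; requires the
# spark-sql-kafka connector package on the classpath.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

# Count events per key over a 10-minute window sliding every 5 minutes.
counts = (events
          .select(col("key").cast("string").alias("key"), col("timestamp"))
          .groupBy(window(col("timestamp"), "10 minutes", "5 minutes"), col("key"))
          .count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .option("truncate", "false")
         .start())

query.awaitTermination()
```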
- These are group workshops
- Attendees will work on solving real-world data analysis problems using Spark