Let us help you find the training program you are looking for.

If you can't find what you are looking for, contact us and we'll help you find it. We have over 800 training programs to choose from.

  • Course Skill Level:

    Foundational

  • Course Duration:

    2 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    APASPAL21E09

Who should attend & recommended skills

  • This course is geared for attendees who want a practical guide to solving complex data processing challenges by applying the best optimization techniques in Apache Spark.
  • Skill level: foundation-level Apache Spark skills for intermediate-skilled team members; this is not a basic class.
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Traditional databases: Basic (1-2 years’ experience) helpful
  • Large-scale data analysis and NoSQL tools: No exposure required

About this course

Apache Spark is a flexible framework for processing both batch and real-time data, and its unified engine has made it popular for big data use cases. This course will help you get started with Apache Spark 2.0, one of the most popular big data processing frameworks, and write big data applications for a variety of use cases. Although the course is intended to get you started quickly, it also focuses on explaining the core concepts.

This practical guide provides a quick start to the Spark 2.0 architecture and its components, and teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and the DataFrame API, along with their corresponding transformations and actions. We then move on to the life cycle of a Spark application and the techniques used to debug slow-running applications. You will also work through Spark's built-in modules for SQL, streaming, machine learning, and graph analysis. Finally, the course lays out the best practices and optimization techniques that are key to writing efficient Spark applications. By the end of this course, you will have a sound fundamental understanding of the Apache Spark framework and will be able to write and optimize Spark applications.
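
To give a flavor of what attendees will write, below is a minimal sketch of a first Spark application using the Python API (PySpark). This is a sketch only: it assumes PySpark is installed (pip install pyspark), and the application name and sample data are illustrative rather than taken from the course materials.

    # Minimal PySpark application: create a SparkSession, build a small
    # DataFrame, apply a lazy transformation, and trigger an action.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("FirstApp").master("local[*]").getOrCreate()

    df = spark.createDataFrame([("alice", 34), ("bob", 28)], ["name", "age"])
    adults = df.filter(df.age > 30)   # transformation: evaluated lazily
    adults.show()                     # action: triggers actual execution

    spark.stop()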

Skills acquired & topics covered

Working in a hands-on learning environment, led by our Apache Spark expert instructor, participants will learn about and explore:

  • The core concepts and the latest developments in Apache Spark
  • Writing efficient big data applications with Spark's built-in modules for SQL, streaming, machine learning, and graph analysis
  • A variety of optimizations drawn from real-world experience
  • Core concepts such as RDDs, DataFrames, transformations, and more
  • Setting up a Spark development environment
  • Choosing the right APIs for your applications
  • Understanding Spark's architecture and the execution flow of a Spark application
  • Exploring built-in modules for SQL, streaming, ML, and graph analysis
  • Optimizing your Spark jobs for better performance

Course breakdown / modules

  • What is Spark?
  • Spark architecture overview
  • Spark language APIs
  • Spark components
  • Making the most of Hadoop and Spark
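
As a quick taste of this module, here is a minimal sketch (Python/PySpark, assumed installed) of Spark 2.0's unified entry point, the SparkSession, and the SparkContext it wraps; the application name is illustrative.

    # In local mode the driver and executors run on one machine, which is
    # handy for learning the architecture before moving to a cluster.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("ArchitectureDemo")
             .master("local[*]")      # use all local cores
             .getOrCreate())

    sc = spark.sparkContext           # lower-level entry point (RDD API)
    print(sc.version)                 # Spark version in use
    print(sc.defaultParallelism)      # parallelism the scheduler offers
    spark.stop()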

  • AWS elastic compute cloud (EC2)
  • Configuring Spark
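
Whether Spark runs on EC2 or on a local machine, it is configured the same way. Below is a sketch of programmatic configuration with SparkConf; the property keys are standard Spark settings, but the values shown are illustrative only.

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = (SparkConf()
            .setAppName("ConfigDemo")
            .setMaster("local[2]")                      # two local cores
            .set("spark.executor.memory", "1g")         # per-executor memory
            .set("spark.sql.shuffle.partitions", "8"))  # shuffle parallelism

    spark = SparkSession.builder.config(conf=conf).getOrCreate()
    print(spark.conf.get("spark.sql.shuffle.partitions"))
    spark.stop()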

  • What is an RDD?
  • Programming using RDDs
  • Transformations and actions
  • Types of RDDs
  • Caching and checkpointing
  • Understanding partitions
  • Drawbacks of using RDDs
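
A short sketch tying together the RDD topics above: creating an RDD, chaining lazy transformations, triggering actions, caching, and inspecting partitions (assumes PySpark; all values illustrative).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("RDDDemo").master("local[*]").getOrCreate()
    sc = spark.sparkContext

    rdd = sc.parallelize(range(1, 101), 4)        # RDD with 4 partitions
    evens = rdd.filter(lambda x: x % 2 == 0)      # transformation (lazy)
    squares = evens.map(lambda x: x * x)          # transformation (lazy)
    squares.cache()                               # keep in memory for reuse

    print(squares.count())                        # action: runs the lineage
    print(squares.take(5))                        # action: reuses cached data
    print(squares.getNumPartitions())             # inspect partitioning
    spark.stop()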

  • Spark DataFrame and Dataset
  • DataFrames
  • Datasets
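
A minimal DataFrame sketch for this module. Note that the typed Dataset API exists only in Scala and Java; in Python you work with DataFrames. Column names and values are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("DataFrameDemo").master("local[*]").getOrCreate()

    df = spark.createDataFrame(
        [("books", 12.0), ("books", 5.0), ("toys", 20.0)],
        ["category", "price"])

    summary = (df.groupBy("category")
                 .agg(F.avg("price").alias("avg_price"),
                      F.count("*").alias("n")))
    summary.show()
    spark.stop()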

  • A sample application
  • Application execution modes
  • Application monitoring
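
A sketch of a self-contained sample application for this module. It can run directly with python in local mode or be submitted to a cluster with spark-submit (in client or cluster deploy mode); while it runs, the Spark UI (by default at http://localhost:4040) shows the jobs, stages, and tasks used for monitoring. Names and sizes are illustrative.

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("SampleApp").getOrCreate()
        data = spark.range(1000000)                    # one million rows
        total = data.selectExpr("sum(id)").first()[0]  # a simple aggregation job
        print("sum =", total)
        spark.stop()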

  • Spark SQL
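
A minimal Spark SQL sketch: register a DataFrame as a temporary view and query it with SQL. The view name, column names, and data are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SQLDemo").master("local[*]").getOrCreate()

    df = spark.createDataFrame([("alice", 34), ("bob", 28)], ["name", "age"])
    df.createOrReplaceTempView("people")             # expose df to SQL

    spark.sql("SELECT name FROM people WHERE age > 30").show()
    spark.stop()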

  • Spark Streaming
  • Machine learning
  • Graph processing
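
As a taste of the streaming side of this module, here is a minimal Structured Streaming sketch using the built-in rate source, which generates rows locally so no external input is required; the options and run duration are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("StreamDemo").master("local[*]").getOrCreate()

    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    counts = stream.groupBy().count()       # running count over the stream

    query = (counts.writeStream
                   .outputMode("complete")  # emit the full updated result
                   .format("console")       # print each micro-batch result
                   .start())
    query.awaitTermination(10)              # let it run ~10 seconds
    query.stop()
    spark.stop()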

  • Cluster-level optimizations
  • Application optimizations
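
A sketch of two optimizations typical of this module: hinting a broadcast join so the small table is shipped to every executor instead of shuffling the large one, and right-sizing shuffle partitions. All names, sizes, and values are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("OptDemo").master("local[*]").getOrCreate()
    spark.conf.set("spark.sql.shuffle.partitions", "8")  # right-size for small data

    facts = spark.createDataFrame([(1, 100.0), (2, 50.0)], ["dim_id", "amount"])
    dims = spark.createDataFrame([(1, "books"), (2, "toys")], ["dim_id", "category"])

    joined = facts.join(broadcast(dims), "dim_id")  # broadcast hash join hint
    joined.explain()                                # inspect the physical plan
    joined.show()
    spark.stop()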