Let us help you find the training program you are looking for.

If you can't find what you are looking for, contact us and we'll help you find it. We have over 800 training programs to choose from.

  • Course Skill Level:

    Foundational

  • Course Duration:

    2 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    APASPAL21E09

Who should attend & recommended skills

  • This course is geared for attendees who want a practical guide to solving complex data processing challenges by applying the best optimization techniques in Apache Spark.
  • Skill level: foundation-level Apache Spark skills for intermediate-skilled team members; this is not a basic class.
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Traditional databases: Basic (1-2 years’ experience) helpful
  • Large-scale data analysis and NoSQL tools: No exposure required

About this course

Apache Spark is a flexible framework for processing both batch and real-time data, and its unified engine has made it popular for big data use cases. This course will help you get started with Apache Spark 2.0, one of the most popular big data processing frameworks, and write big data applications for a variety of use cases. Although the course is intended to get you started quickly, it also focuses on explaining the core concepts.

This practical guide provides a quick start to the Spark 2.0 architecture and its components, and teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and the DataFrame API, along with their corresponding transformations and actions. We then move on to the life cycle of a Spark application and the techniques used to debug slow-running applications. You will also work through Spark's built-in modules for SQL, streaming, machine learning, and graph analysis. Finally, the course lays out the best practices and optimization techniques that are key to writing efficient Spark applications. By the end of this course, you will have a sound fundamental understanding of the Apache Spark framework and will be able to write and optimize Spark applications.
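
To give a flavor of what attendees will write, below is a minimal sketch of a first Spark application using the Python API (PySpark). This is a sketch only: it assumes PySpark is installed (pip install pyspark), and the application name and sample data are illustrative rather than taken from the course materials.

    # Minimal PySpark application: create a SparkSession, build a small
    # DataFrame, apply a lazy transformation, and trigger an action.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("FirstApp").master("local[*]").getOrCreate()

    df = spark.createDataFrame([("alice", 34), ("bob", 28)], ["name", "age"])
    adults = df.filter(df.age > 30)   # transformation: evaluated lazily
    adults.show()                     # action: triggers actual execution

    spark.stop()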

Skills acquired & topics covered

Working in a hands-on learning environment, led by our Apache Spark expert instructor, participants will learn about and explore:

  • The core concepts and the latest developments in Apache Spark
  • Writing efficient big data applications with Spark's built-in modules for SQL, streaming, machine learning, and graph analysis
  • A variety of optimizations drawn from real-world experience
  • Core concepts such as RDDs, DataFrames, transformations, and more
  • Setting up a Spark development environment
  • Choosing the right APIs for your applications
  • Understanding Spark's architecture and the execution flow of a Spark application
  • Exploring built-in modules for SQL, streaming, ML, and graph analysis
  • Optimizing your Spark jobs for better performance

Course breakdown / modules

  • What is Spark?
  • Spark architecture overview
  • Spark language APIs
  • Spark components
  • Making the most of Hadoop and Spark
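
As a quick taste of this module, here is a minimal sketch (Python/PySpark, assumed installed) of Spark 2.0's unified entry point, the SparkSession, and the SparkContext it wraps; the application name is illustrative.

    # In local mode the driver and executors run on one machine, which is
    # handy for learning the architecture before moving to a cluster.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("ArchitectureDemo")
             .master("local[*]")      # use all local cores
             .getOrCreate())

    sc = spark.sparkContext           # lower-level entry point (RDD API)
    print(sc.version)                 # Spark version in use
    print(sc.defaultParallelism)      # parallelism the scheduler offers
    spark.stop()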

  • AWS elastic compute cloud (EC2)
  • Configuring Spark
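
Whether Spark runs on EC2 or on a local machine, it is configured the same way. Below is a sketch of programmatic configuration with SparkConf; the property keys are standard Spark settings, but the values shown are illustrative only.

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = (SparkConf()
            .setAppName("ConfigDemo")
            .setMaster("local[2]")                      # two local cores
            .set("spark.executor.memory", "1g")         # per-executor memory
            .set("spark.sql.shuffle.partitions", "8"))  # shuffle parallelism

    spark = SparkSession.builder.config(conf=conf).getOrCreate()
    print(spark.conf.get("spark.sql.shuffle.partitions"))
    spark.stop()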

  • What is an RDD?
  • Programming using RDDs
  • Transformations and actions
  • Types of RDDs
  • Caching and checkpointing
  • Understanding partitions
  • Drawbacks of using RDDs
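
A short sketch tying together the RDD topics above: creating an RDD, chaining lazy transformations, triggering actions, caching, and inspecting partitions (assumes PySpark; all values illustrative).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("RDDDemo").master("local[*]").getOrCreate()
    sc = spark.sparkContext

    rdd = sc.parallelize(range(1, 101), 4)        # RDD with 4 partitions
    evens = rdd.filter(lambda x: x % 2 == 0)      # transformation (lazy)
    squares = evens.map(lambda x: x * x)          # transformation (lazy)
    squares.cache()                               # keep in memory for reuse

    print(squares.count())                        # action: runs the lineage
    print(squares.take(5))                        # action: reuses cached data
    print(squares.getNumPartitions())             # inspect partitioning
    spark.stop()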

  • Spark DataFrame and Dataset
  • DataFrames
  • Datasets
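
A minimal DataFrame sketch for this module. Note that the typed Dataset API exists only in Scala and Java; in Python you work with DataFrames. Column names and values are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("DataFrameDemo").master("local[*]").getOrCreate()

    df = spark.createDataFrame(
        [("books", 12.0), ("books", 5.0), ("toys", 20.0)],
        ["category", "price"])

    summary = (df.groupBy("category")
                 .agg(F.avg("price").alias("avg_price"),
                      F.count("*").alias("n")))
    summary.show()
    spark.stop()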

  • A sample application
  • Application execution modes
  • Application monitoring
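
A sketch of a self-contained sample application for this module. It can run directly with python in local mode or be submitted to a cluster with spark-submit (in client or cluster deploy mode); while it runs, the Spark UI (by default at http://localhost:4040) shows the jobs, stages, and tasks used for monitoring. Names and sizes are illustrative.

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("SampleApp").getOrCreate()
        data = spark.range(1000000)                    # one million rows
        total = data.selectExpr("sum(id)").first()[0]  # a simple aggregation job
        print("sum =", total)
        spark.stop()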

  • Spark SQL
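
A minimal Spark SQL sketch: register a DataFrame as a temporary view and query it with SQL. The view name, column names, and data are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SQLDemo").master("local[*]").getOrCreate()

    df = spark.createDataFrame([("alice", 34), ("bob", 28)], ["name", "age"])
    df.createOrReplaceTempView("people")             # expose df to SQL

    spark.sql("SELECT name FROM people WHERE age > 30").show()
    spark.stop()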

  • Spark Streaming
  • Machine learning
  • Graph processing
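
As a taste of the streaming side of this module, here is a minimal Structured Streaming sketch using the built-in rate source, which generates rows locally so no external input is required; the options and run duration are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("StreamDemo").master("local[*]").getOrCreate()

    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    counts = stream.groupBy().count()       # running count over the stream

    query = (counts.writeStream
                   .outputMode("complete")  # emit the full updated result
                   .format("console")       # print each micro-batch result
                   .start())
    query.awaitTermination(10)              # let it run ~10 seconds
    query.stop()
    spark.stop()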

  • Cluster-level optimizations
  • Application optimizations
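
A sketch of two optimizations typical of this module: hinting a broadcast join so the small table is shipped to every executor instead of shuffling the large one, and right-sizing shuffle partitions. All names, sizes, and values are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("OptDemo").master("local[*]").getOrCreate()
    spark.conf.set("spark.sql.shuffle.partitions", "8")  # right-size for small data

    facts = spark.createDataFrame([(1, 100.0), (2, 50.0)], ["dim_id", "amount"])
    dims = spark.createDataFrame([(1, "books"), (2, "toys")], ["dim_id", "category"])

    joined = facts.join(broadcast(dims), "dim_id")  # broadcast hash join hint
    joined.explain()                                # inspect the physical plan
    joined.show()
    spark.stop()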