
Course Skill Level:


Course Duration:

2 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:


Who should attend & recommended skills:

Developers, analysts, and others with basic development, Python, and spreadsheet experience

  • This course is geared toward developers, analysts, and others with Python experience who wish to master Scala’s advanced techniques for solving real-world data analysis problems and gaining valuable insights from their data.
  • Skill level: Foundation-level Data Analysis with Scala skills for team members with intermediate experience. This is not a basic class.
  • Developers: Basic to Intermediate (1-5 years’ experience)
  • Python: Basic (1-2 years’ experience)
  • Spreadsheet software: Basic to Intermediate (1-5 years’ experience)

About this course

Efficient business decisions grounded in an accurate understanding of business data help deliver better performance across products and services. This course helps you leverage popular Scala libraries and tools to perform core data analysis tasks with ease. The course begins with a quick overview of the building blocks of a standard data analysis process. You will learn to perform basic tasks such as extraction, staging, validation, cleaning, and shaping of datasets. You will then dive deep into the data exploration and visualization stages of the data analysis life cycle, using popular Scala libraries such as Saddle, Breeze, Vegas, and PredictionIO to process your datasets.

You will learn statistical methods for deriving meaningful insights from data. You will also learn to create Apache Spark 2.x applications that perform complex data analysis in real time. You will discover traditional machine learning techniques for data analysis, and you will be introduced to neural networks and deep learning from a data analysis standpoint. By the end of this course, you will be capable of handling large sets of structured and unstructured data, performing exploratory analysis, and building efficient Scala applications for discovering and delivering insights.

Skills acquired & topics covered

Working in a hands-on learning environment led by our expert Data Analysis with Scala instructor, students will learn about and explore:

  • A beginner’s guide to performing data analysis, loaded with rich, practical examples
  • Popular Scala libraries such as Breeze and Saddle for efficient data manipulation and exploratory analysis
  • Developing applications in Scala for real-time analysis and machine learning in Apache Spark
  • Techniques to determine the validity and confidence level of data
  • Applying quartiles and n-tiles to datasets to see how data is distributed into many buckets
  • Creating data pipelines that combine multiple data lifecycle steps
  • Using built-in features to gain a deeper understanding of the data
  • Applying the lasso regression analysis method to your data
  • Comparing the Apache Spark API with traditional Apache Spark data analysis
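
To give a flavor of the quartile and n-tile topic listed above, here is a minimal sketch in plain Scala using only the standard library. It is not taken from the course materials: the `NTiles` object and its method names are hypothetical, and the linear-interpolation rule used for quantiles is just one of several common conventions.

```scala
object NTiles {
  /** Sorts the values and splits them into n buckets of near-equal size. */
  def nTiles(values: Seq[Double], n: Int): Seq[Seq[Double]] = {
    require(n > 0, "n must be positive")
    val sorted = values.sorted
    // The element at sorted position i goes to bucket floor(i * n / size).
    sorted.zipWithIndex
      .groupBy { case (_, i) => (i.toLong * n / sorted.size).toInt }
      .toSeq
      .sortBy(_._1)
      .map { case (_, pairs) => pairs.map(_._1) }
  }

  /** Quantile for p in [0, 1], interpolating linearly between neighbors. */
  def quantile(values: Seq[Double], p: Double): Double = {
    require(values.nonEmpty && p >= 0.0 && p <= 1.0, "need data and 0 <= p <= 1")
    val sorted = values.sorted.toIndexedSeq
    val pos = p * (sorted.size - 1)
    val lo = math.floor(pos).toInt
    val hi = math.ceil(pos).toInt
    sorted(lo) + (pos - lo) * (sorted(hi) - sorted(lo))
  }

  /** First quartile, median, and third quartile. */
  def quartiles(values: Seq[Double]): (Double, Double, Double) =
    (quantile(values, 0.25), quantile(values, 0.5), quantile(values, 0.75))
}
```

For example, `NTiles.nTiles(data, 4)` groups a dataset into four buckets, while `NTiles.quartiles(data)` returns the three cut points between them. Libraries such as Breeze offer their own statistics routines; this sketch deliberately avoids them so that it does not assume any particular library API.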

Course breakdown / modules

  • Getting started with Scala
  • Overview of object-oriented and functional programming
  • Scala case classes and the collection API
  • Overview of Scala libraries for data analysis
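
As an illustration of the case class and collection API topics in this module, the following is a small sketch under stated assumptions: the `Sale` record and the `SalesSummary` helpers are hypothetical names invented for this example, not part of the course materials.

```scala
object SalesSummary {
  // Hypothetical record type, invented for illustration only.
  final case class Sale(region: String, product: String, amount: Double)

  /** Total sales amount per region, using groupBy and map. */
  def totalByRegion(sales: Seq[Sale]): Map[String, Double] =
    sales.groupBy(_.region).map { case (region, rows) =>
      region -> rows.map(_.amount).sum
    }

  /** The n largest sales, largest first. */
  def topSales(sales: Seq[Sale], n: Int): Seq[Sale] =
    sales.sortBy(s => -s.amount).take(n)
}
```

Case classes give each record structural equality and pattern matching for free, which is why they pair naturally with collection operations such as `groupBy`, `map`, and `sortBy`.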

  • Data journey
  • Sourcing data
  • Understanding data
  • Using ML to learn from data
  • Creating a data pipeline

  • Data extraction
  • Data staging
  • Cleaning and normalizing
  • Enriching
  • Organizing and storing

  • Sampling data
  • Performing ad hoc analysis
  • Finding a relationship between data elements
  • Visualizing data

  • Basics of statistics
  • Vector level statistics
  • Random data generation
  • Hypothesis testing

  • Spark setup and overview
  • Spark Datasets and DataFrames
  • Sourcing data using Spark
  • Using Spark to explore data

  • ML overview
  • Decision trees
  • Random forest
  • Ridge and lasso regression
  • k-means cluster analysis
  • Natural language processing for data analysis
  • Algorithm selections

  • Overview of streaming
  • Spark Streaming overview
  • Streaming a k-means clustering algorithm using Spark
  • Streaming linear regression using Spark

  • Working with Data at Scale
  • Working with data at scale
  • Cost considerations
  • Reliability considerations