Machine Learning with Spark 2.x

  • Course Code: Big Data - Machine Learning with Spark 2.x
  • Course Dates: Contact us to schedule.
  • Course Category: AI / Machine Learning
  • Duration: 2 Days
  • Audience: This course is geared for those who want to unlock the complexities of machine learning algorithms in Spark to generate useful data insights through this data analysis tutorial.

Course Snapshot 

  • Duration: 2 days 
  • Skill-level: Foundation-level Machine Learning with Spark skills for intermediate-skilled team members. This is not a basic class.
  • Targeted Audience: This course is geared for those who want to unlock the complexities of machine learning algorithms in Spark to generate useful data insights through this data analysis tutorial.
  • Hands-on Learning: This course is approximately 50% hands-on lab to 50% lecture ratio, combining engaging lecture, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required. 
  • Delivery Format: This course is available for onsite private classroom presentation. 
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

The purpose of machine learning is to build systems that learn from data. Being able to understand trends and patterns in complex data is critical to success; it is one of the key strategies for unlocking growth in today's challenging marketplace. With the meteoric rise of machine learning, developers are now keen to find out how they can make their Spark applications smarter.

This course shows you how to transform data into actionable knowledge. It begins by defining machine learning primitives using the MLlib and H2O libraries. You will learn how to use binary classification to detect the Higgs boson particle in the huge amount of data produced by the CERN particle collider, and how to classify daily health activities using ensemble methods for multi-class classification. Next, you will solve a typical regression problem involving flight delay predictions and write sophisticated Spark pipelines. You will analyze Twitter data with the help of the doc2vec algorithm and K-means clustering. Finally, you will build different pattern mining models using MLlib, perform complex manipulation of DataFrames using Spark and Spark SQL, and deploy your app in a Spark streaming environment.

Working in a hands-on learning environment, led by our Machine Learning with Spark expert instructor, students will learn about and explore: 

  • Process and analyze big data in a distributed and scalable way 
  • Write sophisticated Spark pipelines that incorporate elaborate extraction 
  • Build and use regression models to predict flight delays 

Topics Covered: This is a high-level list of the topics covered in this course. Please see the detailed agenda below.

  • Use Spark streams to cluster tweets online 
  • Run the PageRank algorithm to compute user influence 
  • Perform complex manipulation of DataFrames using Spark 
  • Define Spark pipelines to compose individual data transformations 
  • Utilize generated models for off-line/on-line prediction 
  • Transfer the learning from an ensemble to a simpler Neural Network 
  • Understand basic graph properties and important graph operations 
  • Use GraphFrames, an extension of DataFrames to graphs, to study graphs using an elegant query language 
  • Use the K-means algorithm to cluster a movie reviews dataset
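
To give a feel for the "compute user influence" topic above, here is a minimal sketch of the PageRank idea in plain Python. It is not the GraphX or GraphFrames API used in class and not taken from the course labs; the `pagerank` function, the `follows` edge list, and the user names are hypothetical, chosen only for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (follower, followed) edges."""
    nodes = sorted({name for edge in edges for name in edge})
    out_links = {name: [] for name in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {name: 1.0 / n for name in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over everyone.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every user passes an equal share of their rank to each user they follow.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical "who follows whom" graph: three users follow cat, cat follows ann.
follows = [("ann", "cat"), ("bob", "cat"), ("cat", "ann"), ("dan", "cat")]
ranks = pagerank(follows)
most_influential = max(ranks, key=ranks.get)
```

In this toy graph the user followed by the most accounts ends up with the highest score, and the scores sum to 1. A distributed engine applies the same iteration to graphs that do not fit on one machine.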

Audience & Pre-Requisites 

This course is designed for developers who want to unlock the complexities of machine learning algorithms in Spark to generate useful data insights through this data analysis tutorial.

Pre-Requisites: Students should be familiar with:

  • Basics of Python  
  • Knowledge of Python is assumed. 

Course Agenda / Topics 

  1. Introduction to Large-Scale Machine Learning and Spark
  • Data science
  • The sexiest role of the 21st century – data scientist?
  • Introducing H2O.ai
  • What’s the difference between H2O and Spark’s MLlib?
  • Data munging
  • Data science – an iterative process
  2. Detecting Dark Matter – The Higgs-Boson Particle
  • Type I versus type II error
  • Spark start and data load
  3. Ensemble Methods for Multi-Class Classification
  • Data
  • Modeling goal
  4. Predicting Movie Reviews Using NLP and Spark Streaming
  • NLP – a brief primer
  • The dataset
  • Feature extraction
  • Featurization – feature hashing
  • Let’s do some (model) training!
  • Super learner
  5. Word2vec for Prediction and Clustering
  • Motivation of word vectors
  • Word2vec explained
  • Doc2vec explained
  • Applying word2vec and exploring our data with vectors
  • Creating document vectors
  • Supervised learning task
  6. Extracting Patterns from Clickstream Data
  • Frequent pattern mining
  • Pattern mining with Spark MLlib
  • Deploying a pattern mining application
  7. Graph Analytics with GraphX
  • Basic graph theory
  • GraphX distributed graph processing engine
  • Graph algorithms and applications
  8. Lending Club Loan Prediction
  • Motivation
  • Preparation of the environment
  • Data load
  • Exploration – data analysis
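
As a small preview of the feature hashing topic in the agenda above, the sketch below shows the core idea in plain Python: each token is hashed into one of a fixed number of buckets, so any text becomes a vector of the same length. This is an illustration under stated assumptions, not the course's lab code. The `hash_features` function, the bucket count, and the sample review are hypothetical, and `hashlib.md5` is only a stand-in for a stable hash, so the bucket assignments will not match Spark's `HashingTF`.

```python
import hashlib


def hash_features(tokens, num_buckets=16):
    """Map a list of tokens to a fixed-length vector of bucket counts."""
    vector = [0] * num_buckets
    for token in tokens:
        # md5 serves only as a stable hash here; Python's built-in hash()
        # is randomized per process for strings.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % num_buckets
        vector[bucket] += 1
    return vector


# Hypothetical movie review, split on whitespace for simplicity.
review = "a great movie with a great cast".split()
vec = hash_features(review)
```

The vector length depends only on `num_buckets`, not on the vocabulary, which is why no dictionary has to be built or shipped to workers. The trade-off is that unrelated tokens can collide in the same bucket, and a larger bucket count makes such collisions less likely.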