Course Skill Level:

Foundational to Intermediate

Course Duration:

3 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:


Who should attend & recommended skills:

Those with basic IT and code writing experience

  • Those who want to collect practical techniques for enhancing applications, apply machine learning algorithms to graph data, and learn how to configure GraphX and use it interactively.
  • Skill level: Foundation-level Spark GraphX skills for intermediate-skilled team members. This is not a basic class.
  • IT skills: Basic to Intermediate (1-5 years’ experience)
  • Writing code: Basic (1-2 years’ experience)
  • Prior Apache Spark and Scala experience is not required.

About this course

This course begins with the big picture of what graphs can be used for. It is an example-based tutorial that teaches you how to use GraphX interactively. You’ll start with a clear introduction to building big data graphs from regular data, and then explore the problems and possibilities of implementing graph algorithms and architecting graph processing pipelines. Along the way, you’ll collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.

Skills acquired & topics covered

  • GraphX, a powerful graph processing API for the Apache Spark analytics engine that lets you draw insights from large datasets.
  • Running massively parallel algorithms and machine learning algorithms at Spark's speed and scale.
  • Understanding graph technology
  • Using the GraphX API
  • Developing algorithms for big graphs
  • Machine learning with graphs
  • Graph visualization

Course breakdown / modules

  • Spark: the step beyond Hadoop MapReduce
  • Graphs: finding meaning from relationships
  • Putting them together for lightning-fast graph processing: Spark GraphX

  • Getting set up and getting data
  • Interactive GraphX querying using the Spark Shell
  • PageRank example

  • Scala, the native language of Spark
  • Spark
  • Graph terminology

  • Vertex and edge classes
  • Mapping operations
  • Serialization/deserialization
  • Graph generation
  • Pregel API

  • Seek out authoritative nodes: PageRank
  • Measuring connectedness: Triangle Count
  • Find the fewest hops: ShortestPaths
  • Finding isolated populations: Connected Components
  • Community detection: LabelPropagation
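
As a rough illustration of what "seeking out authoritative nodes" means, here is a minimal plain-Python sketch of the PageRank iteration. It does not use GraphX or Spark. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the tiny edge list are all assumptions made for this example only.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank the vertices of a directed graph given as (src, dst) pairs.

    Illustrative sketch only: hypothetical helper, not the GraphX API.
    """
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without out-links is spread over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in vertices
        }
        # Each vertex shares its current rank equally among its out-links.
        for src in vertices:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy graph: three vertices point at "c", and "c" points at "a".
edges = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
most_authoritative = max(ranks, key=ranks.get)
```

In this toy graph the vertex with the most incoming links, `c`, ends up with the highest rank, and the ranks sum to 1.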

  • Your own GPS: Shortest Paths with Weights
  • Travelling Salesman: greedy algorithm
  • Route utilities: Minimum Spanning Trees
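
To give a feel for the "shortest paths with weights" topic, the sketch below uses Dijkstra's algorithm in plain Python on a small road network. This is one common way to solve the problem and is not necessarily the approach taught in the course. The function name `shortest_distances` and the sample `roads` data are hypothetical, and edge weights are assumed to be non-negative.

```python
import heapq


def shortest_distances(weighted_edges, source):
    """Return the cheapest total weight from source to each reachable vertex.

    Illustrative sketch only: edges are directed (src, dst, weight) triples
    with non-negative weights.
    """
    adjacency = {}
    for src, dst, weight in weighted_edges:
        adjacency.setdefault(src, []).append((dst, weight))
        adjacency.setdefault(dst, [])

    distances = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, vertex = heapq.heappop(queue)
        # Skip stale queue entries that were superseded by a shorter path.
        if dist > distances.get(vertex, float("inf")):
            continue
        for neighbor, weight in adjacency.get(vertex, []):
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


# Hypothetical road network: going A -> C -> B is cheaper than A -> B directly.
roads = [("A", "B", 7), ("A", "C", 3), ("C", "B", 2), ("B", "D", 4), ("C", "D", 8)]
distances = shortest_distances(roads, "A")
```

Here the direct road from `A` to `B` costs 7, but the route through `C` costs 5, so the weighted answer differs from simply counting hops.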

  • Supervised, unsupervised, and semi-supervised learning
  • Recommend a movie: SVDPlusPlus
  • Using GraphX with MLlib
  • Poor man’s training data: graph-based semi-supervised learning

  • Missing basic graph operations
  • Reading RDF graph files
  • Poor man’s graph isomorphism: finding missing Wikipedia infobox items
  • Global clustering coefficient: compare connectedness

  • Monitoring your Spark application
  • Configuring Spark
  • Spark performance tuning
  • Graph partitioning

  • Using languages other than Scala with GraphX
  • Another visualization tool: Apache Zeppelin plus d3.js
  • Almost a database: Spark Job Server
  • Using SQL with Spark graphs with GraphFrames