- Duration: 3 days
- Skill level: Foundation-level Spark GraphX skills for intermediate-skilled team members. This is not a basic class.
- Targeted Audience: This course is geared for those who want to know how to configure GraphX and how to use it interactively.
- Hands-on Learning: This course is approximately 50% hands-on lab and 50% lecture, combining engaging lectures, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required.
- Delivery Format: This course is available for onsite private classroom presentation.
- Customizable: This course may be tailored to your specific training objectives, tools of choice and learning goals.
The course begins with the big picture of what graphs can be used for. Through worked examples, you’ll learn how to use GraphX interactively. You’ll start with a crystal-clear introduction to building big data graphs from regular data, and then explore the problems and possibilities of implementing graph algorithms and architecting graph-processing pipelines. Along the way, you’ll collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.
Working in a hands-on learning environment, led by our GraphX expert instructor, students will learn about and explore:
- How GraphX, the graph processing API for the Apache Spark analytics engine, lets you draw insights from large datasets
- How GraphX provides the speed and capacity to run massively parallel graph and machine learning algorithms
Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below.
- Understanding graph technology
- Using the GraphX API
- Developing algorithms for big graphs
- Machine learning with graphs
- Graph visualization
Audience & Pre-Requisites
This course is geared for attendees who want to collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.
Pre-Requisites: Students should have:
- Basic to intermediate IT skills
- Comfort writing code
Prior experience with Apache Spark and Scala is not required.
Course Agenda / Topics
- Two important technologies: Spark and graphs
  - Spark: the step beyond Hadoop MapReduce
  - Graphs: finding meaning from relationships
  - Putting them together for lightning-fast graph processing: Spark GraphX
- GraphX quick start
  - Getting set up and getting data
  - Interactive GraphX querying using the Spark Shell
  - PageRank example
- Some fundamentals
  - Scala, the native language of Spark
  - Graph terminology
- GraphX basics
  - Vertex and edge classes
  - Mapping operations
  - Graph generation
  - Pregel API
- Built-in algorithms
  - Seek out authoritative nodes: PageRank
  - Measuring connectedness: Triangle Count
  - Find the fewest hops: ShortestPaths
  - Finding isolated populations: Connected Components
  - Community detection: LabelPropagation
- Other useful graph algorithms
  - Your own GPS: Shortest Paths with Weights
  - Travelling Salesman: greedy algorithm
  - Route utilities: Minimum Spanning Trees
- Machine learning
  - Supervised, unsupervised, and semi-supervised learning
  - Recommend a movie: SVDPlusPlus
  - Using GraphX with MLlib
  - Poor man’s training data: graph-based semi-supervised learning
- The missing algorithms
  - Missing basic graph operations
  - Reading RDF graph files
  - Poor man’s graph isomorphism: finding missing Wikipedia infobox items
  - Global clustering coefficient: compare connectedness
- Performance and monitoring
  - Monitoring your Spark application
  - Configuring Spark
  - Spark performance tuning
  - Graph partitioning
- Other languages and tools
  - Using languages other than Scala with GraphX
  - Another visualization tool: Apache Zeppelin plus d3.js
  - Almost a database: Spark Job Server
  - Querying Spark graphs with SQL using GraphFrames
Student Materials: Each student will receive a Student Guide with course notes, code samples, software tutorials, diagrams and related reference materials and links (as applicable). Our courses also include step-by-step hands-on lab instructions and solutions, clearly illustrated for users to complete hands-on work in class and to revisit to review or refresh skills at any time. Students will also receive the project files (or code, if applicable) and solutions required for the hands-on work.