  • Course Duration: 4 days
  • Course Delivery Format: Live, instructor-led
  • Course Category: AI / Machine Learning

Who should attend & recommended skills

  • This course is geared for Java developers who want practical use cases and illustrations of how Mahout can be applied to solve them.
  • Skill level: Foundation-level Mahout skills for intermediate-skilled team members. This is not a basic class.
  • Programming: Basic experience (1-3 years) required
  • Mahout: No experience required

About this course

This course covers machine learning with Apache Mahout. Drawing on experience with real-world applications, it introduces practical use cases and shows how Mahout can be applied to solve them. It places particular focus on scalability and on applying these techniques to large data sets using the Apache Hadoop framework.

Skills acquired & topics covered

Working in a hands-on learning environment, led by our Mahout expert instructor, students will learn about and explore:

  • An introduction to machine learning with Apache Mahout
  • Real-world examples
  • Practical use cases and how Mahout can be applied to solve them
  • Using group data to make individual recommendations
  • Finding logical clusters within your data
  • Filtering and refining data with on-the-fly classification

Course breakdown / modules

  • Defining recommendation
  • Running a first recommender engine
  • Evaluating a recommender
  • Evaluating precision and recall
  • Evaluating the GroupLens data set

  • Representing preference data
  • In-memory DataModels
  • Coping without preference values

  • Understanding user-based recommendation
  • Exploring the user-based recommender
  • Exploring similarity metrics
  • Item-based recommendation
  • Slope-one recommender
  • New and experimental recommenders
  • Comparison to other recommenders
  • Comparison to model-based recommenders

  • Analyzing example data from a dating site
  • Finding an effective recommender
  • Injecting domain-specific information
  • Recommending to anonymous users
  • Creating a web-enabled recommender
  • Updating and monitoring the recommender

  • Analyzing the Wikipedia data set
  • Designing a distributed item-based algorithm
  • Implementing a distributed algorithm with MapReduce
  • Running MapReduce jobs with Hadoop
  • Pseudo-distributing a recommender
  • Looking beyond first steps with recommendations

  • Clustering basics
  • Measuring the similarity of items
  • Hello World: running a simple clustering example
  • Exploring distance measures
  • Hello World again! Trying out various distance measures

  • Visualizing vectors
  • Representing text documents as vectors
  • Generating vectors from documents
  • Improving quality of vectors using normalization

  • K-means clustering
  • Beyond k-means: an overview of clustering techniques
  • Fuzzy k-means clustering
  • Model-based clustering
  • Topic modeling using latent Dirichlet allocation (LDA)

  • Inspecting clustering output
  • Analyzing clustering output
  • Improving clustering quality

  • Quick-start tutorial for running clustering on Hadoop
  • Tuning clustering performance
  • Batch and online clustering

  • Finding similar users on Twitter
  • Suggesting tags for artists on Last.fm
  • Analyzing the Stack Overflow data set

  • Why use Mahout for classification?
  • The fundamentals of classification systems
  • How classification works
  • Workflow in a typical classification project
  • Step-by-step simple classification example

  • Extracting features to build a Mahout classifier
  • Preprocessing raw data into classifiable data
  • Converting classifiable data into vectors
  • Classifying the 20 Newsgroups data set with SGD
  • Choosing an algorithm to train the classifier
  • Classifying the 20 Newsgroups data set with naive Bayes

  • Classifier evaluation in Mahout
  • The classifier evaluation API
  • When classifiers go bad
  • Tuning for better performance

  • Process for deployment in huge systems
  • Determining scale and speed requirements
  • Building a training pipeline for large systems
  • Integrating a Mahout classifier
  • Example: a Thrift-based classification server

  • Why Shop It To Me chose Mahout
  • General structure of the email marketing system
  • Training the model
  • Speeding up classification