

  • Course Code: Artificial Intelligence - Mahout
  • Course Dates: Contact us to schedule.
  • Course Category: AI / Machine Learning
  • Duration: 4 Days
  • Audience: This course is geared toward those who want practical use cases that illustrate how Mahout can be applied to solve them.

Course Snapshot 

  • Duration: 4 days 
  • Skill level: Foundation-level Mahout skills for intermediate-skilled team members. This is not a basic class.
  • Targeted Audience: This course is geared toward those who want practical use cases that illustrate how Mahout can be applied to solve them.
  • Hands-on Learning: This course is approximately 50% hands-on lab and 50% lecture, combining engaging lectures, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required.
  • Delivery Format: This course is available for onsite private classroom presentation, remote instructor-led delivery, or CBT/WBT (by request).
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

This course covers machine learning using Apache Mahout. Based on experience with real-world applications, it introduces practical use cases and illustrates how Mahout can be applied to solve them. It places particular focus on issues of scalability and how to apply these techniques against large data sets using the Apache Hadoop framework. 

Working in a hands-on learning environment led by our expert Mahout instructor, students will learn about and explore:

  • An introduction to machine learning with Apache Mahout
  • Real-world examples
  • Practical use cases and how Mahout can be applied to solve them

Topics Covered: This is a high-level list of the topics covered in this course. Please see the detailed agenda below.

  • Use group data to make individual recommendations 
  • Find logical clusters within your data 
  • Filter and refine with on-the-fly classification 

Audience & Pre-Requisites 

This course is written for developers familiar with Java.  

Pre-Requisites:

  • Students should have basic programming knowledge.
  • Students should have good foundational mathematics or logic skills.
  • No prior experience with Mahout is assumed.

Course Agenda / Topics 

  1. Meet Apache Mahout
  1. Introducing recommenders 
  • Defining recommendation 
  • Running a first recommender engine 
  • Evaluating a recommender 
  • Evaluating precision and recall 
  • Evaluating the GroupLens data set 
  1. Representing recommender data 
  • Representing preference data 
  • In-memory DataModels 
  • Coping without preference values 
  1. Making recommendations 
  • Understanding user-based recommendation 
  • Exploring the user-based recommender 
  • Exploring similarity metrics 
  • Item-based recommendation 
  • Slope-one recommender 
  • New and experimental recommenders 
  • Comparison to other recommenders 
  • Comparison to model-based recommenders 
  1. Taking recommenders to production 
  • Analyzing example data from a dating site 
  • Finding an effective recommender 
  • Injecting domain-specific information 
  • Recommending to anonymous users 
  • Creating a web-enabled recommender
  • Updating and monitoring the recommender 
  1. Distributing recommendation computations 
  • Analyzing the Wikipedia data set 
  • Designing a distributed item-based algorithm 
  • Implementing a distributed algorithm with MapReduce 
  • Running MapReduces with Hadoop 
  • Pseudo-distributing a recommender 
  • Looking beyond first steps with recommendations 
  1. Introduction to clustering 
  • Clustering basics 
  • Measuring the similarity of items 
  • Hello World: running a simple clustering example 
  • Exploring distance measures 
  • Hello World again! Trying out various distance measures 
  1. Representing data
  • Visualizing vectors 
  • Representing text documents as vectors 
  • Generating vectors from documents 
  • Improving quality of vectors using normalization 
  1. Clustering algorithms in Mahout 
  • K-means clustering 
  • Beyond k-means: an overview of clustering techniques 
  • Fuzzy k-means clustering 
  • Model-based clustering 
  • Topic modeling using latent Dirichlet allocation (LDA) 
  1. Evaluating and improving clustering quality 
  • Inspecting clustering output 
  • Analyzing clustering output 
  • Improving clustering quality 
  1. Taking clustering to production 
  • Quick-start tutorial for running clustering on Hadoop 
  • Tuning clustering performance 
  • Batch and online clustering 
  1. Real-world applications of clustering 
  • Finding similar users on Twitter 
  • Suggesting tags for artists on Last.fm 
  • Analyzing the Stack Overflow data set 
  1. Introduction to classification 
  • Why use Mahout for classification? 
  • The fundamentals of classification systems 
  • How classification works 
  • Workflow in a typical classification project
  • Step-by-step simple classification example 
  1. Training a classifier 
  • Extracting features to build a Mahout classifier 
  • Preprocessing raw data into classifiable data 
  • Converting classifiable data into vectors 
  • Classifying the 20 newsgroups data set with SGD 
  • Choosing an algorithm to train the classifier 
  • Classifying the 20 newsgroups data with naive Bayes 
  1. Evaluating and tuning a classifier 
  • Classifier evaluation in Mahout 
  • The classifier evaluation API 
  • When classifiers go bad 
  • Tuning for better performance 
  1. Deploying a classifier 
  • Process for deployment in huge systems 
  • Determining scale and speed requirements 
  • Building a training pipeline for large systems 
  • Integrating a Mahout classifier 
  • Example: a Thrift-based classification server 
  1. Case study: Shop It To Me 
  • Why Shop It To Me chose Mahout 
  • General structure of the email marketing system 
  • Training the model 
  • Speeding up classification 