
Machine Learning with R Cookbook

  • Course Code: Data Science - Machine Learning with R Cookbook
  • Course Dates: Contact us to schedule.
  • Course Category: Big Data & Data Science
  • Duration: 4 Days
  • Audience: This course is geared for those who want to explore over 110 recipes to analyze data and build predictive models with simple and easy-to-use R code.

Course Snapshot 

  • Duration: 4 days 
  • Skill-level: Foundation-level machine learning with R skills for intermediate-level team members. This is not a basic class.
  • Targeted Audience: This course is geared for those who want to explore over 110 recipes to analyze data and build predictive models with simple and easy-to-use R code.
  • Hands-on Learning: This course is approximately 50% hands-on lab and 50% lecture, combining engaging lecture, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required.
  • Delivery Format: This course is available for onsite private classroom presentation. 
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Extracting useful information from data is another task, and a much more challenging one.

Machine Learning with R Cookbook, Second Edition uses a practical approach to teach you how to perform machine learning with R. Each chapter is divided into several simple recipes. Through the step-by-step instructions provided in each recipe, you will be able to construct a predictive model by using a variety of machine learning packages.

In this book, you will first learn to set up the R environment and use simple R commands to explore data. The next topics cover how to perform statistical analysis and machine learning analysis and how to assess the resulting models, all covered in detail later in the book. You’ll also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide all the information required to start applying machine learning to individual projects. With Machine Learning with R Cookbook, machine learning has never been easier.

Working in a hands-on learning environment, led by our ML expert instructor, students will learn to:

  • Apply R to simplify predictive modeling with short and simple code 
  • Use machine learning to solve problems ranging from small to big data 
  • Build training and testing datasets and apply different classification methods

Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below.

  • Create and inspect transaction datasets and perform association analysis with the Apriori algorithm 
  • Visualize patterns and associations using a range of graphs and find frequent item-sets using the Eclat algorithm 
  • Compare differences between each regression method to discover how they solve problems 
  • Detect and impute missing values in air quality data 
  • Predict which users are likely to churn using classification
  • Plot the autocorrelation function with time series analysis 
  • Use the Cox proportional hazards model for survival analysis 
  • Apply clustering methods to segment customer data
  • Compress images with dimension reduction
  • Integrate R with Hadoop to solve machine learning problems on big data

Audience & Pre-Requisites 

This course is designed for developers interested in exploring over 110 recipes to analyze data and build predictive models with simple and easy-to-use R code.

Pre-Requisites: Students should be familiar with:

  • Basics of Python  
  • Knowledge of Python is assumed. 

Course Agenda / Topics 

  1. Practical Machine Learning with R
  • Introduction 
  • Downloading and installing R 
  • Downloading and installing RStudio 
  • Installing and loading packages 
  • Understanding basic data structures
  • Basic commands for subsetting 
  • Reading and writing data 
  • Manipulating data 
  • Applying basic statistics 
  • Visualizing data 
  • Getting a dataset for machine learning 
  2. Data Exploration with Air Quality Datasets
  • Introduction 
  • Using the air quality dataset
  • Converting attributes to factor 
  • Detecting missing values 
  • Imputing missing values 
  • Exploring and visualizing data 
  • Predicting values from datasets 
  3. Analyzing Time Series Data
  • Introduction 
  • Looking at time series data 
  • Plotting and forecasting time series data 
  • Extracting, subsetting, merging, filling, and padding
  • Successive differences and moving averages 
  • Exponential smoothing 
  • Plotting the autocorrelation function 
  4. R and Statistics
  • Introduction 
  • Understanding data sampling in R 
  • Operating a probability distribution in R 
  • Working with univariate descriptive statistics in R 
  • Performing correlations and multivariate analysis 
  • Conducting an exact binomial test 
  • Performing a Student’s t-test
  • Performing the Kolmogorov-Smirnov test 
  • Understanding the Wilcoxon Rank Sum and Signed Rank test 
  • Working with Pearson’s Chi-squared test 
  • Conducting a one-way ANOVA 
  • Performing a two-way ANOVA 
  5. Understanding Regression Analysis
  • Introduction 
  • Different types of regression 
  • Fitting a linear regression model with lm 
  • Summarizing linear model fits 
  • Using linear regression to predict unknown values 
  • Generating a diagnostic plot of a fitted model 
  • Fitting multiple regression 
  • Summarizing multiple regression 
  • Using multiple regression to predict unknown values 
  • Fitting a polynomial regression model with lm 
  • Fitting a robust linear regression model with rlm 
  • Studying a case of linear regression on SLID data 
  • Applying the Gaussian model for generalized linear regression 
  • Applying the Poisson model for generalized linear regression 
  • Applying the Binomial model for generalized linear regression 
  • Fitting a generalized additive model to data 
  • Visualizing a generalized additive model 
  • Diagnosing a generalized additive model 
  6. Survival Analysis
  • Introduction 
  • Loading and observing data 
  • Viewing the summary of survival analysis 
  • Visualizing the Survival Curve 
  • Using the log-rank test 
  • Using the Cox proportional hazards model
  • Nelson-Aalen Estimator of cumulative hazard 
  7. Classification 1 – Tree, Lazy, and Probabilistic
  • Introduction 
  • Preparing the training and testing datasets 
  • Building a classification model with recursive partitioning trees 
  • Visualizing a recursive partitioning tree 
  • Measuring the prediction performance of a recursive partitioning tree 
  • Pruning a recursive partitioning tree 
  • Handling missing data and split and surrogate variables 
  • Building a classification model with a conditional inference tree 
  • Control parameters in conditional inference trees 
  • Visualizing a conditional inference tree 
  • Measuring the prediction performance of a conditional inference tree 
  • Classifying data with the k-nearest neighbor classifier 
  • Classifying data with logistic regression 
  • Classifying data with the Naïve Bayes classifier 
  8. Classification 2 – Neural Network and SVM
  • Introduction 
  • Classifying data with a support vector machine 
  • Choosing the cost of a support vector machine 
  • Visualizing an SVM fit 
  • Predicting labels based on a model trained by a support vector machine 
  • Tuning a support vector machine 
  • The basics of neural networks
  • Training a neural network with neuralnet 
  • Visualizing a neural network trained by neuralnet 
  • Predicting labels based on a model trained by neuralnet 
  • Training a neural network with nnet 
  • Predicting labels based on a model trained by nnet 
  9. Model Evaluation
  • Introduction 
  • Estimating model performance with k-fold cross-validation 
  • Estimating model performance with leave-one-out cross-validation
  • Performing cross-validation with the e1071 package 
  • Performing cross-validation with the caret package 
  • Ranking the variable importance with the caret package 
  • Ranking the variable importance with the rminer package 
  • Finding highly correlated features with the caret package 
  • Selecting features using the caret package 
  • Measuring the performance of the regression model 
  • Measuring prediction performance with a confusion matrix 
  • Measuring prediction performance using ROCR 
  • Comparing an ROC curve using the caret package 
  • Measuring performance differences between models with the caret package 
  10. Ensemble Learning
  • Introduction 
  • Using the Super Learner algorithm 
  • Using ensemble to train and test 
  • Classifying data with the bagging method 
  • Performing cross-validation with the bagging method 
  • Classifying data with the boosting method 
  • Performing cross-validation with the boosting method 
  • Classifying data with gradient boosting 
  • Calculating the margins of a classifier 
  • Calculating the error evolution of the ensemble method 
  • Classifying data with random forest 
  • Estimating the prediction errors of different classifiers 
  11. Clustering
  • Introduction 
  • Clustering data with hierarchical clustering 
  • Cutting trees into clusters 
  • Clustering data with the k-means method 
  • Drawing a bivariate cluster plot 
  • Comparing clustering methods 
  • Extracting silhouette information from clustering 
  • Obtaining the optimum number of clusters for k-means 
  • Clustering data with the density-based method 
  • Clustering data with the model-based method 
  • Visualizing a dissimilarity matrix 
  • Validating clusters externally 
  12. Association Analysis and Sequence Mining
  • Introduction 
  • Transforming data into transactions 
  • Displaying transactions and associations 
  • Mining associations with the Apriori rule 
  • Pruning redundant rules 
  • Visualizing association rules 
  • Mining frequent itemsets with Eclat 
  • Creating transactions with temporal information 
  • Mining frequent sequential patterns with cSPADE 
  • Using the TraMineR package for sequence analysis 
  • Visualizing sequences, chronograms, and traversal statistics
  13. Dimension Reduction
  • Introduction 
  • Why reduce dimensions?
  • Performing feature selection with FSelector 
  • Performing dimension reduction with PCA 
  • Determining the number of principal components using the scree test 
  • Determining the number of principal components using the Kaiser method 
  • Visualizing multivariate data using biplot 
  • Performing dimension reduction with MDS 
  • Reducing dimensions with SVD 
  • Compressing images with SVD 
  • Performing nonlinear dimension reduction with ISOMAP 
  • Performing nonlinear dimension reduction with Locally Linear Embedding
  14. Big Data Analysis (R and Hadoop)
  • Introduction 
  • Preparing the RHadoop environment 
  • Installing rmr2 
  • Installing rhdfs 
  • Operating HDFS with rhdfs 
  • Implementing a word count problem with RHadoop 
  • Comparing the performance between an R MapReduce program and a standard R program 
  • Testing and debugging the rmr2 program 
  • Installing plyrmr 
  • Manipulating data with plyrmr 
  • Conducting machine learning with RHadoop 
  • Configuring RHadoop clusters on Amazon EMR 
