Machine Learning with R Cookbook


Course Skill Level:

Foundational

Course Duration:

4 days

Course Delivery Format:

Live, instructor-led.
Course Category:

Big Data & Data Science
Course Code:

MLWRCBL21E09

Who should attend & recommended skills:

Developers with basic Python skills

Who should attend & recommended skills

This course is designed for developers interested in exploring over 110 recipes for analyzing data and building predictive models with simple, easy-to-use R code.
Skill level: Foundation-level Machine Learning with R Cookbook skills for intermediate-skilled team members. This is not a basic class.
Python: Basic (1-2 years’ experience)

About this course

Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Extracting useful information from data is another, much more challenging task. Machine Learning with R Cookbook, Second Edition takes a practical approach to teaching machine learning with R. Each chapter is divided into several simple recipes. Through the step-by-step instructions in each recipe, you will be able to construct predictive models using a variety of machine learning packages. You will first learn to set up the R environment and use simple R commands to explore data. The next topics cover how to perform statistical analysis, apply machine learning methods, and assess the resulting models, all of which are covered in detail later in the book. You’ll also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide the information required to start applying machine learning to individual projects. With Machine Learning with R Cookbook, machine learning has never been easier.

Skills acquired & topics covered

Working in a hands-on learning environment, led by our ML expert instructor, students will learn about and explore:
Applying R to simplify predictive modeling with short and simple code
Using machine learning to solve problems ranging from small to big data
Building training and testing datasets and applying different classification methods
Creating and inspecting transaction datasets and performing association analysis with the Apriori algorithm
Visualizing patterns and associations using a range of graphs and finding frequent itemsets using the Eclat algorithm
Comparing regression methods to discover how each one solves problems
Detecting and imputing missing values in air quality data
Predicting which users are likely to churn using a classification approach
Plotting the autocorrelation function with time series analysis
Using the Cox proportional hazards model for survival analysis
Implementing the clustering method to segment customer data
Compressing images with the dimension reduction method
Incorporating R and Hadoop to solve machine learning problems on big data

Course breakdown / modules

Practical Machine Learning with R
Introduction
Downloading and installing R
Downloading and installing RStudio
Installing and loading packages
Understanding basic data structures
Basic commands for subsetting
Reading and writing data
Manipulating data
Applying basic statistics
Visualizing data
Getting a dataset for machine learning

Data Exploration with Air Quality Datasets
Introduction
Using the air quality dataset
Converting attributes to factors
Detecting missing values
Imputing missing values
Exploring and visualizing data
Predicting values from datasets

Analyzing Time Series Data
Introduction
Looking at time series data
Plotting and forecasting time series data
Extracting, subsetting, merging, filling, and padding
Successive differences and moving averages
Exponential smoothing
Plotting the autocorrelation function

R and Statistics
Introduction
Understanding data sampling in R
Operating a probability distribution in R
Working with univariate descriptive statistics in R
Performing correlations and multivariate analysis
Conducting an exact binomial test
Performing a Student’s t-test
Performing the Kolmogorov-Smirnov test
Understanding the Wilcoxon Rank Sum and Signed Rank tests
Working with Pearson’s Chi-squared test
Conducting a one-way ANOVA
Performing a two-way ANOVA

Understanding Regression Analysis
Introduction
Different types of regression
Fitting a linear regression model with lm
Summarizing linear model fits
Using linear regression to predict unknown values
Generating a diagnostic plot of a fitted model
Fitting multiple regression
Summarizing multiple regression
Using multiple regression to predict unknown values
Fitting a polynomial regression model with lm
Fitting a robust linear regression model with rlm
Studying a case of linear regression on SLID data
Applying the Gaussian model for generalized linear regression
Applying the Poisson model for generalized linear regression
Applying the Binomial model for generalized linear regression
Fitting a generalized additive model to data
Visualizing a generalized additive model
Diagnosing a generalized additive model

Survival Analysis
Introduction
Loading and observing data
Viewing the summary of survival analysis
Visualizing the survival curve
Using the log-rank test
Using the Cox proportional hazards model
Using the Nelson-Aalen estimator of cumulative hazard

Classification 1 - Tree, Lazy, and Probabilistic
Introduction
Preparing the training and testing datasets
Building a classification model with recursive partitioning trees
Visualizing a recursive partitioning tree
Measuring the prediction performance of a recursive partitioning tree
Pruning a recursive partitioning tree
Handling missing data and split and surrogate variables
Building a classification model with a conditional inference tree
Control parameters in conditional inference trees
Visualizing a conditional inference tree
Measuring the prediction performance of a conditional inference tree
Classifying data with the k-nearest neighbor classifier
Classifying data with logistic regression
Classifying data with the Naïve Bayes classifier

Classification 2 - Neural Network and SVM
Introduction
Choosing the cost of a support vector machine
Visualizing an SVM fit
Predicting labels based on a model trained by a support vector machine
Tuning a support vector machine
The basics of neural networks
Training a neural network with neuralnet
Visualizing a neural network trained by neuralnet
Predicting labels based on a model trained by neuralnet
Training a neural network with nnet
Predicting labels based on a model trained by nnet

Model Evaluation
Introduction
Estimating model performance with k-fold cross-validation
Estimating model performance with leave-one-out cross-validation
Performing cross-validation with the e1071 package
Performing cross-validation with the caret package
Ranking the variable importance with the caret package
Ranking the variable importance with the rminer package
Finding highly correlated features with the caret package
Selecting features using the caret package
Measuring the performance of the regression model
Measuring prediction performance with a confusion matrix
Measuring prediction performance using ROCR
Comparing ROC curves using the caret package
Measuring performance differences between models with the caret package

Ensemble Learning
Introduction
Using the Super Learner algorithm
Using an ensemble to train and test
Classifying data with the bagging method
Performing cross-validation with the bagging method
Classifying data with the boosting method
Performing cross-validation with the boosting method
Classifying data with gradient boosting
Calculating the margins of a classifier
Calculating the error evolution of the ensemble method
Classifying data with random forest
Estimating the prediction errors of different classifiers

Clustering
Introduction
Clustering data with hierarchical clustering
Cutting trees into clusters
Clustering data with the k-means method
Drawing a bivariate cluster plot
Comparing clustering methods
Extracting silhouette information from clustering
Obtaining the optimum number of clusters for k-means
Clustering data with the density-based method
Clustering data with the model-based method
Visualizing a dissimilarity matrix
Validating clusters externally

Association Analysis and Sequence Mining
Introduction
Transforming data into transactions
Displaying transactions and associations
Mining associations with the Apriori algorithm
Pruning redundant rules
Visualizing association rules
Mining frequent itemsets with Eclat
Creating transactions with temporal information
Mining frequent sequential patterns with cSPADE
Using the TraMineR package for sequence analysis
Visualizing sequences, chronograms, and traversal statistics

Dimension Reduction
Introduction
Why reduce dimensions?
Performing feature selection with FSelector
Performing dimension reduction with PCA
Determining the number of principal components using the scree test
Determining the number of principal components using the Kaiser method
Visualizing multivariate data using biplot
Performing dimension reduction with MDS
Reducing dimensions with SVD
Compressing images with SVD
Performing nonlinear dimension reduction with ISOMAP
Performing nonlinear dimension reduction with Locally Linear Embedding

Big Data Analysis (R and Hadoop)
Introduction
Preparing the RHadoop environment
Installing rmr2
Installing rhdfs
Operating HDFS with rhdfs
Implementing a word count problem with RHadoop
Comparing the performance between an R MapReduce program and a standard R program
Testing and debugging the rmr2 program
Installing plyrmr
Manipulating data with plyrmr
Conducting machine learning with RHadoop


Course breakdown / modules

Practical Machine Learning with R

Data Exploration with Air Quality Datasets

Analyzing Time Series Data

R and Statistics

Understanding Regression Analysis

Survival Analysis

Classification 1 - Tree, Lazy, and Probabilistic

Classification 2 - Neural Network and SVM

Model Evaluation

Ensemble Learning

Clustering

Association Analysis and Sequence Mining

Dimension Reduction

Big Data Analysis (R and Hadoop)
