Let us help you find the training program you are looking for.

If you can't find what you are looking for, contact us, we'll help you find it. We have over 800 training programs to choose from.

  • Course Skill Level:

    Foundational

  • Course Duration:

    4 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    MLWRCBL21E09

Who should attend & recommended skills

Developers with basic Python skills

  • This course is designed for developers who want to explore over 110 recipes for analyzing data and building predictive models with simple, easy-to-use R code.
  • Skill level: Foundation-level Machine Learning with R Cookbook skills for intermediate-skilled team members. This is not a basic class.
  • Python: Basic (1-2 years’ experience)

About this course

Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation; extracting useful information from data is another, and much more challenging, task.

Machine Learning with R Cookbook, Second Edition uses a practical approach to teach you how to perform machine learning with R. Each chapter is divided into several simple recipes, and the step-by-step instructions in each recipe show you how to construct a predictive model using a variety of machine learning packages. You will first learn to set up the R environment and use simple R commands to explore data. The book then covers how to perform statistical analysis, build machine learning models, and assess those models, all of which are covered in detail later on. You will also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide all the information required to start applying machine learning to your own projects. With Machine Learning with R Cookbook, machine learning has never been easier.

Skills acquired & topics covered

Working in a hands-on learning environment, led by our ML expert instructor, students will learn about and explore:

  • Applying R to simplify predictive modeling with short and simple code
  • Using machine learning to solve problems ranging from small to big data
  • Building training and testing datasets and applying different classification methods
  • Creating and inspecting transaction datasets and performing association analysis with the Apriori algorithm
  • Visualizing patterns and associations using a range of graphs and finding frequent itemsets using the Eclat algorithm
  • Comparing differences between each regression method to discover how they solve problems
  • Detecting and imputing missing values in air quality data
  • Predicting possible churn users with the classification approach
  • Plotting the autocorrelation function with time series analysis
  • Using the Cox proportional hazards model for survival analysis
  • Implementing the clustering method to segment customer data
  • Compressing images with the dimension reduction method
  • Incorporating R and Hadoop to solve machine learning problems on big data

Course breakdown / modules

  • Introduction
  • Downloading and installing R
  • Downloading and installing RStudio
  • Installing and loading packages
  • Understanding basic data structures
  • Basic commands for subsetting
  • Reading and writing data
  • Manipulating data
  • Applying basic statistics
  • Visualizing data
  • Getting a dataset for machine learning
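
As a flavor of the commands this module covers, here is a minimal sketch using base R and the built-in iris dataset; the dataset choice is an illustrative assumption, not part of the course materials:

    data(iris)                                      # a small built-in dataset for practice
    str(iris)                                       # inspect the basic data structure
    summary(iris)                                   # apply basic statistics
    setosa <- subset(iris, Species == "setosa")     # basic subsetting
    plot(iris$Sepal.Length, iris$Sepal.Width)       # visualize two attributes
    write.csv(iris, "iris.csv", row.names = FALSE)  # write data out
    iris2 <- read.csv("iris.csv")                   # read it back in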

  • Introduction
  • Using the air quality dataset
  • Converting attributes to factors
  • Detecting missing values
  • Imputing missing values
  • Exploring and visualizing data
  • Predicting values from datasets
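
A minimal sketch of detecting and imputing missing values on R's built-in airquality dataset; mean imputation and the lm formula below are illustrative assumptions only:

    data(airquality)                                           # built-in air quality data
    sum(is.na(airquality$Ozone))                               # detect missing values
    aq <- airquality
    aq$Ozone[is.na(aq$Ozone)] <- mean(aq$Ozone, na.rm = TRUE)  # simple mean imputation
    boxplot(Ozone ~ Month, data = aq)                          # explore and visualize
    fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)        # fit a simple model
    head(predict(fit, newdata = aq))                           # predict values from the dataset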

  • Introduction
  • Looking at time series data
  • Plotting and forecasting time series data
  • Extracting, subsetting, merging, filling, and padding
  • Successive differences and moving averages
  • Exponential smoothing
  • Plotting the autocorrelation function
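
A minimal sketch of this time series workflow using the built-in AirPassengers series and base R's HoltWinters smoother; the dataset and model choices are assumptions for illustration:

    data(AirPassengers)                                # monthly airline passenger counts
    plot(AirPassengers)                                # look at the series
    d1 <- diff(AirPassengers)                          # successive differences
    ma3 <- stats::filter(AirPassengers, rep(1/3, 3))   # 3-point moving average
    hw <- HoltWinters(AirPassengers)                   # exponential smoothing
    predict(hw, n.ahead = 12)                          # forecast the next 12 months
    acf(AirPassengers)                                 # plot the autocorrelation function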

  • Introduction
  • Understanding data sampling in R
  • Operating a probability distribution in R
  • Working with univariate descriptive statistics in R
  • Performing correlations and multivariate analysis
  • Conducting an exact binomial test
  • Performing a Student’s t-test
  • Performing the Kolmogorov-Smirnov test
  • Understanding the Wilcoxon Rank Sum and Signed Rank test
  • Working with Pearson’s Chi-squared test
  • Conducting a one-way ANOVA
  • Performing a two-way ANOVA
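
A minimal sketch of the tests listed above, run on simulated data and the built-in mtcars dataset; the data choices are illustrative assumptions, not course material:

    set.seed(1)
    x <- rnorm(30, mean = 5); y <- rnorm(30, mean = 5.5)         # sampled data
    t.test(x, y)                                                 # Student's t-test (Welch variant)
    wilcox.test(x, y)                                            # Wilcoxon rank-sum test
    ks.test(x, "pnorm", mean = 5, sd = 1)                        # Kolmogorov-Smirnov test
    binom.test(18, 30, p = 0.5)                                  # exact binomial test
    chisq.test(table(mtcars$cyl, mtcars$am))                     # Pearson's Chi-squared test
    summary(aov(mpg ~ factor(cyl), data = mtcars))               # one-way ANOVA
    summary(aov(mpg ~ factor(cyl) * factor(am), data = mtcars))  # two-way ANOVA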

  • Introduction
  • Different types of regression
  • Fitting a linear regression model with lm
  • Summarizing linear model fits
  • Using linear regression to predict unknown values
  • Generating a diagnostic plot of a fitted model
  • Fitting multiple regression
  • Summarizing multiple regression
  • Using multiple regression to predict unknown values
  • Fitting a polynomial regression model with lm
  • Fitting a robust linear regression model with rlm
  • Studying a case of linear regression on SLID data
  • Applying the Gaussian model for generalized linear regression
  • Applying the Poisson model for generalized linear regression
  • Applying the Binomial model for generalized linear regression
  • Fitting a generalized additive model to data
  • Visualizing a generalized additive model
  • Diagnosing a generalized additive model
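
A minimal sketch of fitting, summarizing, and diagnosing regression models with lm, rlm, and glm on the built-in mtcars data; the dataset and formulas are illustrative assumptions:

    fit <- lm(mpg ~ wt, data = mtcars)                         # simple linear regression
    summary(fit)                                               # summarize the fit
    predict(fit, newdata = data.frame(wt = c(2.5, 3.5)))       # predict unknown values
    par(mfrow = c(2, 2)); plot(fit)                            # diagnostic plots
    multi <- lm(mpg ~ wt + hp + qsec, data = mtcars)           # multiple regression
    poly_fit <- lm(mpg ~ poly(wt, 2), data = mtcars)           # polynomial regression
    library(MASS)
    robust <- rlm(mpg ~ wt, data = mtcars)                     # robust linear regression
    pois <- glm(carb ~ wt, data = mtcars, family = poisson())  # Poisson GLM
    logit <- glm(am ~ wt, data = mtcars, family = binomial())  # binomial (logistic) GLM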

  • Introduction
  • Loading and observing data
  • Viewing the summary of survival analysis
  • Visualizing the Survival Curve
  • Using the log-rank test
  • Using the Cox proportional hazards model
  • Nelson-Aalen Estimator of cumulative hazard
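
A minimal sketch of Kaplan-Meier curves, the log-rank test, and a Cox model using the survival package's lung dataset; the dataset is an illustrative choice:

    library(survival)
    km <- survfit(Surv(time, status) ~ sex, data = lung)       # Kaplan-Meier estimate
    summary(km)                                                # summary of survival analysis
    plot(km)                                                   # visualize the survival curves
    survdiff(Surv(time, status) ~ sex, data = lung)            # log-rank test
    cox <- coxph(Surv(time, status) ~ age + sex, data = lung)  # Cox proportional hazards model
    summary(cox)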

  • Introduction
  • Preparing the training and testing datasets
  • Building a classification model with recursive partitioning trees
  • Visualizing a recursive partitioning tree
  • Measuring the prediction performance of a recursive partitioning tree
  • Pruning a recursive partitioning tree
  • Handling missing data and split and surrogate variables
  • Building a classification model with a conditional inference tree
  • Control parameters in conditional inference trees
  • Visualizing a conditional inference tree
  • Measuring the prediction performance of a conditional inference tree
  • Classifying data with the k-nearest neighbor classifier
  • Classifying data with logistic regression
  • Classifying data with the Naïve Bayes classifier
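
A minimal sketch of a train/test split, a recursive partitioning tree, and a k-nearest neighbor classifier on iris, assuming the rpart and class packages; the split ratio and pruning parameter are illustrative:

    library(rpart); library(class)
    set.seed(1)
    idx <- sample(nrow(iris), 0.7 * nrow(iris))      # build training and testing datasets
    train <- iris[idx, ]; test <- iris[-idx, ]
    tree <- rpart(Species ~ ., data = train)         # recursive partitioning tree
    plot(tree); text(tree)                           # visualize the tree
    pred <- predict(tree, test, type = "class")
    table(pred, test$Species)                        # measure prediction performance
    pruned <- prune(tree, cp = 0.05)                 # prune the tree
    knn_pred <- knn(train[, 1:4], test[, 1:4], cl = train$Species, k = 3)  # k-NN classifier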

  • Introduction
  • Choosing the cost of a support vector machine
  • Visualizing an SVM fit
  • Predicting labels based on a model trained by a support vector machine
  • Tuning a support vector machine
  • The basics of neural networks
  • Training a neural network with neuralnet
  • Visualizing a neural network trained by neuralnet
  • Predicting labels based on a model trained by neuralnet
  • Training a neural network with nnet
  • Predicting labels based on a model trained by nnet
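
A minimal sketch of fitting and tuning an SVM with e1071 and training a small network with nnet, on iris; the package and parameter choices are assumptions for illustration:

    library(e1071); library(nnet)
    set.seed(1)
    idx <- sample(nrow(iris), 100)
    train <- iris[idx, ]; test <- iris[-idx, ]
    svm_fit <- svm(Species ~ ., data = train, cost = 1)     # SVM with a chosen cost
    table(predict(svm_fit, test), test$Species)             # predict labels with the SVM
    tuned <- tune.svm(Species ~ ., data = train,            # tune cost and gamma
                      cost = c(0.1, 1, 10), gamma = c(0.01, 0.1))
    nn <- nnet(Species ~ ., data = train, size = 4, maxit = 200, trace = FALSE)
    table(predict(nn, test, type = "class"), test$Species)  # evaluate the nnet model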

  • Introduction
  • Estimating model performance with k-fold cross-validation
  • Estimating model performance with leave-one-out cross-validation
  • Performing cross-validation with the e1071 package
  • Performing cross-validation with the caret package
  • Ranking the variable importance with the caret package
  • Ranking the variable importance with the rminer package
  • Finding highly correlated features with the caret package
  • Selecting features using the caret package
  • Measuring the performance of the regression model
  • Measuring prediction performance with a confusion matrix
  • Measuring prediction performance using ROCR
  • Comparing an ROC curve using the caret package
  • Measuring performance differences between models with the caret package
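
A minimal sketch of cross-validation, variable importance, and a confusion matrix with the caret package; the model method and fold count are illustrative assumptions:

    library(caret)
    set.seed(1)
    ctrl <- trainControl(method = "cv", number = 10)   # 10-fold cross-validation
    fit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
    fit$results                                        # resampled performance estimates
    varImp(fit)                                        # rank variable importance
    confusionMatrix(predict(fit, iris), iris$Species)  # confusion-matrix metrics
    findCorrelation(cor(iris[, 1:4]), cutoff = 0.8)    # find highly correlated features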

  • Introduction
  • Using the Super Learner algorithm
  • Using ensemble to train and test
  • Classifying data with the bagging method
  • Performing cross-validation with the bagging method
  • Classifying data with the boosting method
  • Performing cross-validation with the boosting method
  • Classifying data with gradient boosting
  • Calculating the margins of a classifier
  • Calculating the error evolution of the ensemble method
  • Classifying data with random forest
  • Estimating the prediction errors of different classifiers
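
A minimal sketch of bagging, boosting, and random forests on iris, assuming the adabag and randomForest packages; tree counts and iteration settings are illustrative:

    library(randomForest); library(adabag)
    set.seed(1)
    idx <- sample(nrow(iris), 100)
    train <- iris[idx, ]; test <- iris[-idx, ]
    rf <- randomForest(Species ~ ., data = train, ntree = 200)  # random forest
    mean(predict(rf, test) != test$Species)                     # test error of the forest
    bag <- bagging(Species ~ ., data = train, mfinal = 20)      # bagging
    boost <- boosting(Species ~ ., data = train, mfinal = 20)   # boosting
    predict(boost, newdata = test)$error                        # boosting test error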

  • Introduction
  • Clustering data with hierarchical clustering
  • Cutting trees into clusters
  • Clustering data with the k-means method
  • Drawing a bivariate cluster plot
  • Comparing clustering methods
  • Extracting silhouette information from clustering
  • Obtaining the optimum number of clusters for k-means
  • Clustering data with the density-based method
  • Clustering data with the model-based method
  • Visualizing a dissimilarity matrix
  • Validating clusters externally
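
A minimal sketch of hierarchical and k-means clustering plus silhouette analysis on the numeric columns of iris, assuming the cluster package; the choice of three clusters is illustrative:

    library(cluster)
    x <- scale(iris[, 1:4])                      # standardize the numeric columns
    hc <- hclust(dist(x), method = "ward.D2")    # hierarchical clustering
    plot(hc)
    groups <- cutree(hc, k = 3)                  # cut the tree into clusters
    set.seed(1)
    km <- kmeans(x, centers = 3)                 # k-means clustering
    table(km$cluster, iris$Species)              # validate clusters against known labels
    summary(silhouette(km$cluster, dist(x)))     # extract silhouette information
    clusplot(x, km$cluster)                      # draw a bivariate cluster plot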

  • Introduction
  • Transforming data into transactions
  • Displaying transactions and associations
  • Mining associations with the Apriori rule
  • Pruning redundant rules
  • Visualizing association rules
  • Mining frequent itemsets with Eclat
  • Creating transactions with temporal information
  • Mining frequent sequential patterns with cSPADE
  • Using the TraMineR package for sequence analysis
  • Visualizing sequence, Chronogram, and Traversal Statistics
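
A minimal sketch of Apriori and Eclat mining on the Groceries transactions that ship with the arules package; the support and confidence thresholds are illustrative assumptions:

    library(arules); library(arulesViz)
    data(Groceries)                                              # built-in transaction data
    inspect(head(Groceries, 3))                                  # display transactions
    rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5))  # Apriori mining
    inspect(head(sort(rules, by = "lift"), 5))                   # strongest rules by lift
    plot(rules)                                                  # visualize association rules
    itemsets <- eclat(Groceries, parameter = list(supp = 0.05))  # frequent itemsets with Eclat
    inspect(head(itemsets))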

  • Introduction
  • Why reduce dimensionality?
  • Performing feature selection with FSelector
  • Performing dimension reduction with PCA
  • Determining the number of principal components using the scree test
  • Determining the number of principal components using the Kaiser method
  • Visualizing multivariate data using biplot
  • Performing dimension reduction with MDS
  • Reducing dimensions with SVD
  • Compressing images with SVD
  • Performing nonlinear dimension reduction with ISOMAP
  • Performing nonlinear dimension reduction with Local Linear Embedding
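
A minimal sketch of PCA, MDS, and SVD on the numeric columns of iris; the dataset and the rank-2 reconstruction are illustrative assumptions:

    pca <- prcomp(iris[, 1:4], scale. = TRUE)         # principal component analysis
    summary(pca)                                      # variance explained per component
    screeplot(pca, type = "lines")                    # scree test
    biplot(pca)                                       # visualize multivariate data
    mds <- cmdscale(dist(scale(iris[, 1:4])), k = 2)  # classical MDS to two dimensions
    plot(mds, col = iris$Species)
    sv <- svd(scale(iris[, 1:4]))                     # singular value decomposition
    low_rank <- sv$u[, 1:2] %*% diag(sv$d[1:2]) %*% t(sv$v[, 1:2])  # rank-2 reconstruction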

  • Introduction
  • Preparing the RHadoop environment
  • Installing rmr2
  • Installing rhdfs
  • Operating HDFS with rhdfs
  • Implementing a word count problem with RHadoop
  • Comparing the performance between an R MapReduce program and a standard R program
  • Testing and debugging the rmr2 program
  • Installing plyrmr
  • Manipulating data with plyrmr
  • Conducting machine learning with RHadoop
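
A minimal word-count sketch with rmr2 and rhdfs; it assumes a working Hadoop installation with the RHadoop packages already installed, and the HADOOP_CMD path shown is an assumption that must be adjusted for the actual cluster:

    Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")    # assumed path; adjust for your cluster
    library(rhdfs); library(rmr2)
    hdfs.init()                                   # connect rhdfs to HDFS
    hdfs.ls("/")                                  # operate on HDFS
    lines <- to.dfs(c("one fish", "two fish"))    # push sample text into HDFS
    wc <- mapreduce(
      input  = lines,
      map    = function(k, v) keyval(unlist(strsplit(v, " ")), 1),  # emit (word, 1) pairs
      reduce = function(word, counts) keyval(word, sum(counts)))    # sum counts per word
    from.dfs(wc)                                  # collect the word counts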