Data Analysis with R

  • Course Code: Data Analysis / BI - Data Analysis with R
  • Course Dates: Contact us to schedule.
  • Course Category: Big Data & Data Science
  • Duration: 4 Days
  • Audience: This course is geared for developers experienced in Python, analysts, or others who want to learn, by example, the fundamentals of data analysis as well as several intermediate to advanced methods and techniques, ranging from classification and regression to Bayesian methods and MCMC, which can be put to immediate use.

Course Snapshot 

  • Duration: 4 days 
  • Skill-level: Foundation-level Data Analysis with R skills for intermediate-skilled team members. This is not a basic class.
  • Targeted Audience: This course is geared for developers experienced in Python, analysts, or others who want to learn, by example, the fundamentals of data analysis as well as several intermediate to advanced methods and techniques, ranging from classification and regression to Bayesian methods and MCMC, which can be put to immediate use.
  • Hands-on Learning: This course is approximately 50% hands-on lab to 50% lecture ratio, combining engaging lecture, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required. 
  • Delivery Format: This course is available for onsite private classroom presentation, remote instructor-led delivery, or CBT/WBT (by request).
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines of some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this course dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this course begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, students get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. The course also addresses the practical difficulties of data analysis, covering messy data, large data, communicating results, and facilitating reproducibility. It is designed to be a valuable resource through many stages of a data analyst’s career.

Working in a hands-on learning environment, led by our Data Analysis with R expert instructor, students will learn how to:

  • Analyze data using R, a powerful statistical programming language
  • Implement applied statistics through practical use cases
  • Use popular R packages to work with structured and unstructured data

Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed Course Agenda below.

  • Gain a thorough understanding of statistical reasoning and sampling theory 
  • Employ hypothesis testing to draw inferences from your data 
  • Learn Bayesian methods for estimating parameters 
  • Train regression, classification, and time series models 
  • Handle missing data gracefully using multiple imputation 
  • Identify and manage problematic data points 
  • Learn how to scale your analyses to larger data with Rcpp, data.table, dplyr, and parallelization 
  • Put best practices into effect to make your job easier and facilitate reproducibility 

Audience & Pre-Requisites 

This course is geared for attendees with Python skills who wish to learn, by example, the fundamentals of data analysis as well as several intermediate to advanced methods and techniques, ranging from classification and regression to Bayesian methods and MCMC, which can be put to immediate use.

Pre-Requisites: Students should have

  • Some knowledge of Python as a developer.

Course Agenda / Topics 

  1. RefresheR
  • Navigating the basics
  • Getting help in R
  • Vectors
  • Functions
  • Matrices
  • Loading data into R
  • Working with packages
  2. The Shape of Data
  • Univariate data
  • Frequency distributions
  • Central tendency
  • Spread
  • Populations, samples, and estimation
  • Probability distributions
  • Visualization methods
  3. Describing Relationships
  • Multivariate data
  • Relationships between a categorical and continuous variable
  • Relationships between two categorical variables
  • The relationship between two continuous variables
  • Visualization methods
  4. Probability
  • Basic probability
  • A tale of two interpretations
  • Sampling from distributions
  • The normal distribution
  5. Using Data To Reason About The World
  • Estimating means
  • The sampling distribution
  • Interval estimation
  • Smaller samples
  6. Testing Hypotheses
  • The null hypothesis significance testing framework
  • Testing the mean of one sample
  • Testing two means
  • Testing more than two means
  • Testing independence of proportions
  • What if my assumptions are unfounded?
  7. Bayesian Methods
  • The big idea behind Bayesian analysis
  • Choosing a prior
  • Who cares about coin flips
  • Enter MCMC – stage left
  • Using JAGS and runjags
  • Fitting distributions the Bayesian way
  • The Bayesian independent samples t-test
  8. The Bootstrap
  • What’s… uhhh… the deal with the bootstrap?
  • Performing the bootstrap in R (more elegantly)
  • Confidence intervals
  • A one-sample test of means
  • Bootstrapping statistics other than the mean
  • Busting bootstrap myths
  9. Predicting Continuous Variables
  • Linear models
  • Simple linear regression
  • Simple linear regression with a binary predictor
  • Multiple regression
  • Regression with a non-binary predictor
  • Kitchen sink regression
  • The bias-variance trade-off
  • Linear regression diagnostics
  • Advanced topics
  10. Predicting Categorical Variables
  • k-Nearest neighbors
  • Logistic regression
  • Decision trees
  • Random forests
  • Choosing a classifier
  11. Predicting Changes with Time
  • What is a time series?
  • What is forecasting?
  • Creating and plotting time series
  • Components of time series
  • Time series decomposition
  • White noise
  • Autocorrelation
  • Smoothing
  • ETS and the state space model
  • Interventions for improvement
  • What we didn’t cover
  • Citations for the climate change data
  12. Sources of Data
  • Relational databases
  • Using JSON
  • XML
  • Other data formats
  • Online repositories
  13. Dealing with Missing Data
  • Analysis with missing data
  • Visualizing missing data
  • Types of missing data
  • Unsophisticated methods for dealing with missing data
  • So how does mice come up with the imputed values?
  14. Dealing with Messy Data
  • Checking unsanitized data
  • Regular expressions
  • Other tools for messy data
  15. Dealing with Large Data
  • Wait to optimize
  • Using a bigger and faster machine
  • Be smart about your code
  • Using optimized packages
  • Using another R implementation
  • Using parallelization
  • Using Rcpp
  • Being smarter about your code
  16. Working with Popular R Packages
  • The data.table package
  • Using dplyr and tidyr to manipulate data
  • Functional programming as a main tidyverse principle
  • Reshaping data with tidyr
  17. Reproducibility and Best Practices
  • R scripting
  • R projects
  • Version control
  • Communicating results