Course Skill Level:

Foundational

Course Duration:

4 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    DAR000L21E09

Who should attend & recommended skills:

Developers, analysts, or others with basic Python and development experience

Who should attend & recommended skills

  • This course is geared for developers, analysts, or others with Python experience who want to learn, by example, the fundamentals of data analysis as well as several intermediate to advanced methods and techniques, ranging from classification and regression to Bayesian methods and MCMC, which can be put to immediate use.
  • Skill level: Foundation-level Data Analysis with R skills for intermediate-skilled team members. This is not a basic class.
  • Developers: Basic (1-2 years’ experience)
  • Python: Basic (1-2 years’ experience)

About this course

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines of some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this course dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this course begins with a review of R and its syntax, along with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties of performing data analysis in practice and find solutions for working with messy data, large data, communicating results, and facilitating reproducibility. This course is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

Skills acquired & topics covered

  • Working in a hands-on learning environment, led by our Data Analysis with R expert instructor, students will learn about and explore:
  • Analyzing your data using R, the most powerful statistical programming language
  • How to implement applied statistics using practical use-cases
  • Using popular R packages to work with unstructured and structured data
  • Gaining a thorough understanding of statistical reasoning and sampling theory
  • Employing hypothesis testing to draw inferences from your data
  • Bayesian methods for estimating parameters
  • Training regression, classification, and time series models
  • Handling missing data gracefully using multiple imputation
  • Identifying and managing problematic data points
  • How to scale your analyses to larger data with Rcpp, data.table, dplyr, and parallelization
  • Putting best practices into effect to make your job easier and facilitate reproducibility

Course breakdown / modules

  • Navigating the basics
  • Getting help in R
  • Vectors
  • Functions
  • Matrices
  • Loading data into R
  • Working with packages

  • Univariate data
  • Frequency distributions
  • Central tendency
  • Spread
  • Populations, samples, and estimation
  • Probability distributions
  • Visualization methods

  • Multivariate data
  • Relationships between a categorical and continuous variable
  • Relationships between two categorical variables
  • The relationship between two continuous variables
  • Visualization methods

  • Basic probability
  • A tale of two interpretations
  • Sampling from distributions
  • The normal distribution

  • Estimating means
  • The sampling distribution
  • Interval estimation
  • Smaller samples

  • The null hypothesis significance testing framework
  • Testing the mean of one sample
  • Testing two means
  • Testing more than two means
  • Testing independence of proportions
  • What if my assumptions are unfounded?

  • The big idea behind Bayesian analysis
  • Choosing a prior
  • Who cares about coin flips
  • Enter MCMC – stage left
  • Using JAGS and runjags
  • Fitting distributions the Bayesian way
  • The Bayesian independent samples t-test

  • What’s… uhhh… the deal with the bootstrap?
  • Performing the bootstrap in R (more elegantly)
  • Confidence intervals
  • A one-sample test of means
  • Bootstrapping statistics other than the mean
  • Busting bootstrap myths

  • Linear models
  • Simple linear regression
  • Simple linear regression with a binary predictor
  • Multiple regression
  • Regression with a non-binary predictor
  • Kitchen sink regression
  • The bias-variance trade-off
  • Linear regression diagnostics
  • Advanced topics

  • k-Nearest neighbors
  • Logistic regression
  • Decision trees
  • Random forests
  • Choosing a classifier

  • What is a time series?
  • What is forecasting?
  • Creating and plotting time series
  • Components of time series
  • Time series decomposition
  • White noise
  • Autocorrelation
  • Smoothing
  • ETS and the state space model
  • Interventions for improvement
  • What we didn’t cover
  • Citations for the climate change data

  • Relational databases
  • Using JSON
  • XML
  • Other data formats
  • Online repositories

  • Analysis with missing data
  • Visualizing missing data
  • Types of missing data
  • Unsophisticated methods for dealing with missing data
  • So how does mice come up with the imputed values?

  • Checking unsanitized data
  • Regular expressions
  • Other tools for messy data

  • Wait to optimize
  • Using a bigger and faster machine
  • Be smart about your code
  • Using optimized packages
  • Using another R implementation
  • Using parallelization
  • Using Rcpp
  • Being smarter about your code

  • The data.table package
  • Using dplyr and tidyr to manipulate data
  • Functional programming as a main tidyverse principle
  • Reshaping data with tidyr

  • R scripting
  • R projects
  • Version control
  • Communicating results