Course Skill Level:

Foundational

Course Duration:

2 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    PDWRANL21E09

Who should attend & recommended skills:

Those experienced in Python with intermediate spreadsheet software skills

Who should attend & recommended skills

  • This course is geared toward developers, analysts, and others with Python experience who want to turn their noisy data into relevant, insight-ready information using data wrangling techniques in Python and R.
  • Skill level: Foundation-level data wrangling skills for intermediate-skilled team members. This is not a basic class.
  • Python developers: Basic (1-2 years’ experience)
  • Spreadsheet software: Intermediate (3-5 years’ experience)

About this course

Around 80% of the time in data analysis is spent cleaning and preparing data. This work is a prerequisite to the rest of the data analysis workflow, including visualization, analysis, and reporting. Python and R are popular tools for data analysis, and both offer packages for manipulating many kinds of data. This course shows you the main data wrangling techniques and how to implement them with Python and R packages.

You’ll start by understanding the data wrangling process and building a solid foundation for working with different types of data. You’ll work with different data structures and acquire and parse data from various locations. You’ll also see how to reshape the layout of data and how to manipulate, summarize, and join data sets. The course concludes with a quick primer on exploring data and on storing, accessing, and retrieving data using databases. Each topic includes practical examples that use simple and real-world data sets. By the end of the course, you’ll have a thorough understanding of the data wrangling concepts covered and how to implement them.

Skills acquired & topics covered

  • An easy-to-follow walkthrough of every step of the data wrangling process
  • Working with different types of datasets, and reshaping the layout of your data to make it easier for analysis
  • Getting simple examples and real-life data wrangling solutions for data pre-processing
  • Reading a CSV file into Python and R and printing summary statistics on the data
  • Gaining knowledge of the data formats and programming structures involved in retrieving API data
  • Making effective use of regular expressions in the data wrangling process
  • The tools and packages available to prepare numerical data for analysis
  • How to have better control over manipulating the structure of the data
  • Developing the dexterity to programmatically read, audit, correct, and shape data
  • Writing and completing programs to take in, format, and output data sets

Course breakdown / modules

  • Understanding data wrangling
  • The tools for data wrangling

  • External resources
  • Logistical overview
  • Running programs in Python
  • Data types, variables, and the Python shell
  • Compound statements
  • Making annotations within programs
  • A programmer’s resources

  • External resources
  • Logistical overview
  • Introducing a basic data wrangling workflow
  • Introducing the JSON file format
  • Opening and closing a file in Python using file I/O
  • Reading the contents of a file
  • Exploring the contents of a data file
  • Modifying a dataset
  • Outputting the modified data to a new file
  • Specifying input and output file names in the Terminal
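
To give a feel for the workflow this module outlines (open a JSON file, explore it, modify it, and write the result to a new file), here is a minimal sketch. It is an illustration only, not course material: the script name `wrangle.py`, the function names, and the `name` field are all assumptions, and the input is assumed to hold a list of JSON objects.

```python
import json
import sys


def load_records(path):
    """Read a JSON file and return its parsed contents."""
    with open(path, "r", encoding="utf-8") as infile:
        return json.load(infile)


def keep_complete(records, required_key):
    """Keep only the records with a non-empty value for required_key."""
    return [r for r in records if r.get(required_key) not in (None, "")]


def save_records(records, path):
    """Write the records to a new JSON file, leaving the input untouched."""
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(records, outfile, indent=2)


def main(input_path, output_path):
    records = load_records(input_path)
    print(f"read {len(records)} records")
    # "name" is a hypothetical field chosen for this illustration.
    cleaned = keep_complete(records, "name")
    save_records(cleaned, output_path)
    print(f"wrote {len(cleaned)} records to {output_path}")


# Run only when both file names are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

From a terminal, a script like this could be run as `python wrangle.py input.json output.json`, which is one simple way to specify the input and output file names.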

  • Logistical overview
  • Understanding the CSV format
  • Introducing the CSV module
  • Using the CSV module to read CSV data
  • Using the CSV module to write CSV data
  • Using the pandas module to read and process data
  • Handling non-standard CSV encoding and dialect
  • Understanding XML
  • Using the XML module to parse XML data

  • Logistical overview
  • Understanding the need for pattern recognition
  • Introducing regular expressions
  • Looking for patterns
  • Quantifying the existence of patterns
  • Extracting patterns
  • Summary
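
The three regular expression tasks listed above (looking for a pattern, quantifying how often it occurs, and extracting it) can be sketched with Python's standard `re` module. The date pattern and function names below are hypothetical choices for illustration, not examples taken from the course.

```python
import re

# Hypothetical pattern: dates written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def has_date(text):
    """Looking for a pattern: is there at least one date in the text?"""
    return DATE_PATTERN.search(text) is not None


def count_dates(text):
    """Quantifying a pattern: how many dates does the text contain?"""
    return len(DATE_PATTERN.findall(text))


def extract_dates(text):
    """Extracting a pattern: return each date as a (year, month, day) tuple."""
    return [
        tuple(int(part) for part in match.groups())
        for match in DATE_PATTERN.finditer(text)
    ]
```

Note that the pattern only checks the shape of a date, not whether the month and day are valid, so a real cleaning step would likely add validation.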

  • Cleaning Numerical Data – An Introduction to R and RStudio
  • Logistical overview
  • Introducing R and RStudio
  • Familiarizing yourself with RStudio
  • Conducting basic outlier detection and removal
  • Handling NA values
  • Variable names and contents

  • Logistical overview
  • Introducing dplyr
  • Getting started with dplyr
  • Chaining operations together
  • Filtering the rows of a dataframe
  • Summarizing data by category
  • Rewriting code using dplyr

  • Logistical overview
  • Introducing APIs
  • Using Python to retrieve data from APIs
  • Using URL parameters to filter the results
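
As a rough sketch of retrieving API data from Python and filtering results with URL parameters, the example below uses only the standard library. The base URL `https://api.example.com/records` and the `city` and `limit` parameters are placeholders, not a real service; the course itself may use a different library or API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, **params):
    """Append URL-encoded query parameters to base_url, skipping None values."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{query}" if query else base_url


def fetch_json(url, timeout=10):
    """Request the URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint and parameters, for illustration only.
url = build_url("https://api.example.com/records", city="New York", limit=5)
print(url)  # https://api.example.com/records?city=New+York&limit=5
```

Calling `fetch_json(url)` against a real endpoint would return the parsed response, typically a dict or list, ready for the same kind of cleaning applied to file-based data.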

  • Logistical overview
  • Understanding computer memory
  • Understanding databases
  • Introducing MongoDB
  • Interfacing with MongoDB from Python