Practical Data Wrangling

  • Course Code: Data Analysis / BI - Practical Data Wrangling
  • Course Dates: Contact us to schedule.
  • Course Category: Big Data & Data Science
  • Duration: 2 Days
  • Audience: This course is geared for experienced Python developers, analysts, and others who want to turn noisy data into relevant, insight-ready information using data wrangling techniques in Python and R.

Course Snapshot 

  • Duration: 2 days 
  • Skill-level: Foundation-level Data Wrangling skills for Intermediate skilled team members. This is not a basic class. 
  • Targeted Audience: This course is geared for experienced Python developers, analysts, and others who want to turn noisy data into relevant, insight-ready information using data wrangling techniques in Python and R.
  • Hands-on Learning: This course is approximately 50% hands-on lab to 50% lecture ratio, combining engaging lecture, demos, group activities and discussions with machine-based student labs and exercises. Student machines are required. 
  • Delivery Format: This course is available for onsite private classroom presentation, remote instructor-led delivery, or CBT/WBT (by request).
  • Customizable: This course may be tailored to target your specific training skills objectives, tools of choice and learning goals. 

Around 80% of the time spent on data analysis goes into cleaning and preparing data. This work is a prerequisite for the rest of the data analysis workflow, including visualization, analysis, and reporting. Python and R are popular tools for data analysis, and both offer packages for manipulating many kinds of data. This course shows you the main data wrangling techniques and how to implement them with Python and R packages.

You'll start by understanding the data wrangling process and building a solid foundation for working with different types of data. You'll work with different data structures, and acquire and parse data from various locations. You'll also see how to reshape the layout of data and how to manipulate, summarize, and join data sets. The course concludes with a quick primer on accessing and processing data from databases, conducting data exploration, and storing and retrieving data quickly using databases. Each of these points is covered with practical examples that use both simple and real-world data sets. By the end of the course, you'll understand the core data wrangling concepts and how to implement them.

Working in a hands-on learning environment, led by our Data Wrangling expert instructor, students will:

  • Follow an easy-to-understand path through every step of the data wrangling process
  • Work with different types of datasets, and reshape the layout of data to make it easier to analyze
  • See simple examples and real-life data wrangling solutions for data pre-processing

Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below.

  • Read a CSV file into Python and R, and print out some statistics on the data
  • Gain knowledge of the data formats and programming structures involved in retrieving API data 
  • Make effective use of regular expressions in the data wrangling process 
  • Explore the tools and packages available to prepare numerical data for analysis 
  • Find out how to have better control over manipulating the structure of the data 
  • Develop the dexterity to programmatically read, audit, correct, and shape data
  • Write and complete programs to take in, format, and output data sets 
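
As a small illustration of the first topic above, the following sketch reads CSV text with Python's standard `csv` module and prints basic statistics for one numeric column. The sample data, the column name `price`, and the helper `summarize_column` are hypothetical and chosen only for illustration; the course labs may use different data and other tools, such as pandas.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a CSV file on disk.
SAMPLE_CSV = """item,price
apple,1.20
banana,0.50
cherry,3.00
"""


def summarize_column(csv_text, column):
    """Return count, mean, min, and max for a numeric column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Print each statistic on its own line.
    for name, value in summarize_column(SAMPLE_CSV, "price").items():
        print(f"{name}: {value}")
```

To read from a real file instead, the same `csv.DictReader` call could be given a file object opened with `open(path, newline="")`.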

Audience & Pre-Requisites 

This course is geared for attendees with Python skills who wish to turn noisy data into relevant, insight-ready information using data wrangling techniques in Python and R.

Pre-Requisites: Students should be either

  • developers with some knowledge of Python, or
  • experienced users of spreadsheet software who know the basics of Python.

Course Agenda / Topics 

  1. Programming with Data
  • Understanding data wrangling 
  • The tools for data wrangling 
  2. Introduction to Programming in Python
  • External resources 
  • Logistical overview 
  • Running programs in Python
  • Data types, variables, and the Python shell 
  • Compound statements 
  • Making annotations within programs 
  • A programmer’s resources 
  3. Reading, Exploring, and Modifying Data – Part I
  • External resources 
  • Logistical overview 
  • Introducing a basic data wrangling workflow
  • Introducing the JSON file format 
  • Opening and closing a file in Python using file I/O 
  • Reading the contents of a file 
  • Exploring the contents of a data file 
  • Modifying a dataset 
  • Outputting the modified data to a new file 
  • Specifying input and output file names in the Terminal 
  4. Reading, Exploring, and Modifying Data – Part II
  • Logistical overview 
  • Understanding the CSV format 
  • Introducing the CSV module 
  • Using the CSV module to read CSV data 
  • Using the CSV module to write CSV data 
  • Using the pandas module to read and process data 
  • Handling non-standard CSV encoding and dialect 
  • Understanding XML 
  • Using the XML module to parse XML data 
  5. Manipulating Text Data – An Introduction to Regular Expressions
  • Logistical overview 
  • Understanding the need for pattern recognition 
  • Introducing regular expressions
  • Looking for patterns 
  • Quantifying the existence of patterns 
  • Extracting patterns 
  • Summary 
  6. Cleaning Numerical Data – An Introduction to R and RStudio
  • Logistical overview 
  • Introducing R and RStudio 
  • Familiarizing yourself with RStudio 
  • Conducting basic outlier detection and removal 
  • Handling NA values 
  • Variable names and contents 
  7. Simplifying Data Manipulation with dplyr
  • Logistical overview 
  • Introducing dplyr 
  • Getting started with dplyr 
  • Chaining operations together 
  • Filtering the rows of a dataframe 
  • Summarizing data by category 
  • Rewriting code using dplyr 
  8. Getting Data from the Web
  • Logistical overview 
  • Introducing APIs 
  • Using Python to retrieve data from APIs 
  • Using URL parameters to filter the results 
  9. Working with Large Datasets
  • Logistical overview  
  • Understanding computer memory 
  • Understanding databases 
  • Introducing MongoDB 
  • Interfacing with MongoDB from Python 
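
To give a flavor of the read, explore, modify, and output workflow and of the regular-expression topics in the agenda, here is a minimal sketch that parses JSON text and keeps only the records containing a phone-like pattern. The sample records, the field names `name` and `phone`, the pattern, and the helper `clean_records` are all hypothetical assumptions for illustration, not course material.

```python
import json
import re

# Hypothetical records; the field names are assumptions for illustration.
RAW_JSON = (
    '[{"name": "Ann", "phone": "call 555-0101"},'
    ' {"name": "Bob", "phone": "n/a"}]'
)

# Matches three digits, a hyphen, then four digits (e.g. 555-0101).
PHONE_PATTERN = re.compile(r"\d{3}-\d{4}")


def clean_records(raw_json):
    """Parse JSON text and keep only records with a phone-like pattern."""
    cleaned = []
    for record in json.loads(raw_json):
        match = PHONE_PATTERN.search(record["phone"])
        if match:
            # Store only the matched digits, dropping surrounding text.
            cleaned.append({"name": record["name"], "phone": match.group()})
    return cleaned


if __name__ == "__main__":
    # Output the modified data as formatted JSON.
    print(json.dumps(clean_records(RAW_JSON), indent=2))
```

In a file-based version of this sketch, `json.load` and `json.dump` could replace the string-based calls to read from and write to files.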