
Python Feature Engineering Cookbook

  • Course Code: Data Analysis / BI - Python Feature Engineering Cookbook
  • Course Dates: Contact us to schedule.
  • Course Category: Big Data & Data Science
  • Duration: 3 Days
  • Audience: This course is geared for experienced Python developers, analysts, or others who want to extract accurate information from data to train and improve machine learning models using the NumPy, SciPy, pandas, and scikit-learn libraries.

Course Snapshot 

  • Duration: 3 days 
  • Skill-level: Foundation-level Python feature engineering skills for intermediate-skilled team members. This is not a basic class.
  • Targeted Audience: This course is geared for experienced Python developers, analysts, or others who want to extract accurate information from data to train and improve machine learning models using the NumPy, SciPy, pandas, and scikit-learn libraries.
  • Hands-on Learning: This course is approximately 50% hands-on labs and 50% lecture, combining engaging lectures, demos, group activities, and discussions with machine-based student labs and exercises. Student machines are required.
  • Delivery Format: This course is available as an onsite private classroom presentation, as remote instructor-led delivery, or as CBT/WBT (by request).
  • Customizable: This course may be tailored to your specific training objectives, tools of choice, and learning goals.

Feature engineering is invaluable for developing and enriching your machine learning models. In this course, you will work with tools that streamline your feature engineering pipelines and techniques, simplifying your code and improving its quality.

Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you’ll learn how to work with both continuous and discrete datasets and how to transform features from unstructured datasets. You will develop the skills necessary to select the best features and the most suitable extraction techniques. This course covers Python recipes that help you automate feature engineering to simplify complex processes. You’ll also get to grips with different feature engineering strategies, such as the Box-Cox transform, power transform, and log transform, across machine learning, reinforcement learning, and natural language processing (NLP) domains. By the end of this course, you’ll have practical tips and solutions for common feature engineering problems.
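As a rough illustration of the three transforms named above, the sketch below applies a log transform, a simple power transform, and a Box-Cox transform to a right-skewed feature using NumPy and scikit-learn's `PowerTransformer`. The data is synthetic (a hypothetical `income` column drawn from a log-normal distribution), and the exponent 0.3 is an arbitrary choice for illustration; the course recipes may use other tools, such as Feature-engine, for the same steps.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Synthetic, right-skewed and strictly positive feature (illustrative data only)
rng = np.random.default_rng(42)
df = pd.DataFrame({"income": rng.lognormal(mean=10, sigma=0.8, size=1_000)})

# Log transform: defined only for strictly positive values
df["income_log"] = np.log(df["income"])

# Power transform: raise to a fixed exponent (0.3 is an arbitrary choice here)
df["income_pow"] = np.power(df["income"], 0.3)

# Box-Cox: also needs strictly positive values; lambda is fitted by maximum likelihood
box_cox = PowerTransformer(method="box-cox", standardize=False)
df["income_boxcox"] = box_cox.fit_transform(df[["income"]]).ravel()

print(f"Fitted Box-Cox lambda: {box_cox.lambdas_[0]:.3f}")
# Skewness closer to 0 suggests a more symmetric distribution
print(df.skew().round(2))
```

Comparing the printed skewness values is a quick, informal way to see which transform brings the variable closer to symmetry. Because the fitted lambda is stored in `lambdas_`, the same Box-Cox transform can be applied to new data with `box_cox.transform`.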

Working in a hands-on learning environment, led by our Python expert instructor, students will learn about and explore: 

  • Discover solutions for feature generation, feature extraction, and feature selection 
  • Uncover the end-to-end feature engineering process across continuous, discrete, and unstructured datasets 
  • Implement modern feature extraction techniques using Python’s pandas, scikit-learn, SciPy and NumPy libraries 

Topics Covered: This is a high-level list of topics covered in this course. Please see the detailed agenda below.

  • Simplify your feature engineering pipelines with powerful Python packages 
  • Get to grips with imputing missing values 
  • Encode categorical variables with a wide set of techniques 
  • Extract insights from text quickly and effortlessly 
  • Develop features from transactional data and time series data 
  • Derive new features by combining existing variables 
  • Understand how to transform, discretize, and scale your variables 
  • Create informative variables from date and time 

Audience & Pre-Requisites 

This course is geared for attendees with Python skills who wish to extract accurate information from data to train and improve machine learning models using the NumPy, SciPy, pandas, and scikit-learn libraries.

Pre-Requisites: Students should be:

  • Developers with some knowledge of Python, or
  • Users experienced with spreadsheet software who know the basics of Python.

Course Agenda / Topics 

  1. Foreseeing Variable Problems When Building ML Models
  • Technical requirements 
  • Identifying numerical and categorical variables 
  • Quantifying missing data 
  • Determining cardinality in categorical variables 
  • Pinpointing rare categories in categorical variables 
  • Identifying a linear relationship 
  • Identifying a normal distribution 
  • Distinguishing variable distribution 
  • Highlighting outliers 
  • Comparing feature magnitude 
  2. Imputing Missing Data
  • Technical requirements 
  • Removing observations with missing data 
  • Performing mean or median imputation 
  • Implementing mode or frequent category imputation 
  • Replacing missing values with an arbitrary number 
  • Capturing missing values in a bespoke category 
  • Replacing missing values with a value at the end of the distribution 
  • Implementing random sample imputation 
  • Adding a missing value indicator variable 
  • Performing multivariate imputation by chained equations 
  • Assembling an imputation pipeline with scikit-learn 
  • Assembling an imputation pipeline with Feature-engine 
  3. Encoding Categorical Variables
  • Technical requirements 
  • Creating binary variables through one-hot encoding 
  • Performing one-hot encoding of frequent categories 
  • Replacing categories with ordinal numbers 
  • Replacing categories with counts or frequency of observations 
  • Encoding with integers in an ordered manner 
  • Encoding with the mean of the target 
  • Encoding with the Weight of Evidence 
  • Grouping rare or infrequent categories 
  • Performing binary encoding 
  • Performing feature hashing 
  4. Transforming Numerical Variables
  • Technical requirements 
  • Transforming variables with the logarithm 
  • Transforming variables with the reciprocal function 
  • Using square and cube root to transform variables 
  • Using power transformations on numerical variables 
  • Performing Box-Cox transformation on numerical variables 
  • Performing Yeo-Johnson transformation on numerical variables 
  5. Performing Variable Discretization
  • Technical requirements 
  • Dividing the variable into intervals of equal width 
  • Sorting the variable values in intervals of equal frequency 
  • Performing discretization followed by categorical encoding 
  • Allocating the variable values in arbitrary intervals 
  • Performing discretization with k-means clustering 
  • Using decision trees for discretization 
  6. Working with Outliers
  • Technical requirements 
  • Trimming outliers from the dataset 
  • Performing winsorization 
  • Capping the variable at arbitrary maximum and minimum values 
  • Performing zero-coding – capping the variable at zero 
  7. Deriving Features from Dates and Time Variables
  • Technical requirements 
  • Extracting date and time parts from a datetime variable 
  • Deriving representations of the year and month 
  • Creating representations of day and week 
  • Extracting time parts from a time variable 
  • Capturing the elapsed time between datetime variables 
  • Working with time in different time zones 
  8. Performing Feature Scaling
  • Technical requirements 
  • Standardizing the features 
  • Performing mean normalization 
  • Scaling to the maximum and minimum values 
  • Implementing maximum absolute scaling 
  • Scaling with the median and quantiles 
  • Scaling to vector unit length 
  9. Applying Mathematical Computations to Features
  • Technical requirements 
  • Combining multiple features with statistical operations 
  • Combining pairs of features with mathematical functions 
  • Performing polynomial expansion 
  • Deriving new features with decision trees 
  • Carrying out PCA 
  10. Creating Features with Transactional and Time Series Data
  • Technical requirements 
  • Aggregating transactions with mathematical operations 
  • Aggregating transactions in a time window 
  • Determining the number of local maxima and minima 
  • Deriving time elapsed between time-stamped events 
  • Creating features from transactions with Featuretools
  11. Extracting Features from Text Variables
  • Technical requirements 
  • Counting characters, words, and vocabulary 
  • Estimating text complexity by counting sentences 
  • Creating features with bag-of-words and n-grams 
  • Implementing term frequency-inverse document frequency 
  • Cleaning and stemming text variables 