
  • Course Skill Level:

    Foundational

  • Course Duration:

    3 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    PYTFECL21E09

Who should attend & recommended skills

Those with basic Python & advanced spreadsheet software experience

  • This course is geared for experienced Python developers, analysts, or others with Python skills who wish to extract accurate information from data and to train and improve machine learning models using the NumPy, SciPy, pandas, and scikit-learn libraries.
  • Skill-level: Foundation-level Python feature engineering skills for intermediate-skilled team members. This is not a basic class.
  • Python skills: Basic (1-2 years’ experience)
  • Spreadsheet software: Advanced (6+ years’ experience)

About this course

Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook-style course, you will work with the best tools to streamline your feature engineering pipelines and to simplify and improve the quality of your code.
Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you’ll learn how to work with both continuous and discrete datasets and transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This course covers Python recipes that will help you automate feature engineering to simplify complex processes. You’ll also get to grips with different feature engineering strategies, such as the Box-Cox, power, and log transforms, across machine learning, reinforcement learning, and natural language processing (NLP) domains. By the end of this course, you’ll have discovered tips and practical solutions to your feature engineering problems.

Skills acquired & topics covered

  • Solutions for feature generation, feature extraction, and feature selection
  • The end-to-end feature engineering process across continuous, discrete, and unstructured datasets
  • Implementing modern feature extraction techniques using Python’s pandas, scikit-learn, SciPy and NumPy libraries
  • Simplifying your feature engineering pipelines with powerful Python packages
  • Getting to grips with imputing missing values
  • Encoding categorical variables with a wide set of techniques
  • Extracting insights from text quickly and effortlessly
  • Developing features from transactional data and time series data
  • Deriving new features by combining existing variables
  • Transforming, discretizing, and scaling your variables
  • Creating informative variables from date and time

Course breakdown / modules

  • Technical requirements
  • Identifying numerical and categorical variables
  • Quantifying missing data
  • Determining cardinality in categorical variables
  • Pinpointing rare categories in categorical variables
  • Identifying a linear relationship
  • Identifying a normal distribution
  • Distinguishing variable distribution
  • Highlighting outliers
  • Comparing feature magnitude
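The inspection recipes above (identifying variable types, quantifying missing data, determining cardinality) map directly onto pandas methods. A minimal sketch using a hypothetical toy DataFrame (the column names and values are illustrative, not from the course):

```python
import pandas as pd

# Hypothetical toy dataset for illustration
df = pd.DataFrame({
    "age": [25, 32, None, 47, 51],
    "city": ["NY", "LA", "NY", None, "SF"],
})

# Identify numerical and categorical variables by dtype
numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(include="object").columns.tolist()

# Quantify missing data as a fraction per column
missing_fraction = df.isnull().mean()

# Determine cardinality (number of distinct categories)
cardinality = df["city"].nunique()
```

`select_dtypes` keys the split on dtype rather than column name, which scales to wide datasets without manual lists.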

  • Technical requirements
  • Removing observations with missing data
  • Performing mean or median imputation
  • Implementing mode or frequent category imputation
  • Replacing missing values with an arbitrary number
  • Capturing missing values in a bespoke category
  • Replacing missing values with a value at the end of the distribution
  • Implementing random sample imputation
  • Adding a missing value indicator variable
  • Performing multivariate imputation by chained equations
  • Assembling an imputation pipeline with scikit-learn
  • Assembling an imputation pipeline with Feature-engine
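Several of these recipes (median imputation, the missing-value indicator, the scikit-learn pipeline) can be combined in one `SimpleImputer` step. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Toy matrix with missing entries (illustrative values)
X = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [np.nan, 6.0],
              [4.0, 8.0]])

# Median imputation assembled as a scikit-learn pipeline step;
# add_indicator appends a binary missing-value indicator per affected column
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median", add_indicator=True)),
])
X_imputed = pipe.fit_transform(X)
```

The output has the two imputed columns followed by two indicator columns, so downstream models can learn from the fact that a value was missing.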

  • Technical requirements
  • Creating binary variables through one-hot encoding
  • Performing one-hot encoding of frequent categories
  • Replacing categories with ordinal numbers
  • Replacing categories with counts or frequency of observations
  • Encoding with integers in an ordered manner
  • Encoding with the mean of the target
  • Encoding with the Weight of Evidence
  • Grouping rare or infrequent categories
  • Performing binary encoding
  • Performing feature hashing
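Three of the encodings above (one-hot, counts of observations, ordinal numbers) can be sketched in a few lines of pandas. The category names and ordering are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Count encoding: replace each category with its number of observations
counts = df["color"].map(df["color"].value_counts())

# Ordinal encoding: replace categories with integers in a chosen order
order = {"blue": 0, "green": 1, "red": 2}
ordinal = df["color"].map(order)
```

For production pipelines, the scikit-learn and Feature-engine encoders covered in the module handle unseen categories and fit/transform separation more robustly than raw pandas.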

  • Technical requirements
  • Transforming variables with the logarithm
  • Transforming variables with the reciprocal function
  • Using square and cube root to transform variables
  • Using power transformations on numerical variables
  • Performing Box-Cox transformation on numerical variables
  • Performing Yeo-Johnson transformation on numerical variables
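The variance-stabilizing transforms in this module are all available in NumPy and SciPy. A minimal sketch on an illustrative right-skewed sample:

```python
import numpy as np
from scipy import stats

# Right-skewed, strictly positive sample (illustrative)
x = np.array([1.0, 10.0, 100.0, 1000.0])

# Log transform: only defined for positive values
x_log = np.log(x)

# Box-Cox: estimates the power parameter lambda; requires positive data
x_bc, lam = stats.boxcox(x)

# Yeo-Johnson: a power transform that also accepts zero and negative values
x_yj, lam_yj = stats.yeojohnson(x)
```

The practical distinction is the domain: log and Box-Cox need strictly positive inputs, while Yeo-Johnson does not.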

  • Technical requirements
  • Dividing the variable into intervals of equal width
  • Sorting the variable values in intervals of equal frequency
  • Performing discretization followed by categorical encoding
  • Allocating the variable values in arbitrary intervals
  • Performing discretization with k-means clustering
  • Using decision trees for discretization
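Equal-width and equal-frequency binning, the first two recipes above, are both exposed through scikit-learn's `KBinsDiscretizer` via its `strategy` parameter. A minimal sketch on a toy variable:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

x = np.arange(10, dtype=float).reshape(-1, 1)

# Equal-width binning: intervals of the same size
equal_width = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
bins_w = equal_width.fit_transform(x)

# Equal-frequency binning: roughly the same count of observations per bin
equal_freq = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
bins_f = equal_freq.fit_transform(x)
```

`strategy="kmeans"` on the same estimator covers the k-means clustering recipe as well.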

  • Technical requirements
  • Trimming outliers from the dataset
  • Performing winsorization
  • Capping the variable at arbitrary maximum and minimum values
  • Performing zero-coding – capping the variable at zero
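Capping and zero-coding reduce to clipping the variable at chosen bounds. A minimal NumPy sketch with illustrative values (the percentile bounds are an assumption, not prescribed by the course):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Winsorization-style capping at the 5th and 95th percentiles
lower, upper = np.percentile(x, [5, 95])
x_capped = np.clip(x, lower, upper)

# Zero-coding: cap the variable at zero (negative values become 0)
y = np.array([-5.0, -1.0, 0.0, 3.0])
y_zero = np.clip(y, 0.0, None)
```

Trimming, by contrast, drops the outlying rows entirely rather than pulling them in to the bound.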

  • Technical requirements
  • Extracting date and time parts from a datetime variable
  • Deriving representations of the year and month
  • Creating representations of day and week
  • Extracting time parts from a time variable
  • Capturing the elapsed time between datetime variables
  • Working with time in different time zones
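Most of the date and time recipes above flow through the pandas `.dt` accessor. A minimal sketch on two hypothetical timestamps:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2021-03-15 08:30", "2021-12-01 17:45"]))

# Extract date and time parts with the .dt accessor
year = s.dt.year
month = s.dt.month
dayofweek = s.dt.dayofweek  # Monday == 0
hour = s.dt.hour

# Elapsed time between two datetime variables
elapsed_days = (s.iloc[1] - s.iloc[0]).days
```

Subtracting two datetime columns yields a `Timedelta`, from which elapsed days, seconds, or other units can be derived.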

  • Technical requirements
  • Standardizing the features
  • Performing mean normalization
  • Scaling to the maximum and minimum values
  • Implementing maximum absolute scaling
  • Scaling with the median and quantiles
  • Scaling to vector unit length
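Standardization and min-max scaling, the most common of the techniques above, are one-liners in scikit-learn. A minimal sketch on a toy column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Standardization: zero mean, unit variance
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: rescale to the [0, 1] range
X_mm = MinMaxScaler().fit_transform(X)
```

`RobustScaler` covers the median-and-quantiles recipe, which is less sensitive to outliers than either of these.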

  • Technical requirements
  • Combining multiple features with statistical operations
  • Combining pairs of features with mathematical functions
  • Performing polynomial expansion
  • Deriving new features with decision trees
  • Carrying out PCA
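Polynomial expansion and PCA, two of the recipes above, can be sketched with scikit-learn on a hypothetical two-feature matrix:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.decomposition import PCA

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Polynomial expansion: adds squares and pairwise interaction terms
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # columns: x1, x2, x1^2, x1*x2, x2^2

# PCA: derive a smaller set of uncorrelated components
X_pca = PCA(n_components=1).fit_transform(X)
```

Expansion grows the feature space (here from 2 to 5 columns) while PCA shrinks it, so the two are often used at opposite ends of a pipeline.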

  • Technical requirements
  • Aggregating transactions with mathematical operations
  • Aggregating transactions in a time window
  • Determining the number of local maxima and minima
  • Deriving time elapsed between time-stamped events
  • Creating features from transactions with Feature tools
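Aggregating transactions, with and without a time window, is a groupby exercise in pandas. A minimal sketch over a hypothetical transaction log (customers, amounts, and dates are invented for illustration):

```python
import pandas as pd

# Hypothetical transaction log
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "amount": [10.0, 20.0, 5.0, 5.0, 15.0],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-03", "2021-01-02", "2021-01-09", "2021-01-10"]
    ),
})

# Aggregate transactions per customer with mathematical operations
features = tx.groupby("customer")["amount"].agg(["sum", "mean", "max", "count"])

# Aggregate in a time window: weekly spend per customer
weekly = (
    tx.set_index("date")
      .groupby("customer")["amount"]
      .resample("W")
      .sum()
)
```

Featuretools, covered in the last recipe, automates exactly this kind of aggregation across related tables.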

  • Extracting Features from Text Variables
  • Technical requirements
  • Counting characters, words, and vocabulary
  • Estimating text complexity by counting sentences
  • Creating features with bag-of-words and n-grams
  • Implementing term frequency-inverse document frequency
  • Cleaning and stemming text variables
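The counting, bag-of-words, and TF-IDF recipes above can be sketched with scikit-learn's text vectorizers on two invented documents:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "feature engineering improves models",
    "feature extraction from text",
]

# Bag-of-words: counts of each vocabulary word per document
bow = CountVectorizer().fit_transform(docs)

# TF-IDF: down-weights words that appear in many documents
tfidf = TfidfVectorizer().fit_transform(docs)

# Hand-rolled features: character and word counts per document
char_counts = [len(d) for d in docs]
word_counts = [len(d.split()) for d in docs]
```

Both vectorizers return sparse matrices with one row per document and one column per vocabulary term; passing `ngram_range=(1, 2)` extends them to the n-gram recipe.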