
  • Course Skill Level:

    Foundational

  • Course Duration:

    3 days

  • Course Delivery Format:

    Live, instructor-led.

  • Course Category:

    Big Data & Data Science

  • Course Code:

    PYTFECL21E09

Who should attend & recommended skills

Those with basic Python & advanced spreadsheet software experience

  • This course is geared for experienced Python developers, analysts, or others with Python skills who wish to extract accurate information from data and to train and improve machine learning models using the NumPy, SciPy, pandas, and scikit-learn libraries.
  • Skill-level: Foundation-level Python feature engineering skills for intermediate-skilled team members. This is not a basic class.
  • Python skills: Basic (1-2 years’ experience)
  • Spreadsheet software: Advanced (6+ years’ experience)

About this course

Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook-style course, you will work with the best tools to streamline your feature engineering pipelines and to simplify and improve the quality of your code.
Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you’ll learn how to work with both continuous and discrete datasets and transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This course covers Python recipes that will help you automate feature engineering to simplify complex processes. You’ll also get to grips with different feature engineering strategies, such as the Box-Cox, power, and log transforms, across machine learning, reinforcement learning, and natural language processing (NLP) domains. By the end of this course, you’ll have discovered tips and practical solutions to your feature engineering problems.

Skills acquired & topics covered

  • Solutions for feature generation, feature extraction, and feature selection
  • The end-to-end feature engineering process across continuous, discrete, and unstructured datasets
  • Implementing modern feature extraction techniques using Python’s pandas, scikit-learn, SciPy and NumPy libraries
  • Simplifying your feature engineering pipelines with powerful Python packages
  • Getting to grips with imputing missing values
  • Encoding categorical variables with a wide set of techniques
  • Extracting insights from text quickly and effortlessly
  • Developing features from transactional data and time series data
  • Deriving new features by combining existing variables
  • Transforming, discretizing, and scaling your variables
  • Creating informative variables from date and time

Course breakdown / modules

  • Technical requirements
  • Identifying numerical and categorical variables
  • Quantifying missing data
  • Determining cardinality in categorical variables
  • Pinpointing rare categories in categorical variables
  • Identifying a linear relationship
  • Identifying a normal distribution
  • Distinguishing variable distribution
  • Highlighting outliers
  • Comparing feature magnitude
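The inspection recipes above (identifying variable types, quantifying missing data, determining cardinality) map directly onto pandas methods. A minimal sketch using a hypothetical toy DataFrame (the column names and values are illustrative, not from the course):

```python
import pandas as pd

# Hypothetical toy dataset for illustration
df = pd.DataFrame({
    "age": [25, 32, None, 47, 51],
    "city": ["NY", "LA", "NY", None, "SF"],
})

# Identify numerical and categorical variables by dtype
numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(include="object").columns.tolist()

# Quantify missing data as a fraction per column
missing_fraction = df.isnull().mean()

# Determine cardinality (number of distinct categories)
cardinality = df["city"].nunique()
```

`select_dtypes` keys the split on dtype rather than column name, which scales to wide datasets without manual lists.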

  • Technical requirements
  • Removing observations with missing data
  • Performing mean or median imputation
  • Implementing mode or frequent category imputation
  • Replacing missing values with an arbitrary number
  • Capturing missing values in a bespoke category
  • Replacing missing values with a value at the end of the distribution
  • Implementing random sample imputation
  • Adding a missing value indicator variable
  • Performing multivariate imputation by chained equations
  • Assembling an imputation pipeline with scikit-learn
  • Assembling an imputation pipeline with Feature-engine
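Several of these recipes (median imputation, the missing-value indicator, the scikit-learn pipeline) can be combined in one `SimpleImputer` step. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Toy matrix with missing entries (illustrative values)
X = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [np.nan, 6.0],
              [4.0, 8.0]])

# Median imputation assembled as a scikit-learn pipeline step;
# add_indicator appends a binary missing-value indicator per affected column
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median", add_indicator=True)),
])
X_imputed = pipe.fit_transform(X)
```

The output has the two imputed columns followed by two indicator columns, so downstream models can learn from the fact that a value was missing.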

  • Technical requirements
  • Creating binary variables through one-hot encoding
  • Performing one-hot encoding of frequent categories
  • Replacing categories with ordinal numbers
  • Replacing categories with counts or frequency of observations
  • Encoding with integers in an ordered manner
  • Encoding with the mean of the target
  • Encoding with the Weight of Evidence
  • Grouping rare or infrequent categories
  • Performing binary encoding
  • Performing feature hashing
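Three of the encodings above (one-hot, counts of observations, ordinal numbers) can be sketched in a few lines of pandas. The category names and ordering are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Count encoding: replace each category with its number of observations
counts = df["color"].map(df["color"].value_counts())

# Ordinal encoding: replace categories with integers in a chosen order
order = {"blue": 0, "green": 1, "red": 2}
ordinal = df["color"].map(order)
```

For production pipelines, the scikit-learn and Feature-engine encoders covered in the module handle unseen categories and fit/transform separation more robustly than raw pandas.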

  • Technical requirements
  • Transforming variables with the logarithm
  • Transforming variables with the reciprocal function
  • Using square and cube root to transform variables
  • Using power transformations on numerical variables
  • Performing Box-Cox transformation on numerical variables
  • Performing Yeo-Johnson transformation on numerical variables
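The variance-stabilizing transforms in this module are all available in NumPy and SciPy. A minimal sketch on an illustrative right-skewed sample:

```python
import numpy as np
from scipy import stats

# Right-skewed, strictly positive sample (illustrative)
x = np.array([1.0, 10.0, 100.0, 1000.0])

# Log transform: only defined for positive values
x_log = np.log(x)

# Box-Cox: estimates the power parameter lambda; requires positive data
x_bc, lam = stats.boxcox(x)

# Yeo-Johnson: a power transform that also accepts zero and negative values
x_yj, lam_yj = stats.yeojohnson(x)
```

The practical distinction is the domain: log and Box-Cox need strictly positive inputs, while Yeo-Johnson does not.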

  • Technical requirements
  • Dividing the variable into intervals of equal width
  • Sorting the variable values in intervals of equal frequency
  • Performing discretization followed by categorical encoding
  • Allocating the variable values in arbitrary intervals
  • Performing discretization with k-means clustering
  • Using decision trees for discretization
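Equal-width and equal-frequency binning, the first two recipes above, are both exposed through scikit-learn's `KBinsDiscretizer` via its `strategy` parameter. A minimal sketch on a toy variable:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

x = np.arange(10, dtype=float).reshape(-1, 1)

# Equal-width binning: intervals of the same size
equal_width = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
bins_w = equal_width.fit_transform(x)

# Equal-frequency binning: roughly the same count of observations per bin
equal_freq = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
bins_f = equal_freq.fit_transform(x)
```

`strategy="kmeans"` on the same estimator covers the k-means clustering recipe as well.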

  • Technical requirements
  • Trimming outliers from the dataset
  • Performing winsorization
  • Capping the variable at arbitrary maximum and minimum values
  • Performing zero-coding – capping the variable at zero
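Capping and zero-coding reduce to clipping the variable at chosen bounds. A minimal NumPy sketch with illustrative values (the percentile bounds are an assumption, not prescribed by the course):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Winsorization-style capping at the 5th and 95th percentiles
lower, upper = np.percentile(x, [5, 95])
x_capped = np.clip(x, lower, upper)

# Zero-coding: cap the variable at zero (negative values become 0)
y = np.array([-5.0, -1.0, 0.0, 3.0])
y_zero = np.clip(y, 0.0, None)
```

Trimming, by contrast, drops the outlying rows entirely rather than pulling them in to the bound.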

  • Technical requirements
  • Extracting date and time parts from a datetime variable
  • Deriving representations of the year and month
  • Creating representations of day and week
  • Extracting time parts from a time variable
  • Capturing the elapsed time between datetime variables
  • Working with time in different time zones
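Most of the date and time recipes above flow through the pandas `.dt` accessor. A minimal sketch on two hypothetical timestamps:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2021-03-15 08:30", "2021-12-01 17:45"]))

# Extract date and time parts with the .dt accessor
year = s.dt.year
month = s.dt.month
dayofweek = s.dt.dayofweek  # Monday == 0
hour = s.dt.hour

# Elapsed time between two datetime variables
elapsed_days = (s.iloc[1] - s.iloc[0]).days
```

Subtracting two datetime columns yields a `Timedelta`, from which elapsed days, seconds, or other units can be derived.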

  • Technical requirements
  • Standardizing the features
  • Performing mean normalization
  • Scaling to the maximum and minimum values
  • Implementing maximum absolute scaling
  • Scaling with the median and quantiles
  • Scaling to vector unit length
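Standardization and min-max scaling, the most common of the techniques above, are one-liners in scikit-learn. A minimal sketch on a toy column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Standardization: zero mean, unit variance
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: rescale to the [0, 1] range
X_mm = MinMaxScaler().fit_transform(X)
```

`RobustScaler` covers the median-and-quantiles recipe, which is less sensitive to outliers than either of these.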

  • Technical requirements
  • Combining multiple features with statistical operations
  • Combining pairs of features with mathematical functions
  • Performing polynomial expansion
  • Deriving new features with decision trees
  • Carrying out PCA
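Polynomial expansion and PCA, two of the recipes above, can be sketched with scikit-learn on a hypothetical two-feature matrix:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.decomposition import PCA

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Polynomial expansion: adds squares and pairwise interaction terms
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # columns: x1, x2, x1^2, x1*x2, x2^2

# PCA: derive a smaller set of uncorrelated components
X_pca = PCA(n_components=1).fit_transform(X)
```

Expansion grows the feature space (here from 2 to 5 columns) while PCA shrinks it, so the two are often used at opposite ends of a pipeline.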

  • Technical requirements
  • Aggregating transactions with mathematical operations
  • Aggregating transactions in a time window
  • Determining the number of local maxima and minima
  • Deriving time elapsed between time-stamped events
  • Creating features from transactions with Feature tools
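Aggregating transactions, with and without a time window, is a groupby exercise in pandas. A minimal sketch over a hypothetical transaction log (customers, amounts, and dates are invented for illustration):

```python
import pandas as pd

# Hypothetical transaction log
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "amount": [10.0, 20.0, 5.0, 5.0, 15.0],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-03", "2021-01-02", "2021-01-09", "2021-01-10"]
    ),
})

# Aggregate transactions per customer with mathematical operations
features = tx.groupby("customer")["amount"].agg(["sum", "mean", "max", "count"])

# Aggregate in a time window: weekly spend per customer
weekly = (
    tx.set_index("date")
      .groupby("customer")["amount"]
      .resample("W")
      .sum()
)
```

Featuretools, covered in the last recipe, automates exactly this kind of aggregation across related tables.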

  • Extracting Features from Text Variables
  • Technical requirements
  • Counting characters, words, and vocabulary
  • Estimating text complexity by counting sentences
  • Creating features with bag-of-words and n-grams
  • Implementing term frequency-inverse document frequency
  • Cleaning and stemming text variables
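The counting, bag-of-words, and TF-IDF recipes above can be sketched with scikit-learn's text vectorizers on two invented documents:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "feature engineering improves models",
    "feature extraction from text",
]

# Bag-of-words: counts of each vocabulary word per document
bow = CountVectorizer().fit_transform(docs)

# TF-IDF: down-weights words that appear in many documents
tfidf = TfidfVectorizer().fit_transform(docs)

# Hand-rolled features: character and word counts per document
char_counts = [len(d) for d in docs]
word_counts = [len(d.split()) for d in docs]
```

Both vectorizers return sparse matrices with one row per document and one column per vocabulary term; passing `ngram_range=(1, 2)` extends them to the n-gram recipe.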