

Feature engineering is the process of creating new features from raw data to increase the predictive power of the learning algorithm. Engineered features should capture additional information that is not easily apparent in the original feature set.
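As a small illustration of this definition, the sketch below derives three new features from raw columns using pandas. The dataset, its column names (`order_date`, `total_price`, `quantity`) and its values are made up for this example; they are assumptions, not part of any real dataset used in this notebook.

```python
import pandas as pd

# Hypothetical raw data: column names and values are invented for illustration.
raw = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-01-04", "2021-01-09", "2021-01-10"]),
    "total_price": [120.0, 45.0, 300.0],
    "quantity": [4, 1, 10],
})

features = raw.copy()

# Price per item is not stored directly, but two raw columns imply it.
features["unit_price"] = features["total_price"] / features["quantity"]

# Day of week (Monday=0 ... Sunday=6) extracted from the timestamp.
features["day_of_week"] = features["order_date"].dt.dayofweek

# Binary flag marking orders placed on a Saturday or Sunday.
features["is_weekend"] = features["day_of_week"] >= 5

print(features)
```

None of the three new columns adds data that was not already present. Each one only makes information that was implicit in the raw columns directly visible to the learning algorithm.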

References :

In this notebook, I will try to cover most of the common techniques for feature engineering.

We will learn about :

Before diving into feature engineering, let's look at the life cycle of a data science project and see exactly where feature engineering is performed.

Life cycle of a data science project

The life cycle of a data science project comprises several phases, each of which plays its own role in solving a problem in the domain of interest. The phases are as follows:

  1. Defining the problem statement

  2. Data collection strategy :

    This phase deals with collecting data using various methods, tools, techniques and sources, including web APIs, web scraping, company databases, surveys etc.

  3. Data preprocessing

    • Feature engineering :

      Handling missing values, feature normalization, feature scaling, new feature generation, handling unbalanced data etc

    • Feature selection :

      Select only those features that are highly correlated with the target feature, and drop independent features that are highly correlated with one another, since they carry redundant information.

  4. Exploratory data analysis

    Here we try to understand the data and the relationships among the various features, and to infer as much information as possible about our topic of interest.

  5. Modeling

    We will prepare our machine learning model.

  6. Model evaluation

    The model is evaluated based on its accuracy, effectiveness and fitness. The evaluation report allows us to decide whether the model is ready for deployment, or whether it needs some optimization or adjustment, or a different model needs to be built.

  7. Model deployment

    The model is made available to solve the related problems.

[Figure: data science life cycle]
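To make the data preprocessing phase of the life cycle more concrete, here is a minimal sketch of three of the steps listed under it: filling a missing value, scaling features, and dropping an independent feature that is highly correlated with another one. The dataset, the column names and the 0.95 correlation threshold are all assumptions chosen for illustration; a real project would pick them based on the data at hand.

```python
import pandas as pd

# Hypothetical dataset: column names and values are invented for illustration.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, None, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [25, 32, 47, 51, 38],
    "target": [50.0, 58.0, 66.0, 80.0, 88.0],
})
feature_cols = ["height_cm", "height_in", "age"]

# Feature engineering: fill the missing value with the column median.
df["height_cm"] = df["height_cm"].fillna(df["height_cm"].median())

# Feature engineering: standardize each feature to zero mean, unit variance.
scaled = (df[feature_cols] - df[feature_cols].mean()) / df[feature_cols].std()

# Feature selection: for every pair of independent features whose absolute
# correlation exceeds the (arbitrary) threshold, drop the second one.
corr = scaled.corr().abs()
threshold = 0.95
to_drop = set()
for i, a in enumerate(feature_cols):
    for b in feature_cols[i + 1:]:
        if a not in to_drop and b not in to_drop and corr.loc[a, b] > threshold:
            to_drop.add(b)
selected = scaled.drop(columns=sorted(to_drop))

# Feature selection: check how strongly each remaining feature is
# correlated with the target.
target_corr = selected.corrwith(df["target"]).abs()

print("dropped:", sorted(to_drop))
print(target_corr.sort_values(ascending=False))
```

In this toy data `height_in` is just `height_cm` in different units, so the pair is almost perfectly correlated and one of the two can be dropped without losing information.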

Feature Engineering in a Machine learning workflow