Feature Engineering


Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Feature engineering is fundamental to the application of machine learning, and is both difficult and expensive. The need for manual feature engineering can be obviated by automated feature learning.

Feature engineering is an informal topic, but it is considered essential in applied machine learning.

Coming up with features is difficult, time-consuming, requires expert knowledge. "Applied machine learning" is basically feature engineering. -- Andrew Ng

When working on a machine learning problem, feature engineering is manually designing what the input x's should be. -- Shayne Miel


A feature is a piece of information that might be useful for prediction. Any attribute could be a feature, as long as it is useful to the model.

The purpose of a feature, beyond being an attribute, is much easier to understand in the context of a problem: a feature is a characteristic that might help when solving that problem.
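As a concrete sketch, consider predicting demand from raw timestamps (a hypothetical setup; the feature names below are illustrative). The raw attribute is a timestamp, and the derived values are candidate features that may or may not help the model:

```python
from datetime import datetime

def make_features(timestamp: datetime) -> dict:
    # Each derived value is a candidate feature; it is only a "good" feature
    # if it actually helps the model on this particular problem.
    return {
        "hour": timestamp.hour,                       # captures daily cycles
        "is_weekend": int(timestamp.weekday() >= 5),  # weekday vs. weekend behavior
        "month": timestamp.month,                     # seasonal effects
    }

# 2024-06-15 is a Saturday morning:
print(make_features(datetime(2024, 6, 15, 8, 30)))
# → {'hour': 8, 'is_weekend': 1, 'month': 6}
```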

The process of feature engineering

  1. Brainstorming features
  2. Deciding what features to create
  3. Creating the features
  4. Checking how the features work with your model
  5. Improving your features if needed
  6. Going back to brainstorming/creating more features until the work is done

Feature relevance

Depending on the feature, it can be strongly relevant (it carries information that exists in no other feature), relevant, weakly relevant (it carries some information that other features also include), or irrelevant. It is important to create many features; even if some of them turn out to be irrelevant, you cannot afford to miss the useful ones. Afterwards, feature selection can be used to prevent overfitting.

Feature explosion

Feature explosion can be caused by feature combination or feature templates, both leading to a quick growth in the total number of features.

  • Feature templates - generating large sets of features automatically from templates, instead of coding each new feature by hand
  • Feature combinations - combinations of features that cannot be represented by a linear system

There are a few ways to control feature explosion, such as regularization, kernel methods, and feature selection.
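A quick sketch of why combinations explode: even restricting to pairwise products, the feature count grows quadratically with the number of base features (20 base features is an arbitrary choice for illustration):

```python
from itertools import combinations

base = [f"f{i}" for i in range(20)]
# Pairwise feature combinations: n * (n - 1) / 2 of them.
pairs = [f"{a}*{b}" for a, b in combinations(base, 2)]
print(len(base), "base features ->", len(pairs), "pairwise combinations")
# → 20 base features -> 190 pairwise combinations
```

Higher-order combinations grow even faster, which is what makes regularization or feature selection necessary in practice.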

Feature Learning

In machine learning, feature learning or representation learning is a set of techniques that learn a feature: a transformation of raw input data into a representation that can be effectively exploited in machine learning tasks. This obviates manual feature engineering, which is otherwise necessary, and allows a machine to both learn a specific task (using the features) and learn the features themselves: to learn how to learn.

Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensor measurements are usually complex, redundant, and highly variable. Thus, it is necessary to discover useful features or representations from raw data. Traditional hand-crafted features often require expensive human labor and expert knowledge, and they normally do not generalize well. This motivates the design of efficient feature learning techniques that automate and generalize this process.

Feature learning can be divided into two categories: supervised and unsupervised feature learning, analogous to these categories in machine learning generally.

  • In supervised feature learning, features are learned with labeled input data.
  • In unsupervised feature learning, features are learned with unlabeled input data.
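The unsupervised case can be sketched with a minimal from-scratch PCA, one of the classic unsupervised feature learning techniques: the new feature (a projection direction) is learned from unlabeled data alone (the data shape and scaling here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Unlabeled 2-D data, stretched far more along the first axis.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

# PCA sketch: learn the direction of maximal variance from unlabeled data.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
top = eigvecs[:, ::-1][:, :1]           # leading principal direction
feature = Xc @ top                      # the learned 1-D representation

print(feature.shape)  # one learned feature per sample
```

Supervised feature learning works the same way conceptually, except the transformation is fit to optimize performance on labeled targets rather than variance.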

Feature Selection

In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for three reasons:

  • Simplification of models to make them easier to interpret by researchers/users
  • Shorter training times
  • Enhanced generalization by reducing overfitting (formally, reduction of variance)

The central premise when using a feature selection technique is that the data contains many features that are either redundant or irrelevant, and can thus be removed without incurring much loss of information. Redundancy and irrelevance are two distinct notions, since one relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.

Feature selection techniques should be distinguished from feature extraction. Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points). Archetypal cases for the application of feature selection include the analysis of written texts and DNA microarray data, where there are many thousands of features, and a few tens to hundreds of samples.
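The selection-versus-extraction distinction can be made concrete with a small sketch (assumptions: a synthetic target driven by two of five columns, a simple correlation filter for selection, and a random linear projection as a stand-in for extraction methods like PCA):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
# Target depends only on columns 0 and 3.
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

# Feature SELECTION: keep a subset of the ORIGINAL columns
# (here, the two most correlated with y -- a simple filter method).
scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
selected = np.argsort(scores)[-2:]
X_selected = X[:, selected]   # still the original features, just fewer

# Feature EXTRACTION: build NEW features as functions of ALL columns
# (a random projection here, purely for illustration).
W = rng.normal(size=(5, 2))
X_extracted = X @ W           # new, derived features

print(sorted(selected.tolist()), X_selected.shape, X_extracted.shape)
```

The selected columns remain interpretable as the original variables, which is one of the three motivations listed above; the extracted columns are mixtures of all five inputs and lose that interpretability.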


Examples of feature selection methods include:

  • Floating Sequential Selection Method (FSSM)
  • Decision Tree Method (DTM)
  • ...
