Title: Discretize Features
Slug: discretize_features
Summary: How to discretize features for machine learning in Python.
Date: 2016-09-06 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon

Preliminaries


In [1]:
# Load libraries
from sklearn.preprocessing import Binarizer
import numpy as np

Create Data


In [2]:
# Create feature
age = np.array([[6], 
                [12], 
                [20], 
                [36], 
                [65]])

Option 1: Binarize Feature


In [3]:
# Create binarizer: values above the threshold of 18 become 1, all others 0
binarizer = Binarizer(threshold=18)

# Transform feature
binarizer.fit_transform(age)


Out[3]:
array([[0],
       [0],
       [1],
       [1],
       [1]])
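
Binarizer maps values strictly greater than the threshold to 1 and everything else to 0, so an age of exactly 18 would be coded 0. A minimal sketch of that rule in plain NumPy, assuming an illustrative array `age_with_boundary` (not part of the original data) that adds 18 to show the edge case:

```python
import numpy as np

# Illustrative data: the original ages plus 18 to show the boundary case
age_with_boundary = np.array([[6], [12], [18], [20], [36], [65]])

# Same rule Binarizer applies: strictly greater than the threshold -> 1
threshold = 18
manual_binary = (age_with_boundary > threshold).astype(int)

print(manual_binary.ravel())
```

If 18-year-olds should be coded 1 instead, one option is to lower the threshold (for integer ages, a threshold of 17 has that effect).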

Option 2: Break Up Feature Into Bins


In [4]:
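# Bin feature: return the index of the bin each age falls into
# (bin edges at 20, 30, and 64; each bin includes its left edge)
np.digitize(age, bins=[20, 30, 64])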


Out[4]:
array([[0],
       [0],
       [1],
       [2],
       [3]])
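
By default, np.digitize treats each bin as closed on the left and open on the right, which is why an age of 20 lands in bin 1 above. The sketch below, which reuses the same ages and bin edges, shows how `right=True` flips that behavior and how bin indices can be turned into readable labels. The `labels` array is a hypothetical naming scheme, not something from the original post.

```python
import numpy as np

age = np.array([[6], [12], [20], [36], [65]])
bins = [20, 30, 64]

# Default (right=False): bins[i-1] <= x < bins[i], so 20 falls in bin 1
left_closed = np.digitize(age, bins=bins)

# right=True: bins[i-1] < x <= bins[i], so 20 falls in bin 0
right_closed = np.digitize(age, bins=bins, right=True)

# Hypothetical labels, one per bin index (len(bins) + 1 in total),
# matching the default left-closed bins
labels = np.array(["under 20", "20-29", "30-63", "64 and over"])
age_groups = labels[left_closed.ravel()]

print(left_closed.ravel())
print(right_closed.ravel())
print(age_groups)
```

Which edge to include depends on how the categories are defined, so it is worth checking a boundary value such as 20 before relying on the result.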