Title: Handling Imbalanced Classes With Downsampling
Slug: handling_imbalanced_classes_with_downsampling
Summary: How to handle imbalanced classes with downsampling during machine learning in Python.
Date: 2016-09-06 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon
In [1]:
# Load libraries
import numpy as np
from sklearn.datasets import load_iris
In [2]:
# Load iris data
iris = load_iris()
# Create feature matrix
X = iris.data
# Create target vector
y = iris.target
In [3]:
# Remove first 40 observations
X = X[40:,:]
y = y[40:]
# Create binary target vector indicating if class 0
y = np.where((y == 0), 0, 1)
# Look at the imbalanced target vector
y
Out[3]:
In [4]:
# Indicies of each class' observations
i_class0 = np.where(y == 0)[0]
i_class1 = np.where(y == 1)[0]
# Number of observations in each class
n_class0 = len(i_class0)
n_class1 = len(i_class1)
# For every observation of class 0, randomly sample from class 1 without replacement
i_class1_downsampled = np.random.choice(i_class1, size=n_class0, replace=False)
# Join together class 0's target vector with the downsampled class 1's target vector
np.hstack((y[i_class0], y[i_class1_downsampled]))
Out[4]: