Title: Handling Imbalanced Classes With Upsampling
Slug: handling_imbalanced_classes_with_upsampling
Summary: How to handle imbalanced classes with upsampling during machine learning in Python.
Date: 2016-09-06 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon
In [1]:
# Load libraries
import numpy as np
from sklearn.datasets import load_iris
In [2]:
# Load iris data
iris = load_iris()
# Create feature matrix
X = iris.data
# Create target vector
y = iris.target
In [3]:
# Remove first 40 observations
X = X[40:, :]
y = y[40:]
# Create binary target vector indicating if class 0
y = np.where((y == 0), 0, 1)
# Look at the imbalanced target vector
y
Out[3]:
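One quick way to confirm the imbalance is to count the observations in each class. The sketch below rebuilds the same target vector so it runs on its own; the name `counts` is introduced here for illustration and is not part of the original notebook. Because iris stores its 150 observations in class order (50 per class), dropping the first 40 should leave 10 observations of class 0 and 100 of class 1.

```python
# Load libraries
import numpy as np
from sklearn.datasets import load_iris

# Rebuild the imbalanced binary target vector from the steps above
iris = load_iris()
y = iris.target[40:]
y = np.where(y == 0, 0, 1)

# Count observations per class; position k holds the count for class k
counts = np.bincount(y)
print(counts)
```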
In [4]:
# Indices of each class's observations
i_class0 = np.where(y == 0)[0]
i_class1 = np.where(y == 1)[0]
# Number of observations in each class
n_class0 = len(i_class0)
n_class1 = len(i_class1)
# Draw as many indices from class 0 (with replacement) as there are observations in class 1
i_class0_upsampled = np.random.choice(i_class0, size=n_class1, replace=True)
# Join together class 0's upsampled target vector with class 1's target vector
np.concatenate((y[i_class0_upsampled], y[i_class1]))
Out[4]:
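The cell above balances only the target vector. To train a model, the feature matrix usually needs the same rows, so one option is to index both `X` and `y` with the same set of indices. The following is a minimal sketch under that assumption; `rng`, `i_balanced`, `X_balanced`, and `y_balanced` are illustrative names, and the seeded `np.random.default_rng(0)` generator is swapped in for `np.random.choice` only to make the draw reproducible.

```python
# Load libraries
import numpy as np
from sklearn.datasets import load_iris

# Rebuild the imbalanced data from the steps above
iris = load_iris()
X = iris.data[40:, :]
y = np.where(iris.target[40:] == 0, 0, 1)

# Indices of each class's observations
i_class0 = np.where(y == 0)[0]
i_class1 = np.where(y == 1)[0]

# Seeded generator (an assumption, used here for reproducibility)
rng = np.random.default_rng(0)

# Sample class 0 indices with replacement until they match class 1 in number
i_class0_upsampled = rng.choice(i_class0, size=len(i_class1), replace=True)

# Use the same indices for features and target so rows stay aligned
i_balanced = np.concatenate((i_class0_upsampled, i_class1))
X_balanced = X[i_balanced]
y_balanced = y[i_balanced]

# Both classes should now have the same number of observations
print(np.bincount(y_balanced))
```

If the data is later split into training and test sets, it is generally safer to upsample only the training portion, since duplicated rows could otherwise appear on both sides of the split.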