Title: Deleting Missing Values
Slug: deleting_missing_values
Summary: How to delete observations with missing values using NumPy and pandas.
Date: 2017-09-02 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon

Preliminaries


In [1]:
# Load libraries
import numpy as np
import pandas as pd

Create Feature Matrix


In [2]:
# Create feature matrix
X = np.array([[1, 2], 
              [6, 3], 
              [8, 4], 
              [9, 5], 
              [np.nan, 4]])

Drop Missing Values Using NumPy


In [3]:
# Remove observations with missing values
X[~np.isnan(X).any(axis=1)]


Out[3]:
array([[ 1.,  2.],
       [ 6.,  3.],
       [ 8.,  4.],
       [ 9.,  5.]])
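
The one-line expression above combines three steps. The sketch below splits them apart to show how the row mask is built. The intermediate names (`is_missing`, `row_has_missing`, `X_complete`) are illustrative and not part of the original example.

```python
import numpy as np

# Same feature matrix as above
X = np.array([[1, 2],
              [6, 3],
              [8, 4],
              [9, 5],
              [np.nan, 4]])

# Boolean matrix: True wherever an individual value is NaN
is_missing = np.isnan(X)

# Boolean vector: True for each row containing at least one NaN
row_has_missing = is_missing.any(axis=1)

# Invert the mask and keep only the rows with no missing values
X_complete = X[~row_has_missing]
```

Boolean indexing returns a new array, so `X` itself still contains the row with the missing value.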

Drop Missing Values Using pandas


In [4]:
# Load data as a data frame
df = pd.DataFrame(X, columns=['feature_1', 'feature_2'])

# Remove observations with missing values
df.dropna()


Out[4]:
   feature_1  feature_2
0        1.0        2.0
1        6.0        3.0
2        8.0        4.0
3        9.0        5.0
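
`dropna` returns a new data frame and leaves `df` unchanged, so the result has to be assigned to be kept. The sketch below shows that, along with a few commonly used `dropna` parameters. The variable names are illustrative, and the data frame is rebuilt from the same values as above so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Same values as the feature matrix above
df = pd.DataFrame({'feature_1': [1, 6, 8, 9, np.nan],
                   'feature_2': [2, 3, 4, 5, 4]})

# Assign the result; df itself is not modified
df_complete = df.dropna()

# Only look for missing values in feature_2
df_subset = df.dropna(subset=['feature_2'])

# Drop a row only if every value in it is missing
df_all_missing = df.dropna(how='all')

# Renumber the rows after dropping so the index has no gaps
df_reindexed = df.dropna().reset_index(drop=True)
```

With this data, only the default call removes a row: `feature_2` has no missing values, and no row is missing in every column.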