Title: Handling Missing Values In Time Series
Slug: handling_missing_values_in_time_series
Summary: How to handle missing values in time series with pandas for machine learning in Python.
Date: 2017-09-11 12:00
Category: Machine Learning
Tags: Preprocessing Dates And Times
Authors: Chris Albon

Preliminaries


In [1]:
# Load libraries
import pandas as pd
import numpy as np

Create Date Data With Gap In Values


In [2]:
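# Create a monthly date index (five month-end dates)
# Note: pandas 2.2+ deprecates freq='M' in favor of freq='ME'
time_index = pd.date_range('01/01/2010', periods=5, freq='M')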

# Create data frame, set index
df = pd.DataFrame(index=time_index)

# Create feature with a gap of missing values
df['Sales'] = [1.0,2.0,np.nan,np.nan,5.0]
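
Before filling anything, it can help to confirm how many values are missing and where they sit. The following is a minimal sketch, assuming the same five-row frame as above; the index is written out as explicit month-end dates for convenience, and the names `missing_count` and `missing_dates` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with explicit month-end dates
time_index = pd.to_datetime(['2010-01-31', '2010-02-28', '2010-03-31',
                             '2010-04-30', '2010-05-31'])
df = pd.DataFrame({'Sales': [1.0, 2.0, np.nan, np.nan, 5.0]}, index=time_index)

# Count missing values in the feature
missing_count = int(df['Sales'].isna().sum())

# List the timestamps where the value is missing
missing_dates = df.index[df['Sales'].isna()]

print(missing_count)
print(list(missing_dates.strftime('%Y-%m-%d')))
```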

Interpolate Missing Values


In [3]:
# Interpolate missing values
df.interpolate()


Out[3]:
Sales
2010-01-31 1.0
2010-02-28 2.0
2010-03-31 3.0
2010-04-30 4.0
2010-05-31 5.0
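
By default, `interpolate()` uses linear interpolation that treats rows as equally spaced, which is why the gap is filled with exactly 3.0 and 4.0. Because months differ in length, weighting by the actual timestamps gives slightly different values. The sketch below assumes the same data as above (index written out explicitly) and uses `method='time'`, which requires a datetime index; the names `linear` and `by_time` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with explicit month-end dates
time_index = pd.to_datetime(['2010-01-31', '2010-02-28', '2010-03-31',
                             '2010-04-30', '2010-05-31'])
df = pd.DataFrame({'Sales': [1.0, 2.0, np.nan, np.nan, 5.0]}, index=time_index)

# Default linear interpolation: rows are treated as equally spaced
linear = df['Sales'].interpolate()

# Time-weighted interpolation: spacing follows the elapsed time between dates
by_time = df['Sales'].interpolate(method='time')

# Compare the two results side by side
print(pd.DataFrame({'linear': linear, 'time': by_time}))
```

For evenly spaced observations the two methods agree; the time-weighted variant matters most when the index is irregular.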

Forward-fill Missing Values


In [4]:
# Forward-fill
df.ffill()


Out[4]:
Sales
2010-01-31 1.0
2010-02-28 2.0
2010-03-31 2.0
2010-04-30 2.0
2010-05-31 5.0

Backfill Missing Values


In [5]:
# Back-fill
df.bfill()


Out[5]:
Sales
2010-01-31 1.0
2010-02-28 2.0
2010-03-31 5.0
2010-04-30 5.0
2010-05-31 5.0
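
Both `ffill()` and `bfill()` accept a `limit` argument that caps how many consecutive missing values are filled, which avoids carrying a stale value across a long gap. A minimal sketch, assuming the same data as above (index written out explicitly); `capped_ffill` and `capped_bfill` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with explicit month-end dates
time_index = pd.to_datetime(['2010-01-31', '2010-02-28', '2010-03-31',
                             '2010-04-30', '2010-05-31'])
df = pd.DataFrame({'Sales': [1.0, 2.0, np.nan, np.nan, 5.0]}, index=time_index)

# Carry the last known value forward by at most one row
capped_ffill = df.ffill(limit=1)

# Carry the next known value backward by at most one row
capped_bfill = df.bfill(limit=1)

print(capped_ffill)
print(capped_bfill)
```

With a two-row gap and `limit=1`, each call fills one of the missing rows and leaves the other as NaN.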

Interpolate Missing Values, Filling At Most One Consecutive Value


In [6]:
# Interpolate, filling at most one consecutive missing value, working forward
df.interpolate(limit=1, limit_direction='forward')


Out[6]:
Sales
2010-01-31 1.0
2010-02-28 2.0
2010-03-31 3.0
2010-04-30 NaN
2010-05-31 5.0
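
The `limit_direction` argument controls which end of the gap the limit is counted from. A short sketch of the other two options, assuming the same data as above (index written out explicitly); `backward_fill` and `both_fill` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with explicit month-end dates
time_index = pd.to_datetime(['2010-01-31', '2010-02-28', '2010-03-31',
                             '2010-04-30', '2010-05-31'])
df = pd.DataFrame({'Sales': [1.0, 2.0, np.nan, np.nan, 5.0]}, index=time_index)

# Fill at most one value, counting back from the end of the gap
backward_fill = df.interpolate(limit=1, limit_direction='backward')

# Fill at most one value from each end of the gap
both_fill = df.interpolate(limit=1, limit_direction='both')

print(backward_fill)
print(both_fill)
```

In this example the gap is only two rows wide, so `limit_direction='both'` fills it completely, while `'backward'` fills only the row nearest the next known value.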