The object of this exercise is to take weather observations from past winters in Minneapolis and determine which of them could be considered the worst winter ever. Various criteria will be used, such as the number of days below zero degrees (F) and the number of days with heavy snowfall, and a Badness Index will be assigned to each winter using these criteria.
NOAA has some great weather records. These data points come from the weather station at the Minneapolis-St Paul International Airport. The good stuff starts around 1938. I pulled the data from NOAA-NCDC at http://www.ncdc.noaa.gov/cdo-web/datatools, specifically Monthly Summaries data. The dataset is directly available here: https://github.com/MarcKjerland/Worst-Winter-Ever/blob/master/mpls-noaa.csv.
Here I've defined winter as November through March. Your definition may vary! Some of the variables would translate well to an expanded winter season. Further criteria could be added to highlight painfully long winters or miserable holiday travel conditions, for example.
First I do some data wrangling to prepare the raw data for analysis.
In [1]:
from datetime import datetime
import pandas as pd
# Read data, sort by year & month
dateparse = lambda x: datetime.strptime(x, '%Y%m%d')
noaa_monthly = pd.read_csv('mpls-noaa.csv', index_col=2,
                           parse_dates=True, date_parser=dateparse,
                           na_values=-9999)
noaa_monthly = noaa_monthly.groupby(
    [noaa_monthly.index.year, noaa_monthly.index.month]).sum()
# Sum seasonal totals
winter_vars = ['MNTM','EMNT','DT00','DX32','HTDD','MXSD','EMXP','TSNW','DP10']
year_start = 1938
year_end = 2014
season_start = 11  # November
season_end = 3     # March
noaa_winters = pd.concat(
    [noaa_monthly.loc[(year, season_start):(year+1, season_end), winter_vars].sum(axis=0)
     for year in range(year_start, year_end+1)], axis=1).transpose()
noaa_winters.index = range(year_start, year_end+1)
# Recompute variables that shouldn't have been summed, and convert units.
# Raw NCDC values: temperatures in tenths of deg C, snow/precip depths in mm.
noaa_winters['TSNW'] /= 25.4  # mm to inches (25.4 mm per inch)
for year in noaa_winters.index:
    noaa_winters.loc[year, 'MNTM'] = \
        noaa_monthly.loc[(year, season_start):(year+1, season_end), 'MNTM'].mean() * 0.18 + 32
    noaa_winters.loc[year, 'EMNT'] = \
        noaa_monthly.loc[(year, season_start):(year+1, season_end), 'EMNT'].min() * 0.18 + 32
    noaa_winters.loc[year, 'MXSD'] = \
        noaa_monthly.loc[(year, season_start):(year+1, season_end), 'MXSD'].max() / 25.4
    noaa_winters.loc[year, 'EMXP'] = \
        noaa_monthly.loc[(year, season_start):(year+1, season_end), 'EMXP'].max() / 25.4
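The conversions above rest on an assumption about the raw NCDC units: temperatures in tenths of a degree Celsius (hence the 0.18 = 1.8/10 in the temperature formula) and snow/precipitation depths in millimetres (25.4 mm per inch). A minimal sketch of those helpers, under that assumption:

```python
# Hedged sketch: assumes NCDC monthly values use tenths of deg C and mm.
def tenths_c_to_f(t):
    """Tenths of deg C to deg F: (t/10) * 9/5 + 32 == t * 0.18 + 32."""
    return t * 0.18 + 32

def mm_to_in(x):
    """Millimetres to inches (25.4 mm per inch)."""
    return x / 25.4

print(tenths_c_to_f(0))     # freezing point: 32.0 F
print(tenths_c_to_f(-100))  # -10 C, roughly 14 F
print(mm_to_in(254))        # roughly 10 inches
```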
Here are the variables used to determine which Minneapolis winter was the worst. In the future I'd love to include others related to wind chill, cloud cover, high gusts, freezing rain, and other wintry hazards, but this monthly NCDC dataset didn't include them. Perhaps the daily figures are worth looking into.
The units are American: inches and Fahrenheit.
Note on HTDD (Heating Degree Days): "A form of degree day used to estimate energy requirements for heating. Typically, heating degree days are calculated as how much colder the mean temperature at a location is than 65°F on a given day. For example, if a location experiences a mean temperature of 55°F on a certain day, there were 10 HDD (Heating Degree Days) that day because 65 - 55 = 10." Since the raw NCDC values are Celsius-based, the HTDD totals here are almost surely not in Fahrenheit degree days, but they still work for comparing winters against each other.
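The quoted definition translates directly into code. This is an illustration of the daily calculation only (the dataset provides monthly HTDD totals ready-made); the function name and the 65°F base are taken from the definition above:

```python
# Illustrative daily heating-degree-day calculation, 65 F base.
def heating_degree_days(mean_temp_f, base=65.0):
    """HDD for one day: how far the daily mean falls below the base, floored at 0."""
    return max(base - mean_temp_f, 0.0)

print(heating_degree_days(55.0))  # 65 - 55 = 10.0 HDD
print(heating_degree_days(70.0))  # above the base -> 0.0 HDD
```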
In [2]:
acronym = {'HTDD': 'Heating degree days',
           'DP10': 'Number of days with greater than or equal to 1.0 inch of precipitation',
           'MXSD': 'Maximum snow depth, inches',
           'EMXP': 'Extreme maximum daily precipitation, inches',
           'DT00': 'Number of days with minimum temperature less than or equal to 0.0 F',
           'DX32': 'Number of days with maximum temperature less than or equal to 32.0 F',
           'EMNT': 'Extreme minimum daily temperature, F',
           'TSNW': 'Total snow fall, inches',
           'MNTM': 'Mean temperature, F'}
# Plot variables
import matplotlib.pyplot as plt
%matplotlib inline
for v in noaa_winters.columns:
    noaa_winters[v].plot(figsize=(13,3), color='skyblue')
    # Compute rolling mean; use linear interpolation for the data gap in TSNW
    noaa_winters[v].interpolate().rolling(15).mean().plot(color='blue')
    plt.title(acronym[v])
    plt.legend(["observed data", "15-year rolling average"], loc='best')
    plt.show()
To determine the badness index for a particular winter, first we assign each of its variables a score from 0 to 100. A score of 100 means it was the worst or coldest recorded value (for example, more snowfall than any other winter), and a score of 0 means it was the least bad or warmest recorded value (for example, less snowfall than any other winter); anything in between is scored on a linear scale. Each winter's badness index is then the average of its variable scores, again ranging from 0 to 100.
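This per-variable score is just min-max scaling stretched to 0-100. A toy example with three made-up winters' snowfall totals (the values and years here are invented for illustration):

```python
import pandas as pd

# Toy min-max scoring: for snowfall, big is bad, so the max scores 100.
snow = pd.Series([20.0, 50.0, 80.0], index=[2000, 2001, 2002])
worst, best = snow.max(), snow.min()
score = 100 * (snow - best).abs() / abs(worst - best)
print(score.tolist())  # [0.0, 50.0, 100.0]
```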
In [4]:
# Find the best & worst for each variable
winter_coldest = pd.Series(index=noaa_winters.columns)
winter_warmest = pd.Series(index=noaa_winters.columns)
# For these variables, big is bad
for v in ['HTDD','MXSD','EMXP','DT00','DX32','TSNW','DP10']:
    winter_coldest[v] = noaa_winters[v].max()
    winter_warmest[v] = noaa_winters[v].min()
# For these variables, small (or negative) is bad
for v in ['MNTM','EMNT']:
    winter_coldest[v] = noaa_winters[v].min()
    winter_warmest[v] = noaa_winters[v].max()
# Assign scores to each year
winter_score = 100 * (noaa_winters - winter_warmest).abs() / (winter_coldest - winter_warmest).abs()
badness = winter_score.mean(axis=1)
# Plot the Badness Index
badness.plot(figsize=(13,6), marker='s', color='skyblue', xticks=badness.index[2::5])
badness.rolling(15).mean().plot(color='blue')
plt.title("Badness Index of each Minneapolis winter")
plt.ylabel("Badness index")
plt.xlabel("Year (start of winter)")
plt.legend(["Computed Badness", "15-year rolling average"], loc="best")
plt.show()
So there you have it. The winter of 2013-14 was pretty bad, but probably not as bad as the winter of 1996-97, 1950-51, or 1964-65 (that one must have been a doozy).