In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline

In [2]:
!wget http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/oxforddata.txt


--2015-03-09 11:31:34--  http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/oxforddata.txt
Resolving www.metoffice.gov.uk (www.metoffice.gov.uk)... 62.24.201.59, 62.24.201.17
Connecting to www.metoffice.gov.uk (www.metoffice.gov.uk)|62.24.201.59|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 99757 (97K) [text/plain]
Saving to: ‘oxforddata.txt’

100%[======================================>] 99,757      --.-K/s   in 0.04s   

2015-03-09 11:31:35 (2.53 MB/s) - ‘oxforddata.txt’ saved [99757/99757]


In [3]:
!head oxforddata.txt


Oxford
Location: 4509E 2072N, 63 metres amsl
Estimated data is marked with a * after the value.
Missing data (more than 2 days missing in month) is marked by  ---.
Sunshine data taken from an automatic Kipp & Zonen sensor marked with a #, otherwise sunshine data taken from a Campbell Stokes recorder.
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1853   1    8.4     2.7       4    62.8     ---
   1853   2    3.2    -1.8      19    29.3     ---
   1853   3    7.7    -0.6      20    25.9     ---
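
The header lines above describe three flags that can appear in the data fields. Below is a minimal sketch of how a single field could be turned into a number under those conventions. The function name `parse_station_value` is hypothetical, and mapping `---` to NaN is an assumption about how missing months should be represented, not something the file prescribes.

```python
import math


def parse_station_value(token: str) -> float:
    """Convert one field of a Met Office station file to a float.

    Follows the flags described in the file header: '---' marks missing
    data, a trailing '*' marks an estimated value and a trailing '#' marks
    sunshine measured by the Kipp & Zonen sensor.
    """
    token = token.strip()
    if token == "---":
        return math.nan  # missing month (assumed representation)
    # Drop any trailing flag characters and keep the numeric part
    return float(token.rstrip("*#"))


print(parse_station_value("62.8"))   # 62.8
print(parse_station_value("29.3*"))  # 29.3
print(math.isnan(parse_station_value("---")))  # True
```

Note that this discards the flags themselves. If you later want to weight or exclude estimated values, keep the flag in a separate column instead of stripping it.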

In [4]:
df = pd.read_csv("oxforddata.txt", header=5, skiprows=[6],  # skip the units row
                 usecols=[0, 1, 5],  # keep yyyy, mm and rain
                 comment='P',  # ignore the 'Provisional' marker
                 sep=r'\s+')  # columns are separated by runs of whitespace

# Strip the '*' that flags estimated values; '---' (missing) becomes NaN
df = df.apply(lambda col: pd.to_numeric(col.astype(str).str.rstrip('*'),
                                        errors='coerce'))

df.head()


Out[4]:
   yyyy  mm  rain
0  1853   1  62.8
1  1853   2  29.3
2  1853   3  25.9
3  1853   4  60.1
4  1853   5  59.5
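
To check how the `read_csv` arguments interact without downloading the file, the sketch below parses a small inline copy of the layout. `header=5` selects the column-name row, `skiprows=[6]` drops the units row, and `usecols=[0, 1, 5]` keeps year, month and rain. The sample text is hypothetical: it reuses the first rows shown by `head`, and the `*` on 29.3 is added only to exercise the estimated-value flag. It assumes `sep=r"\s+"` behaves like `delim_whitespace=True`, as the pandas documentation states.

```python
import io

import pandas as pd

# Hypothetical miniature copy of oxforddata.txt: five description lines,
# the column-name row (line 5), the units row (line 6), then data.
SAMPLE = """Oxford
Location: 4509E 2072N, 63 metres amsl
Estimated data is marked with a * after the value.
Missing data (more than 2 days missing in month) is marked by  ---.
Sunshine data taken from an automatic Kipp & Zonen sensor marked with a #, otherwise sunshine data taken from a Campbell Stokes recorder.
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1853   1    8.4     2.7       4    62.8     ---
   1853   2    3.2    -1.8      19    29.3*    ---
   1853   3    7.7    -0.6      20    25.9     ---
"""

sample = pd.read_csv(io.StringIO(SAMPLE), header=5, skiprows=[6],
                     usecols=[0, 1, 5],  # yyyy, mm, rain
                     sep=r"\s+")
# Strip the estimated-value flag and convert; '---' would become NaN
sample = sample.apply(
    lambda col: pd.to_numeric(col.astype(str).str.rstrip("*"), errors="coerce"))
print(sample)
```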

In [5]:
grouped = df.groupby('mm')['rain'].mean()  # mean rainfall per calendar month
ax = grouped.plot(kind='bar')  # the month index becomes the x axis
ax.set_xlabel("Month")
ax.set_ylabel("Mean monthly rainfall (mm)")


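
The bar chart above plots one number per calendar month: the mean rainfall for that month across all years. Below is a small sketch of the same `groupby` on a hypothetical toy frame with the same column names. The rainfall values for 1854 are invented for illustration. It also shows one possible way to relabel the month numbers with abbreviations via the standard `calendar` module, which uses English names under the default locale.

```python
import calendar

import pandas as pd

# Hypothetical toy data: two years of rainfall (mm) for January and February
toy = pd.DataFrame({
    "yyyy": [1853, 1853, 1854, 1854],
    "mm": [1, 2, 1, 2],
    "rain": [62.8, 29.3, 40.0, 50.7],
})

# One value per month: the mean over all years for that calendar month
monthly_mean = toy.groupby("mm")["rain"].mean()

# Optional: replace month numbers with abbreviations for plot labels
labelled = monthly_mean.rename(index=lambda m: calendar.month_abbr[m])
print(labelled)
```

Calling `labelled.plot(kind="bar")` would then show "Jan", "Feb", ... on the x axis instead of 1, 2, ....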

Building a model

Let's look at data for a single month:


In [16]:
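January = df[df.mm == 1]  # rows for January (month 1) in every year
fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(January.yyyy.values, January.rain.values, "-o", color='k')
ax.set_xlabel("Year")
ax.set_ylabel("January rainfall (mm)")
plt.show()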
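
Selecting a month by its number makes it easy for the filter and the variable name to disagree. Below is a minimal sketch of a hypothetical helper, `month_series`, that looks the month number up from its name with the standard `calendar` module (English names under the default locale) and returns the rainfall indexed by year. The toy frame and its values are invented for illustration.

```python
import calendar

import pandas as pd


def month_series(df: pd.DataFrame, month_name: str) -> pd.Series:
    """Return rainfall for one calendar month, indexed by year.

    Hypothetical helper: the month number is looked up from its English
    name, so the filter and the variable name cannot disagree.
    """
    month_number = list(calendar.month_name).index(month_name)  # "January" -> 1
    rows = df[df["mm"] == month_number]
    return rows.set_index("yyyy")["rain"]


# Hypothetical toy frame with the same columns as df
toy = pd.DataFrame({
    "yyyy": [1853, 1853, 1854, 1854],
    "mm": [1, 5, 1, 5],
    "rain": [62.8, 59.5, 40.0, 55.0],
})
january = month_series(toy, "January")
print(january)
```

With the notebook's `df`, the plot could then use `ax.plot(january.index, january.values, "-o", color="k")`. An unknown or misspelled name raises `ValueError` from `list.index`, which is arguably safer than silently selecting the wrong month.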


