In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
matplotlib.style.use('ggplot')
In [2]:
f = "~/github/datasets/598354.csv"
data = pd.read_csv(f)
In [3]:
data.head()
Out[3]:
To choose the stations, we first filter the initial data set:
In [4]:
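# Keep only rows with a usable TMIN (-9999 is presumably the missing-value marker)
data2 = data[(data.TMIN>-9999)]
# Days in June 2015 (DATE is an integer, YYYYMMDD) on which some rain was recorded
data3 = data2[(data2.DATE>=20150601) & (data2.DATE<=20150630) & (data2.PRCP>0)]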
Printing data3 shows which stations recorded rain in June 2015, and we pick three stations from that table.
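Instead of reading the printed table by eye, the wet days could be counted per station. This is a minimal sketch under stated assumptions: the toy rows and station names below are invented, and only the column names (STATION, DATE, PRCP, TMIN) and the filters mirror the cell above.

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV; all values are invented.
toy = pd.DataFrame({
    "STATION": ["GHCND:A", "GHCND:A", "GHCND:B", "GHCND:B", "GHCND:C"],
    "DATE": [20150601, 20150615, 20150610, 20150705, 20150620],
    "PRCP": [5, 2, 12, 30, 3],
    "TMIN": [100, 110, -9999, 90, 80],
})

# Same filters as the notebook: drop TMIN == -9999, keep June 2015 days with rain.
valid = toy[toy.TMIN > -9999]
wet_june = valid[(valid.DATE >= 20150601) & (valid.DATE <= 20150630) & (valid.PRCP > 0)]

# Number of wet June days per station, largest count first.
wet_days = wet_june["STATION"].value_counts()
print(wet_days)
```

Applied to `data3`, `data3["STATION"].value_counts()` would list every station with at least one wet day in June, which makes it easier to pick stations with contrasting rainfall.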
In [5]:
selected = [
    'GHCND:USC00047326',  # Redondo Beach
    'GHCND:USC00047902',  # Santa Barbara
    'GHCND:USC00044881',  # Lee Vining
]
stations = data2[data2.STATION.isin(selected)]
In [6]:
# Lowest TMIN and highest TMAX recorded at each station
st = stations.groupby(['STATION'])
temp = st.agg({'TMIN' : 'min', 'TMAX' : 'max'})
temp.plot(kind='bar')
Out[6]:
The plot above shows that all three cities experienced a wide range of temperatures over the observation period. The range was widest in Lee Vining.
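To put a number on that comparison, the gap between the highest TMAX and the lowest TMIN can be computed per station. The following is only a sketch: the toy DataFrame, its station names and its values are made up for illustration and are not taken from the CSV used in this notebook.

```python
import pandas as pd

# Hypothetical rows standing in for the `stations` frame; all values are invented.
toy = pd.DataFrame({
    "STATION": ["GHCND:A", "GHCND:A", "GHCND:B", "GHCND:B"],
    "TMIN": [50, 80, 120, 130],
    "TMAX": [200, 310, 220, 240],
})

# Lowest TMIN and highest TMAX per station, using named aggregation.
extremes = toy.groupby("STATION").agg(tmin=("TMIN", "min"), tmax=("TMAX", "max"))

# Spread between the two extremes; a larger value means a wider temperature range.
extremes["spread"] = extremes["tmax"] - extremes["tmin"]

print(extremes.sort_values("spread", ascending=False))
```

Running the same three lines on `stations` instead of `toy` would give the spread for the three selected stations, in whatever unit the CSV uses for TMIN and TMAX.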
In [7]:
# Daily precipitation in June 2015, drawn as one plot per station
june = stations[(stations.DATE>=20150601) & (stations.DATE<=20150630)]
rain = june.groupby(['STATION'])
rain.plot('DATE','PRCP')
Out[7]:
Among the three selected cities, Lee Vining (USC00044881) had the most rainy days. However, Santa Barbara (USC00047902) received more rain in a single day than Lee Vining did. Compared with those two cities, Redondo Beach had almost no rain.
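The same observations can be checked numerically rather than read off the plots. The sketch below counts rainy days and finds the wettest single day per station; the toy DataFrame and its values are hypothetical and only illustrate the pattern described above (one station with more rainy days, another with one heavier day).

```python
import pandas as pd

# Hypothetical June rows for two stations; all values are invented.
toy = pd.DataFrame({
    "STATION": ["GHCND:A"] * 3 + ["GHCND:B"] * 3,
    "DATE": [20150601, 20150602, 20150603] * 2,
    "PRCP": [4, 6, 2, 0, 25, 0],
})

# Flag days with any recorded precipitation.
toy["WET"] = toy["PRCP"] > 0

# Per station: number of wet days, largest single-day amount, and monthly total.
summary = toy.groupby("STATION").agg(
    wet_days=("WET", "sum"),
    wettest_day=("PRCP", "max"),
    total=("PRCP", "sum"),
)
print(summary)
```

Using `june` in place of `toy` would produce the same three columns for the selected stations.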