# Ejercicios de análisis de datos con pandas

Fuente: https://github.com/PyDataMadrid2016/Conference-Info/tree/master/workshops_materials/20160408_1100_Pandas_for_beginners/tutorial por Kiko Correoso, licencia MIT

En la carpeta de datos tenemos un fichero que se llama model.txt que contiene datos de medidas de viento: velocidad, orientación, temperatura...

``````

In [1]:

``````
``````

Lat=0  Lon=0  Hub-Height=987  Timezone=88   ASL-Height(avg. 3km-grid)=0
Provider (www.provider.noway) - Computed at 3km resolution based on GLOBAL data (designed for correlation purposes)

YYYYMMDD HHMM  M(m/s) D(deg)  T(C)  De(k/m3) PRE(hPa)      RiNumber  RH(%)
19840101 0000   20.8    243    7.3    1.25    1002.8           0.04   86.3
19840101 0100   20.8    243    7.3    1.25    1002.2           0.04   86.9
19840101 0200   20.6    243    7.4    1.24    1001.8           0.04   87.0
19840101 0300   20.4    244    7.5    1.24    1001.5           0.05   86.2
19840101 0400   20.3    245    7.5    1.24    1001.2           0.05   85.6
19840101 0500   20.4    245    7.6    1.24    1000.9           0.06   85.2

``````
``````

In [2]:

import pandas as pd
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl

from IPython.display import display

``````
``````

In [3]:

"../data/model.txt", delim_whitespace=True, skiprows = 3,
parse_dates = {'Timestamp': [0, 1]}, index_col = 'Timestamp')

``````
``````

In [4]:

``````
``````

Out[4]:

M(m/s)
D(deg)
T(C)
De(k/m3)
PRE(hPa)
RiNumber
RH(%)

Timestamp

1984-01-01 00:00:00
20.8
243
7.3
1.25
1002.8
0.04
86.3

1984-01-01 01:00:00
20.8
243
7.3
1.25
1002.2
0.04
86.9

1984-01-01 02:00:00
20.6
243
7.4
1.24
1001.8
0.04
87.0

1984-01-01 03:00:00
20.4
244
7.5
1.24
1001.5
0.05
86.2

1984-01-01 04:00:00
20.3
245
7.5
1.24
1001.2
0.05
85.6

``````

## Ejercicios

Sobre el conjunto de datos `model`:

1. Representar la matriz `scatter` de la velocidad y orientación del viento de los primeros mil registros.
2. Misma matriz scatter para los 1000 registros con mayor velocidad, ordenados.
3. Histograma de la velocidad del viento con 36 particiones.
4. Histórico de la velocidad media, con los datos agrupados por años y meses.
5. Tabla de velocidades medias en función del año (filas) y del mes (columnas).
6. Gráfica con los históricos de cada año, agrupados por meses, superpuestos.

Representamos la matriz scatter de la velocidad y orientación del viento de los primeros mil registros:

``````

In [5]:

pd.tools.plotting.scatter_matrix(model.loc[model.index[:1000], 'M(m/s)':'D(deg)'])

``````
``````

Out[5]:

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f7e8ed38940>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f7e8edc7208>],
[<matplotlib.axes._subplots.AxesSubplot object at 0x7f7e8eda34a8>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f7e8ecbcef0>]], dtype=object)

``````

Misma matriz scatter para los 1000 registros con mayor velocidad:

``````

In [6]:

pd.tools.plotting.scatter_matrix(
model.loc[model.sort_values('M(m/s)', ascending=False).index[:1000],
'M(m/s)':'D(deg)']
)

``````
``````

Out[6]:

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f7e8ebccc88>,
[<matplotlib.axes._subplots.AxesSubplot object at 0x7f7e8eccc710>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f7e8eb2e828>]], dtype=object)

``````
``````

In [7]:

model.loc[:, 'M(m/s)'].plot.hist(bins=np.arange(0, 35))

``````
``````

Out[7]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f7e8ea98a90>

``````
``````

In [8]:

model['month'] = model.index.month
model['year'] = model.index.year

``````

``````

In [9]:

``````
``````

Out[9]:

M(m/s)
D(deg)
T(C)
De(k/m3)
PRE(hPa)
RiNumber
RH(%)

year
month

1984
1
13.821505
233.603495
4.621505
1.247419
993.974328
-2.823320
81.038575

2
10.043534
164.089080
3.522701
1.269368
1007.659339
-29.335790
83.860201

3
8.708199
153.081989
4.138306
1.261425
1003.437231
0.250726
84.740591

4
7.939583
162.113889
6.781111
1.255986
1008.487083
-1.655014
74.165417

5
8.283871
123.590054
8.034677
1.238763
998.939113
-0.384704
87.813306

6
7.330833
231.652778
12.068194
1.228417
1005.009861
11.744944
81.090694

7
7.710618
222.166667
14.508199
1.219906
1006.686962
3.071788
78.730376

8
7.324194
147.930108
16.510484
1.210497
1006.095565
4.732742
79.204704

9
9.335694
231.116667
13.960000
1.210347
997.008889
-6.703986
81.153889

10
10.985215
225.610215
12.109140
1.223790
1001.571505
-9.919731
83.965188

11
10.789306
189.393056
9.877639
1.223389
993.419167
-183.106069
86.480694

12
9.600941
213.807796
6.825000
1.253763
1006.855242
-28.626452
87.389651

1985
1
10.424059
157.752688
1.499059
1.269126
1000.074059
-21.708602
82.745430

2
8.649851
150.980655
1.323512
1.283512
1010.598214
1.706592
82.009226

3
9.097043
190.330645
3.800000
1.259046
1000.441667
-0.927110
87.532258

4
10.895833
217.081944
7.155694
1.242097
998.885694
-5.238778
82.452083

5
9.147984
166.154570
9.715726
1.233145
1001.046371
6.960806
87.404570

6
8.583056
201.036111
11.734444
1.225500
1001.283194
-0.569833
82.841944

7
8.865860
216.310484
15.546640
1.211720
1003.435484
2.259113
80.224194

8
10.379704
224.643817
14.996774
1.210685
1000.788306
0.740094
80.920296

9
8.181250
228.433333
14.561250
1.220333
1007.310000
-2.072278
80.053889

10
8.021640
156.831989
12.262231
1.235470
1011.755645
-126.452137
82.420565

11
10.734167
196.769444
5.608333
1.252042
1000.843611
-18.886847
78.602639

12
12.567339
220.547043
7.274328
1.243387
1000.259409
-5.167849
88.165457

``````
``````

In [10]:

model.groupby(by=['year', 'month']).mean().plot(y='M(m/s)', figsize=(15, 5))

``````
``````

Out[10]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f7e8e9fc908>

``````

Media móvil de los datos agrupados por mes y año:

``````

In [11]:

monthly = model.groupby(by=['year', 'month']).mean()
monthly['ma'] = monthly.loc[:, 'M(m/s)'].rolling(5, center=True).mean()

``````
``````

Out[11]:

M(m/s)
D(deg)
T(C)
De(k/m3)
PRE(hPa)
RiNumber
RH(%)
ma

year
month

1984
1
13.821505
233.603495
4.621505
1.247419
993.974328
-2.823320
81.038575
NaN

2
10.043534
164.089080
3.522701
1.269368
1007.659339
-29.335790
83.860201
NaN

3
8.708199
153.081989
4.138306
1.261425
1003.437231
0.250726
84.740591
9.759339

4
7.939583
162.113889
6.781111
1.255986
1008.487083
-1.655014
74.165417
8.461204

5
8.283871
123.590054
8.034677
1.238763
998.939113
-0.384704
87.813306
7.994621

``````
``````

In [12]:

monthly.loc[:, ['M(m/s)', 'ma']].plot(figsize=(15, 6))

``````
``````

Out[12]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f7e8e918908>

``````
``````

In [13]:

monthly.loc[:, 'M(m/s)'].reset_index().pivot(index='year', columns='month')

``````
``````

Out[13]:

M(m/s)

month
1
2
3
4
5
6
7
8
9
10
11
12

year

1984
13.821505
10.043534
8.708199
7.939583
8.283871
7.330833
7.710618
7.324194
9.335694
10.985215
10.789306
9.600941

1985
10.424059
8.649851
9.097043
10.895833
9.147984
8.583056
8.865860
10.379704
8.181250
8.021640
10.734167
12.567339

1986
13.830972
10.303423
10.832527
9.217222
11.092742
9.570000
6.824866
8.847043
7.099028
10.331855
12.446944
13.712500

1987
8.762231
9.502530
10.386962
9.063056
9.670161
7.983889
7.672312
7.507796
8.731528
10.418011
9.445417
10.355376

1988
12.345430
13.447557
10.783333
8.511806
9.107661
7.594167
10.986022
8.857124
10.319167
9.518683
8.726111
11.036559

1989
9.891935
12.448958
11.776478
9.206944
8.491667
7.417083
7.892876
8.117876
8.210139
11.445430
9.406389
9.345430

1990
13.390726
16.317262
11.582661
9.273889
7.007796
8.217778
8.460618
7.672581
8.973333
12.126210
9.382361
12.120161

1991
11.082124
8.971131
9.425000
10.917083
8.013441
10.012361
8.981317
7.620565
7.437778
9.339382
11.320278
9.228898

1992
9.519624
9.928879
11.797849
9.496667
9.234140
7.170139
7.781048
10.290054
9.600278
10.010215
12.911806
9.820027

1993
13.553629
8.300298
9.620296
9.226944
10.691801
7.086528
8.857527
8.148925
8.027778
9.998925
8.559306
13.214651

1994
12.743817
9.300446
13.058468
10.817361
8.309677
8.898056
7.125806
8.196505
9.523194
9.311290
9.085000
12.423925

1995
14.024866
13.239137
12.197043
9.416944
7.321909
9.166389
8.548118
8.264382
8.653056
9.555376
10.049861
8.207661

1996
10.754839
11.089511
8.685753
8.493611
10.265726
7.620417
8.129167
9.162634
9.287639
10.456048
11.589306
9.658468

1997
8.347715
14.924107
9.867876
8.124444
8.584274
9.646667
7.433871
7.436156
8.018889
9.434005
10.480139
11.958468

1998
13.109677
11.514286
10.942339
9.500417
8.895833
10.409583
9.251882
8.233871
9.785556
13.341532
9.758194
12.096640

1999
12.528763
11.269048
8.439247
9.330000
9.694220
7.580694
8.846640
7.202285
9.258333
10.698253
11.538472
13.665591

2000
11.494892
12.468247
10.172715
9.103472
8.910349
8.529167
7.106048
6.768145
9.485417
12.016129
12.598889
12.427823

2001
10.253495
9.584673
9.664516
10.243333
9.074731
8.480000
7.584005
7.993952
9.700417
12.024194
10.040139
10.264516

2002
11.843817
14.686756
9.946237
9.673611
10.154435
8.813611
8.414382
6.051613
7.703750
10.212903
10.006389
10.368280

2003
11.521909
9.675744
8.608065
9.995833
9.328091
7.710972
8.186559
6.483871
6.286944
10.084005
10.470139
10.409140

2004
12.250134
11.419828
10.138710
7.813889
6.522446
8.699167
7.430780
8.822043
10.807083
11.328091
8.940000
9.402823

2005
13.587500
10.366071
9.831720
8.748194
9.718414
7.874444
7.609946
7.816129
7.758333
10.028763
10.983056
10.680376

2006
9.297984
10.054167
10.873118
9.237639
9.704032
6.799444
7.897984
8.187634
8.796111
10.690323
12.469722
11.947446

2007
14.101075
9.645387
10.546640
8.262500
9.168145
8.840694
9.709005
8.060215
9.180694
7.139113
11.032639
11.233602

2008
14.116129
9.943103
12.370161
9.460139
9.980376
7.843889
8.795296
10.270161
8.654861
10.218548
10.714444
9.353360

2009
10.223387
8.581101
9.415591
8.026250
9.916532
6.409583
9.587366
8.451210
8.748472
9.085215
12.744306
10.260215

2010
8.289919
9.695387
9.692876
7.802083
7.061559
7.421528
7.606586
9.204167
8.591944
10.626747
9.750833
9.293548

2011
10.500941
11.319940
8.006317
8.849028
9.512500
9.312500
8.217742
8.245430
10.351806
10.840323
10.015833
13.737366

2012
11.287500
9.791810
7.621102
9.605000
9.024597
10.365000
8.137097
8.434812
9.145694
9.307258
10.382639
12.034812

2013
9.740323
9.398512
9.320027
10.267222
9.640860
9.429167
6.978495
7.817070
8.170417
11.430376
9.734028
12.709946

2014
12.961156
14.528125
9.064785
8.428750
8.235349
6.188056
8.442473
8.891801
6.275000
10.676882
9.564861
12.112500

2015
12.050134
10.035863
10.561156
8.027083
9.739382
9.018056
9.500134
8.748522
8.111250
NaN
NaN
NaN

``````
``````

In [14]:

monthly.loc[:, 'M(m/s)'].reset_index().pivot(
index='year', columns='month'
).T.loc['M(m/s)'].plot(
figsize=(15, 5), legend=False
)

``````
``````

Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f7e8e8d8b70>

``````