Exploring Pokémon data with pandas

Import libraries


In [1]:
import pandas as pd

Load the data to create a DataFrame


In [2]:
data = pd.read_csv('Pokemon.csv', index_col='#')

The core data structure in pandas is the DataFrame, which resembles an Excel spreadsheet: it has rows and named columns, and it is easy and intuitive to manipulate.
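
As a minimal sketch of what `read_csv` with `index_col='#'` produces, the snippet below parses a few rows copied from this notebook's `head()` output. The CSV text is an assumption about the file layout (the real `Pokemon.csv` has more columns and rows), and `sample` and `csv_text` are hypothetical names.

```python
import io

import pandas as pd

# A few rows copied from the head() output in this notebook.
# Assumption: the real file uses a comma-separated header like this one.
csv_text = """#,Name,Type 1,Type 2,Total,HP
1,Bulbasaur,Grass,Poison,318,45
2,Ivysaur,Grass,Poison,405,60
3,Venusaur,Grass,Poison,525,80
4,Charmander,Fire,,309,39
"""

# index_col='#' uses the Pokédex number as the row index instead of 0, 1, 2, ...
sample = pd.read_csv(io.StringIO(csv_text), index_col='#')

print(sample.shape)          # (rows, columns); the index is not counted as a column
print(list(sample.columns))  # the named columns
print(sample.index.name)     # name of the index, taken from the '#' column
```

The empty `Type 2` field for Charmander is read as a missing value, which is why it shows up as `NaN` in the table output.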


Take a look at the first rows


In [3]:
data.head()


Out[3]:
#  Name                   Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
1  Bulbasaur              Grass   Poison    318  45      49       49       65       65     45           1      False
2  Ivysaur                Grass   Poison    405  60      62       63       80       80     60           1      False
3  Venusaur               Grass   Poison    525  80      82       83      100      100     80           1      False
3  VenusaurMega Venusaur  Grass   Poison    625  80     100      123      122      120     80           1      False
4  Charmander             Fire    NaN       309  39      52       43       60       50     65           1      False

Working with a pandas "Series"


In [6]:
type(data['Name'])


Out[6]:
pandas.core.series.Series

Access specific columns or rows


In [7]:
(data.Name == data['Name']).all()


Out[7]:
True
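
The cell above shows that attribute access and bracket access return the same column. A short sketch of the other common access patterns follows; the `sample` frame is a hypothetical three-row excerpt built from the `head()` output, not the full dataset.

```python
import pandas as pd

# Hypothetical excerpt of the dataset, indexed by Pokédex number like `data`.
sample = pd.DataFrame(
    {
        'Name': ['Bulbasaur', 'Ivysaur', 'Charmander'],
        'Type 1': ['Grass', 'Grass', 'Fire'],
        'HP': [45, 60, 39],
    },
    index=pd.Index([1, 2, 4], name='#'),
)

hp_column = sample['HP']          # bracket access works for any column name
type_column = sample['Type 1']    # names containing spaces cannot use attribute access
row_by_label = sample.loc[4]      # row whose index label is 4
row_by_position = sample.iloc[0]  # first row, whatever its label

print(row_by_label['Name'], row_by_position['Name'])
```

Bracket access is the safer habit here, since several columns in this dataset ("Type 1", "Sp. Atk") are not valid Python attribute names.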

A pandas "Series" is what we call a single column of a DataFrame. It exposes many methods directly, which makes it easy to ask questions about its data.

What is the highest "HP"?


In [75]:
data['HP'].max()


Out[75]:
255

Which Pokémon has the highest "HP"?


In [76]:
# total_mas_grande ("largest total") holds the maximum value of the HP column
total_mas_grande = data.HP.max()

In [79]:
data[data.HP == total_mas_grande]


Out[79]:
#    Name     Type 1  Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
242  Blissey  Normal  NaN       540  255      10       10       75      135     55           2      False
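
The lookup above works by building a boolean mask and using it to select rows. The sketch below spells out that step and shows `nlargest` as an alternative; `sample` is a hypothetical excerpt with values taken from the `head()` output.

```python
import pandas as pd

# Hypothetical excerpt: names and HP values from the first rows of the dataset.
sample = pd.DataFrame(
    {
        'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur', 'Charmander'],
        'HP': [45, 60, 80, 39],
    },
    index=pd.Index([1, 2, 3, 4], name='#'),
)

# Comparing a Series with a scalar yields one True/False value per row.
is_max_hp = sample['HP'] == sample['HP'].max()

# Indexing the DataFrame with that mask keeps only the rows marked True.
by_mask = sample[is_max_hp]

# nlargest sorts by the given column and keeps the top n rows.
top_one = sample.nlargest(1, 'HP')

print(by_mask)
print(top_one)
```

The mask version returns every row tied for the maximum, while `nlargest(1, ...)` returns a single row by default.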

What is the average "Speed"?


In [80]:
data.Speed.mean()


Out[80]:
68.2775

In [92]:
# Keep only the legendary Pokémon; the 'Legendary' column is already boolean
data[data['Legendary']]


Out[92]:
Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
#
144 Articuno Ice Flying 580 90 85 100 95 125 85 1 True
145 Zapdos Electric Flying 580 90 90 85 125 90 100 1 True
146 Moltres Fire Flying 580 90 100 90 125 85 90 1 True
150 Mewtwo Psychic NaN 680 106 110 90 154 90 130 1 True
150 MewtwoMega Mewtwo X Psychic Fighting 780 106 190 100 154 100 130 1 True
150 MewtwoMega Mewtwo Y Psychic NaN 780 106 150 70 194 120 140 1 True
243 Raikou Electric NaN 580 90 85 75 115 100 115 2 True
244 Entei Fire NaN 580 115 115 85 90 75 100 2 True
245 Suicune Water NaN 580 100 75 115 90 115 85 2 True
249 Lugia Psychic Flying 680 106 90 130 90 154 110 2 True
250 Ho-oh Fire Flying 680 106 130 90 110 154 90 2 True
377 Regirock Rock NaN 580 80 100 200 50 100 50 3 True
378 Regice Ice NaN 580 80 50 100 100 200 50 3 True
379 Registeel Steel NaN 580 80 75 150 75 150 50 3 True
380 Latias Dragon Psychic 600 80 80 90 110 130 110 3 True
380 LatiasMega Latias Dragon Psychic 700 80 100 120 140 150 110 3 True
381 Latios Dragon Psychic 600 80 90 80 130 110 110 3 True
381 LatiosMega Latios Dragon Psychic 700 80 130 100 160 120 110 3 True
382 Kyogre Water NaN 670 100 100 90 150 140 90 3 True
382 KyogrePrimal Kyogre Water NaN 770 100 150 90 180 160 90 3 True
383 Groudon Ground NaN 670 100 150 140 100 90 90 3 True
383 GroudonPrimal Groudon Ground Fire 770 100 180 160 150 90 90 3 True
384 Rayquaza Dragon Flying 680 105 150 90 150 90 95 3 True
384 RayquazaMega Rayquaza Dragon Flying 780 105 180 100 180 100 115 3 True
385 Jirachi Steel Psychic 600 100 100 100 100 100 100 3 True
386 DeoxysNormal Forme Psychic NaN 600 50 150 50 150 50 150 3 True
386 DeoxysAttack Forme Psychic NaN 600 50 180 20 180 20 150 3 True
386 DeoxysDefense Forme Psychic NaN 600 50 70 160 70 160 90 3 True
386 DeoxysSpeed Forme Psychic NaN 600 50 95 90 95 90 180 3 True
480 Uxie Psychic NaN 580 75 75 130 75 130 95 4 True
... ... ... ... ... ... ... ... ... ... ... ... ...
486 Regigigas Normal NaN 670 110 160 110 80 110 100 4 True
487 GiratinaAltered Forme Ghost Dragon 680 150 100 120 100 120 90 4 True
487 GiratinaOrigin Forme Ghost Dragon 680 150 120 100 120 100 90 4 True
491 Darkrai Dark NaN 600 70 90 90 135 90 125 4 True
492 ShayminLand Forme Grass NaN 600 100 100 100 100 100 100 4 True
492 ShayminSky Forme Grass Flying 600 100 103 75 120 75 127 4 True
493 Arceus Normal NaN 720 120 120 120 120 120 120 4 True
494 Victini Psychic Fire 600 100 100 100 100 100 100 5 True
638 Cobalion Steel Fighting 580 91 90 129 90 72 108 5 True
639 Terrakion Rock Fighting 580 91 129 90 72 90 108 5 True
640 Virizion Grass Fighting 580 91 90 72 90 129 108 5 True
641 TornadusIncarnate Forme Flying NaN 580 79 115 70 125 80 111 5 True
641 TornadusTherian Forme Flying NaN 580 79 100 80 110 90 121 5 True
642 ThundurusIncarnate Forme Electric Flying 580 79 115 70 125 80 111 5 True
642 ThundurusTherian Forme Electric Flying 580 79 105 70 145 80 101 5 True
643 Reshiram Dragon Fire 680 100 120 100 150 120 90 5 True
644 Zekrom Dragon Electric 680 100 150 120 120 100 90 5 True
645 LandorusIncarnate Forme Ground Flying 600 89 125 90 115 80 101 5 True
645 LandorusTherian Forme Ground Flying 600 89 145 90 105 80 91 5 True
646 Kyurem Dragon Ice 660 125 130 90 130 90 95 5 True
646 KyuremBlack Kyurem Dragon Ice 700 125 170 100 120 90 95 5 True
646 KyuremWhite Kyurem Dragon Ice 700 125 120 90 170 100 95 5 True
716 Xerneas Fairy NaN 680 126 131 95 131 98 99 6 True
717 Yveltal Dark Flying 680 126 131 95 131 98 99 6 True
718 Zygarde50% Forme Dragon Ground 600 108 100 121 81 95 95 6 True
719 Diancie Rock Fairy 600 50 100 150 100 150 50 6 True
719 DiancieMega Diancie Rock Fairy 700 50 160 110 160 110 110 6 True
720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 70 6 True
720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 80 6 True
721 Volcanion Fire Water 600 80 110 120 130 90 70 6 True

65 rows × 12 columns
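
Boolean columns can also be combined with other conditions or summed to count matches. This is a sketch on a hypothetical four-row excerpt (values taken from the tables in this notebook), not on the full dataset.

```python
import pandas as pd

# Hypothetical excerpt with values copied from the outputs above.
sample = pd.DataFrame(
    {
        'Name': ['Bulbasaur', 'Articuno', 'Mewtwo', 'Raikou'],
        'Generation': [1, 1, 1, 2],
        'Legendary': [False, True, True, True],
    }
)

# Combine conditions with & (and), | (or), ~ (not); parenthesize each comparison.
gen1_legendary = sample[sample['Legendary'] & (sample['Generation'] == 1)]
not_legendary = sample[~sample['Legendary']]

# True counts as 1, so summing a boolean column counts the matching rows.
legendary_count = int(sample['Legendary'].sum())

print(gen1_legendary['Name'].tolist())
print(legendary_count)
```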

How are the values of the "Attack" stat distributed?


In [49]:
%matplotlib inline

Pandas lets us create many visualizations directly from a Series or DataFrame, which makes it easy to get a broad view of the shape of your data.


In [61]:
data['Attack'].hist()


Out[61]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f675d71c6d8>

Just to be sure, check the minimum and maximum:


In [62]:
data['Attack'].min()


Out[62]:
5

In [63]:
data['Attack'].max()


Out[63]:
190
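
The minimum and maximum can also be read from a single call to `describe()`, which returns the count, mean, standard deviation, and quartiles as well. A minimal sketch, using the five "Attack" values from the `head()` output as a stand-in for the full column:

```python
import pandas as pd

# Attack values of the first five rows shown earlier in this notebook.
attack = pd.Series([49, 62, 82, 100, 52], name='Attack')

# describe() returns a Series indexed by statistic name.
summary = attack.describe()

print(summary['min'], summary['max'], summary['mean'])
```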

What is the average "Total" for each "Type 1"?


In [98]:
data.groupby('Type 1')['Total'].mean().sort_values(ascending=False)


Out[98]:
Type 1
Dragon      550.531250
Steel       487.703704
Flying      485.000000
Psychic     475.947368
Fire        458.076923
Rock        453.750000
Dark        445.741935
Electric    443.409091
Ghost       439.562500
Ground      437.500000
Ice         433.458333
Water       430.455357
Grass       421.142857
Fighting    416.444444
Fairy       413.176471
Normal      401.683673
Poison      399.142857
Bug         378.927536
Name: Total, dtype: float64
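
To answer how many Pokémon there are of each "Type 1" and "Type 2", one option is `value_counts()`. The sketch below runs on the five rows from the `head()` output, so the counts are for that hypothetical excerpt only.

```python
import pandas as pd

# Types of the first five rows shown earlier; None marks a missing second type.
sample = pd.DataFrame(
    {
        'Type 1': ['Grass', 'Grass', 'Grass', 'Grass', 'Fire'],
        'Type 2': ['Poison', 'Poison', 'Poison', 'Poison', None],
    }
)

# Number of rows per distinct value, most frequent first.
type1_counts = sample['Type 1'].value_counts()

# Missing values are skipped by default; dropna=False counts them too.
type2_counts = sample['Type 2'].value_counts(dropna=False)

print(type1_counts)
print(type2_counts)
```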

"Aggregation"

What is the average "Total" for each "Type 1" and "Type 2"? The "Type 1" averages were computed above; here are the "Type 2" averages.


In [42]:
data.groupby('Type 2')['Total'].mean().sort_values(ascending=False)


Out[42]:
Type 2
Dragon      526.166667
Fighting    525.846154
Ice         525.714286
Fire        506.250000
Steel       485.227273
Dark        484.400000
Psychic     479.060606
Electric    455.333333
Flying      452.546392
Ground      444.342857
Rock        434.642857
Ghost       430.714286
Water       418.214286
Fairy       417.956522
Normal      411.500000
Grass       408.920000
Poison      396.500000
Bug         393.333333
Name: Total, dtype: float64
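
Several aggregations can be computed in one pass with `agg`. The sketch below also illustrates that `groupby` leaves out rows whose key is missing by default, so Pokémon without a second type do not contribute to the "Type 2" averages. `sample` is a hypothetical excerpt built from the `head()` output.

```python
import pandas as pd

# First five rows of the dataset; Charmander has no second type.
sample = pd.DataFrame(
    {
        'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur',
                 'VenusaurMega Venusaur', 'Charmander'],
        'Type 2': ['Poison', 'Poison', 'Poison', 'Poison', None],
        'Total': [318, 405, 525, 625, 309],
    }
)

# One row per group, one column per aggregation.
stats = sample.groupby('Type 2')['Total'].agg(['mean', 'count', 'max'])

print(stats)
```

Adding `count` next to `mean` helps judge how much weight an average deserves, since some types have only a handful of members.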

What is the maximum "Attack" for each combination of "Type 1" and "Type 2"?


In [46]:
data.groupby(['Type 1', 'Type 2'])['Attack'].max().sort_values(ascending=False)


Out[46]:
Type 1    Type 2  
Psychic   Fighting    190
Bug       Fighting    185
Ground    Fire        180
Dragon    Flying      180
          Ice         170
          Ground      170
Rock      Dark        164
Fire      Fighting    160
Rock      Fairy       160
Psychic   Dark        160
Water     Dark        155
Bug       Flying      155
          Poison      150
Dragon    Electric    150
Steel     Ghost       150
Water     Ground      150
Bug       Steel       150
Fighting  Steel       145
Steel     Psychic     145
Ground    Flying      145
Rock      Flying      140
Ground    Rock        140
Normal    Fighting    136
Ground    Steel       135
Grass     Ice         132
Dark      Flying      131
Fire      Flying      130
Ice       Ground      130
Dragon    Psychic     130
Grass     Fighting    130
                     ... 
Poison    Dragon       75
Electric  Steel        70
Water     Grass        70
Ground    Psychic      70
Flying    Dragon       70
Normal    Fairy        70
Fire      Psychic      69
          Normal       68
Grass     Fairy        67
Ground    Electric     66
Ghost     Poison       65
Electric  Water        65
          Fire         65
          Ice          65
          Grass        65
Water     Ghost        60
Poison    Water        60
Electric  Fairy        58
Water     Electric     58
Normal    Ground       56
Ghost     Fire         55
Rock      Steel        55
Electric  Normal       55
Fire      Rock         50
Poison    Bug          50
Water     Fairy        50
Ice       Psychic      50
Fairy     Flying       50
Electric  Ghost        50
Bug       Water        30
Name: Attack, dtype: int64
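
Grouping by two columns returns a Series with a two-level index. The following sketch shows a few ways to read values out of such a result; `sample` is a hypothetical excerpt using Attack values from the legendary table above.

```python
import pandas as pd

# Hypothetical excerpt; values copied from the legendary table in this notebook.
sample = pd.DataFrame(
    {
        'Name': ['Moltres', 'Latias', 'LatiasMega Latias',
                 'Rayquaza', 'RayquazaMega Rayquaza'],
        'Type 1': ['Fire', 'Dragon', 'Dragon', 'Dragon', 'Dragon'],
        'Type 2': ['Flying', 'Psychic', 'Psychic', 'Flying', 'Flying'],
        'Attack': [100, 80, 100, 150, 180],
    }
)

max_attack = sample.groupby(['Type 1', 'Type 2'])['Attack'].max()

# A tuple selects one combination; a single label selects a whole first-level group.
dragon_flying = max_attack.loc[('Dragon', 'Flying')]
dragon_only = max_attack.loc['Dragon']

# unstack() pivots the second level into columns (missing combinations become NaN).
as_table = max_attack.unstack()

# reset_index() turns the index levels back into ordinary columns.
as_frame = max_attack.reset_index()

print(as_table)
```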

Visualize with seaborn


In [49]:
import seaborn as sns

Plot one column against another


In [51]:
sns.jointplot(x='Sp. Atk', y='Sp. Def', data=data, kind='reg')


Out[51]:
<seaborn.axisgrid.JointGrid at 0x11380e630>
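
As a numeric complement to the regression plot, the strength of the linear relationship between the two columns can be estimated with `Series.corr`, which computes the Pearson correlation by default. The values below are the five rows from the `head()` output, so the result only describes that hypothetical excerpt.

```python
import pandas as pd

# "Sp. Atk" and "Sp. Def" for the first five rows shown earlier.
sample = pd.DataFrame(
    {
        'Sp. Atk': [65, 80, 100, 122, 60],
        'Sp. Def': [65, 80, 100, 120, 50],
    }
)

# Pearson correlation: close to 1 means a strong positive linear relationship.
r = sample['Sp. Atk'].corr(sample['Sp. Def'])

print(round(r, 3))
```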

In [50]:
data.head()


Out[50]:
#  Name                   Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed
1  Bulbasaur              Grass   Poison    318  45      49       49       65       65     45
2  Ivysaur                Grass   Poison    405  60      62       63       80       80     60
3  Venusaur               Grass   Poison    525  80      82       83      100      100     80
3  VenusaurMega Venusaur  Grass   Poison    625  80     100      123      122      120     80
4  Charmander             Fire    NaN       309  39      52       43       60       50     65

Create a boxplot of the columns we care about


In [53]:
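# Drop the columns we do not want to plot.
# Note: .head() limits the plot to the first five rows; remove it to plot the whole dataset.
sns.boxplot(data=data.drop(['Name', 'Total'], axis=1).head())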


Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x113a778d0>
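
A boxplot summarizes each numeric column by its median and quartiles. As a rough sketch of the numbers behind such a plot, the snippet below selects the numeric columns with `select_dtypes` and computes the quartiles directly; `sample` is a hypothetical excerpt built from the `head()` output.

```python
import pandas as pd

# Hypothetical excerpt: names plus two stats from the first five rows.
sample = pd.DataFrame(
    {
        'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur',
                 'VenusaurMega Venusaur', 'Charmander'],
        'HP': [45, 60, 80, 80, 39],
        'Attack': [49, 62, 82, 100, 52],
    }
)

# Keep only numeric columns, an alternative to dropping text columns by name.
numeric = sample.select_dtypes(include='number')

# 25th, 50th (median), and 75th percentiles of each numeric column.
quartiles = numeric.quantile([0.25, 0.5, 0.75])

print(quartiles)
```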