In [156]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import seaborn as sns
%matplotlib inline

What is this?

We have a data set of Pokemon. Each one has several stat values that indicate its strength; Total is simply the sum of these values.

Got this set from https://www.kaggle.com/abcsds/pokemon

Got the swarmplot idea from https://www.kaggle.com/ndrewgele/visualizing-pok-mon-stats-with-seaborn
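The cell that creates `data` is not shown in this notebook. A minimal sketch of how it could be loaded, assuming the Kaggle CSV was downloaded locally; the helper name `load_pokemon` and the file name `Pokemon.csv` are assumptions, and the two-row sample below only stands in for the real file so the sketch runs without the download:

```python
import io

import pandas as pd


def load_pokemon(source):
    """Read the Pokemon CSV from a path or file-like object (hypothetical helper)."""
    return pd.read_csv(source)


# Tiny in-memory stand-in using the same header as the table shown below.
sample_csv = io.StringIO(
    "#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary\n"
    "1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False\n"
    "4,Charmander,Fire,,309,39,52,43,60,50,65,1,False\n"
)
sample = load_pokemon(sample_csv)

# In the notebook itself this would be something like (path is an assumption):
# data = load_pokemon("Pokemon.csv")
```

An empty `Type 2` field is read as NaN, which is why single-type Pokemon show NaN in that column.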

Have a look:


In [157]:
data.head()


Out[157]:
   #  Name                   Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  1  Bulbasaur              Bug     Poison  318    45  49      49       65       65       45     1           False
1  2  Ivysaur                Bug     Poison  405    60  62      63       80       80       60     1           False
2  3  Venusaur               Bug     Poison  525    80  82      83       100      100      80     1           False
3  3  VenusaurMega Venusaur  Bug     Poison  625    80  100     123      122      120      80     1           False
4  4  Charmander             Bug     NaN     309    39  52      43       60       50       65     1           False

Okay, please try to create the following images:

Hints:

Use seaborn.distplot, seaborn.swarmplot, seaborn.heatmap... For the swarmplot, you may need pandas.melt; it is a reshape routine similar to the one in R's reshape2 package.


In [142]:
from IPython.display import display, HTML
display(HTML("<h1>Okay, you don't want to do this on your own? Then here is how to do it (scroll down)</h1>"))
for i in range(20):
    display(HTML("<br />"))


Okay, you don't want to do this on your own? Then here is how to do it (scroll down)





















Okay, let's go!

Now, let us inspect the distributions of the numerical columns:


In [134]:
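# Select the integer-valued columns; the first one is the '#' ID column, so drop it.
numerical_cols = [col for col in data.columns if data[col].dtype == 'int64']
numerical_cols.pop(0)
# Two plots per row; // keeps the row count and the indices integral under Python 3.
n_rows = len(numerical_cols) // 2 + len(numerical_cols) % 2
f, ax = plt.subplots(n_rows, 2, figsize=(20,20))
for i, col in enumerate(numerical_cols):
    axx = ax[i // 2, i % 2]
    # distplot is deprecated since seaborn 0.11; histplot(..., kde=True) is the suggested replacement.
    sns.distplot(data[col], ax=axx)
    axx.set_title(col, fontsize=20)
f.suptitle("Distributions of Columns in Pokemon Data Set", fontsize=24)
f.savefig("figures/distributions.svg")  # assumes the figures/ directory already exists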
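The subplot grid needs a whole-number row count and integer indices. In Python 3, `/` always returns a float, so floor division (or `divmod`) is required for this arithmetic. A minimal sketch, with `grid_shape` and `grid_position` as hypothetical helper names that are not part of the notebook:

```python
def grid_shape(n_plots, n_cols=2):
    """Return (rows, cols) of the smallest grid that holds n_plots panels."""
    n_rows = (n_plots + n_cols - 1) // n_cols  # ceiling division with integers only
    return n_rows, n_cols


def grid_position(i, n_cols=2):
    """Return the (row, col) cell of the i-th panel, filling row by row."""
    return divmod(i, n_cols)


# Seven panels need four rows of two; the last cell stays empty.
n_rows, n_cols = grid_shape(7)
positions = [grid_position(i) for i in range(7)]
```

With an odd number of panels the final axis is left blank, which matters once a column such as `Generation` is removed from the list.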


Okay, nice.

Now, a heatmap of the mean Total for each combination of Type 1 and Type 2:


In [117]:
numerical_cols.remove("Generation")


In [136]:
# Pokemon without a second type have NaN there; give them an explicit label so they get their own heatmap column.
for col in ['Type 1', 'Type 2']:
    data[col] = data[col].fillna("Type not set")
mean_power = data.groupby(['Type 1', 'Type 2'])['Total'].mean().unstack()

f = plt.figure(figsize=(20,10))
with sns.axes_style("white"):
    sns.heatmap(
        mean_power, linewidths=0.5, cmap='coolwarm'
    )
f.savefig("figures/example_heatmap.svg")
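To see what the `groupby(...).mean().unstack()` step hands to the heatmap, here is a minimal sketch on a made-up toy frame (the values are invented for illustration and are not from the data set):

```python
import pandas as pd

# Four invented rows: two Fire/Flying, one Water/Flying, one Water with no second type.
toy = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water"],
    "Type 2": ["Flying", "Flying", "Flying", "Type not set"],
    "Total": [500, 600, 400, 300],
})

# Mean Total per (Type 1, Type 2) pair, then pivot Type 2 into columns.
toy_mean = toy.groupby(["Type 1", "Type 2"])["Total"].mean().unstack()
```

Combinations that never occur (here Fire with "Type not set") become NaN, which the heatmap leaves blank.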


Nice: to maximize the mean total power, we should pick Ground & Fire Pokemon. **_Never choose Pokemon with the Bug & Ghost combination!_**
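Reading the strongest and weakest combinations off a heatmap by eye can be cross-checked numerically. A minimal sketch on a made-up toy frame (invented values, not the real data); on the notebook's `data` the same two calls would apply to the grouped `Total` column:

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water"],
    "Type 2": ["Flying", "Flying", "Flying", "Type not set"],
    "Total": [500, 600, 400, 300],
})

# Mean Total per type combination, kept as a Series indexed by (Type 1, Type 2).
means = toy.groupby(["Type 1", "Type 2"])["Total"].mean()

strongest = means.idxmax()  # index label (a tuple) of the highest mean
weakest = means.idxmin()    # index label (a tuple) of the lowest mean
```

Keep in mind that a combination represented by a single Pokemon has a mean equal to that one Pokemon's Total, so extreme cells may rest on very few rows.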


In [139]:
# Two heatmaps per row; // keeps the row count and the indices integral under Python 3.
n_rows = len(numerical_cols) // 2 + len(numerical_cols) % 2
f, ax = plt.subplots(n_rows, 2, figsize=(20,30))
for i, col in enumerate(numerical_cols):
    axx = ax[i // 2, i % 2]
    with sns.axes_style("white"):
        sns.heatmap(data.groupby(['Type 1', 'Type 2'])[col].mean().unstack(),
                    linewidths=0.5, cmap='coolwarm', ax=axx, square=True)
    axx.set_title(col, fontsize=20)
    # Rotate the existing tick labels instead of re-setting them.
    plt.setp(axx.get_xticklabels(), rotation=45)
    axx.set_xlabel("")
    axx.set_ylabel("")
f.suptitle("Mean of Each Column by Type 1 and Type 2", fontsize=20)


Out[139]:
<matplotlib.text.Text at 0x131d7c790>

Now, let's try the melt feature we all know from R's reshape2 package:


In [110]:
pkmn = pd.melt(data,
               id_vars=["Name", "Type 1", "Type 2"],
               value_vars = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed'],
               var_name="Stat")

In [111]:
pkmn.sample(20)


Out[111]:
      Name                   Type 1    Type 2        Stat     value
2850  Luxray                 Electric  Type not set  Sp. Atk  95
1709  Electrode              Electric  Type not set  Defense  70
3787  Swoobat                Psychic   Flying        Sp. Def  55
1805  Aipom                  Normal    Type not set  Defense  55
1197  GlalieMega Glalie      Ice       Type not set  Attack   120
1734  Electabuzz             Electric  Type not set  Defense  57
1944  Roselia                Grass     Poison        Defense  45
2064  Pachirisu              Electric  Type not set  Defense  70
2107  Lumineon               Water     Type not set  Defense  76
510   Abomasnow              Grass     Ice           HP       90
3163  Clawitzer              Water     Type not set  Sp. Atk  120
4224  SteelixMega Steelix    Steel     Ground        Speed    30
2586  Pichu                  Electric  Type not set  Sp. Atk  35
1699  Gastly                 Ghost     Poison        Defense  30
573   Simisear               Fire      Type not set  HP       75
3793  Gurdurr                Fighting  Type not set  Sp. Def  50
3624  GroudonPrimal Groudon  Ground    Fire          Sp. Def  90
3173  Carbink                Rock      Fairy         Sp. Atk  50
2702  Pelipper               Water     Flying        Sp. Atk  85
0     Bulbasaur              Grass     Poison        HP       45
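What `pd.melt` does is easier to see on a tiny frame. A minimal sketch using two Pokemon and two of the stat columns (HP and Attack values as listed in the `data.head()` output); the variable names `wide` and `long_form` are illustrative:

```python
import pandas as pd

# Wide format: one row per Pokemon, one column per stat.
wide = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
})

# Long format: one row per (Pokemon, stat) pair.
# The stat name goes into "Stat"; the number goes into the default "value" column.
long_form = pd.melt(wide, id_vars=["Name"], value_vars=["HP", "Attack"], var_name="Stat")
```

Each original row is repeated once per stat, which is why the melted frame above has six times as many rows as `data`. The long format is what lets seaborn put `Stat` on the x axis and `value` on the y axis.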

And now a fancy, so-called "swarmplot":


In [137]:
plt.figure(figsize=(12,10))
plt.ylim(0, 275)
# dodge=True places the hue levels side by side; it replaces the older split=True argument.
sns.swarmplot(x="Stat", y="value", data=pkmn, hue="Type 1", dodge=True, size=7)
plt.legend(bbox_to_anchor=(1, 1), loc=2, borderaxespad=0.)
plt.gcf().savefig("figures/example_swarmplot.svg")