# The Relationship between Athlete Shape and Running Distance

I'm naively assuming that, when trying to run as fast as possible, there is a fundamental tradeoff:

• on one side, having a lot of strength allows the body to exert more force to move forward;
• on the other hand, being as small as possible means less energy expense to move the said body around.

Stereotypical sprint runners (e.g. Usain Bolt) tend to be muscular. Stereotypical long distance runners (e.g. Hicham El Guerrouj) tend to be small. We'll here try to find the approximate inflexion point where athletes go from the former "shape" to the second body type.

``````

In :

import numpy
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
pd.set_option('display.mpl_style', 'default')
plt.rcParams['figure.figsize'] = (15, 5)
plt.rcParams['font.family'] = 'sans-serif'
import re

``````

## 1. Retrieve an clean up the data

Athlete stats from The Guardian. We're only interested in Athletics here, so drop the other sports/events, and change a few names to get the distance in metres.

``````

In :

df=df[df.Sport=="Athletics"]
df.replace(to_replace={'Event':{r'0,000m' : "000m",
r'km' : "000m",
r'Marathon' : "42195m",
r'4 x 100m' : "100m",
r'4 x 400m' : "400m",
}
}, inplace=True, regex=True)
df=df.reset_index()

``````

Some athletes take part in multiple events; split each event into its own row.

``````

In :

s = pd.DataFrame(df.Event.str.split(',').tolist()).stack()
s.index = s.index.droplevel(-1)
s.name='Event'
df.drop('Event', axis=1, inplace=True)

df=df.join(s)

``````

Drop events that aren't runs (throws and jumps) as well as hurdles. Relays are kept in the dataset.

``````

In :

sports_to_ignore = ['Hammer', 'Shot', 'Hurdles', 'Javelin', 'Vault', 'Decathlon', 'Heptathlon', 'Jump', 'Discus', 'Steeplechase', 'Race Walk']
ignore_pattern = '|'.join(sports_to_ignore)
df=df[~df.Event.str.contains(ignore_pattern, flags=re.IGNORECASE)]
df=df.reset_index()

_=df.Event.groupby(df.Event).count().sort_index().plot(kind="bar", title="Number of Athletes per Event")

``````
``````

``````

## 2. Calculate the Body Mass Index (BMI) of each athlete

The BMI is by no means a perfect descriptor of the athlete's "shape". we can however guess (and confirm below) that sprint runners tend to have a high BMI compared to long-distance runners.

``````

In :

df['Distance']=df.apply(lambda row: int("".join(filter(str.isdigit, row['Event']))),
axis=1)
df['BMI']=df['Weight']/df['Height, cm']*100

``````
``````

Out:

Name
Distance
BMI
Sex

0
Jamale Aarrass
1500
40.641711
M

1
Abdihakem Abdirahman
42195
33.888889
M

2
Dana Abdul Razak
100
33.333333
F

3
Layes Abdullayeva
5000
27.647059
F

4
Endurance Abinuwa
400
33.962264
F

``````

## 3. Visually estimate the inflexion point when being small becomes more advantageous than being strong

``````

In :

fig, axs = plt.subplots(1,1)

mean_m=df[df.Sex=='M'].groupby("Distance").mean()['BMI']
std_m=df[df.Sex=='M'].groupby("Distance").std()['BMI']

mean_f=df[df.Sex=='F'].groupby("Distance").mean()['BMI']
std_f=df[df.Sex=='F'].groupby("Distance").std()['BMI']

mean_m.plot(ax=axs, logx=True, title="BMI vs Distance", marker='o', yerr=std_m)
mean_f.plot(ax=axs, logx=True, marker='o', yerr=std_f)

axs.set_ylim(bottom=20)
_=axs.set_xlim(left=90, right=45000)

``````
``````

C:\perso\Python34\lib\site-packages\matplotlib\collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if self._edgecolors == str('face'):

``````

What can we conclude?