Data BootCamp Project


In [ ]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
print('pandas version:', pd.__version__)
df = pd.read_csv('NHLQUANT.csv')  # quantitative player statistics

Who has Grit?

Hockey has always been an elegant yet brutal sport. In this analysis I look for the player who best embodies that brutality.


In [ ]:
plt.plot(df.index,df['Grit'])

The graph above is the simplest display of my data. Out of roughly 900 NHL players, only a few stand out from the mass.


In [ ]:
df.head(10)

This is how my quantitative data looks. Most of the column headers are self-explanatory, but I'll go into further detail later.


In [ ]:
df.mean(numeric_only=True)  # average only the numeric columns

Above are the means of the quantitative data I've acquired. Grit is a weighted compilation of penalty minutes, hits, blocked shots, and fights, which makes it somewhat subjective.
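To make "weighted compilation" concrete, here is a minimal sketch of how such a score could be built with pandas. The player rows, the column names, and especially the weights are assumptions made up for illustration; they are not the formula behind the Grit column in my dataset.

```python
import pandas as pd

# Hypothetical per-player counts; names and values are invented for this sketch.
stats = pd.DataFrame(
    {
        "PIM": [120, 30, 75],      # penalty minutes
        "Hits": [250, 80, 190],
        "Blocks": [90, 40, 150],   # blocked shots
        "Fights": [8, 0, 3],
    },
    index=["Player A", "Player B", "Player C"],
)

# Assumed weights, chosen only to show the mechanics (not the real Grit weights).
weights = pd.Series({"PIM": 0.5, "Hits": 1.0, "Blocks": 1.0, "Fights": 5.0})

# Multiply each column by its weight, then sum across each row.
stats["GritSketch"] = stats[weights.index].mul(weights, axis=1).sum(axis=1)

print(stats.sort_values("GritSketch", ascending=False))
```

Changing the weights changes the ranking, which is why a score like this is somewhat subjective.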


In [ ]:
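# pd.to_numeric works on one column at a time, so convert only the columns plotted below
df[['Age', 'Grit']] = df[['Age', 'Grit']].apply(pd.to_numeric, errors='coerce')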

In [ ]:
y=df["Age"]

In [ ]:
z=df["Grit"]

In [ ]:
plt.scatter(y, z)  # one point per player; a line plot would connect players in row order

In [ ]:
df['Grit']>130

In [ ]:
df.loc[df['Grit']>130]

Since I'm primarily interested in the players with the most Grit, I'm going to limit my search to the upper end of the distribution by raising the threshold.
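The cutoffs I use are hand-picked. As an alternative, here is a small sketch that derives the cutoff from a percentile instead. The Grit values are hypothetical stand-ins for the real column, and the 90th percentile is an arbitrary choice.

```python
import pandas as pd

# Hypothetical Grit scores standing in for the real column.
demo = pd.DataFrame({"Grit": [10, 50, 90, 130, 170, 210, 250, 300, 400, 450]})

# Compute the 90th percentile and keep the players at or above it.
cutoff = demo["Grit"].quantile(0.90)
top = demo.loc[demo["Grit"] >= cutoff]

print(cutoff)
print(top)
```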


In [ ]:
df.loc[df['Grit']>300]

In [ ]:
df.loc[df['Grit']>400]

In [ ]:
Best=df.loc[df['Grit']>400]

In [ ]:
Best.sort_values("Age").plot.barh('Age',"Grit")

Of the original 900, these are the 10 players with the most Grit.
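If the goal is simply "the N players with the most Grit", pandas can pick them directly without searching for a threshold. The sketch below uses made-up ages and scores and takes the top three rows.

```python
import pandas as pd

# Hypothetical ages and Grit scores; the index plays the role of the player id.
demo = pd.DataFrame(
    {"Age": [31, 25, 28, 22, 34], "Grit": [410, 520, 95, 300, 455]},
    index=[11, 22, 33, 44, 55],
)

# nlargest returns the rows with the highest Grit, already sorted descending.
top3 = demo.nlargest(3, "Grit")

print(top3)
```

The original index labels are preserved, so they can still be used to look the players up in another table.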


In [ ]:
QL=pd.read_csv("NHLQUAL.csv")

In [ ]:
QL.head(5)

Above is how my qualitative data is structured. I've separated the two datasets for ease of manipulation.


In [ ]:
# Row labels of the ten grittiest players, read off the filtered table above
top_idx = [61, 94, 712, 209, 306, 497, 524, 565, 641, 877]
print(*(QL.at[i, "First Name"] + " " + QL.at[i, "Last Name"] for i in top_idx), sep=", ")

Above are the hardiest players in the NHL, but how do they perform?
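The names above were looked up by typing in the row numbers by hand. A less error-prone approach, assuming both tables share the same row index, is to reuse the index of the filtered table. The two small frames below are hypothetical stand-ins for my quantitative and qualitative datasets.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables; they share the same row index.
quant = pd.DataFrame({"Grit": [120, 450, 410]}, index=[0, 1, 2])
qual = pd.DataFrame(
    {"First Name": ["Al", "Bo", "Cy"], "Last Name": ["Ames", "Burke", "Cole"]},
    index=[0, 1, 2],
)

# Filter the quantitative table, then use its index to select the matching names.
best = quant.loc[quant["Grit"] > 400]
names = qual.loc[best.index, "First Name"] + " " + qual.loc[best.index, "Last Name"]

print(names.tolist())
```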


In [ ]:
Best.sort_values("Age").plot.barh('Age',"HitF")

In [ ]:
Best.sort_values("Age").plot.barh('Age',"HitA")

The two graphs above show hits given and hits received, respectively.


In [ ]:
Best['GP'].plot()  # games played, plotted against each player's index label
plt.ylim([60,85])

The graph above shows the number of games each player played during the season. The x-axis is simply the index value assigned to the player.


In [ ]:
fig, ax=plt.subplots(nrows=2, ncols=1, sharex=True, sharey=True)
Best['G'].plot(ax=ax[0],color='green')
Best['A'].plot(ax=ax[1],color='red')

These two graphs show goals scored and assists, respectively. As you can see, the player with an index of roughly 100 clearly has more goals and assists than the rest. Of our grittiest players, Dustin Byfuglien has the highest offensive performance. Yet we know from an earlier graph that the player with the most Grit is 25 years old, so it cannot be Byfuglien. Cross-referencing the datasets shows that Radko Gudas holds the title of the player with the most Grit for the 2015-2016 season.
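The cross-reference for the single grittiest player can also be done in code rather than by reading the graphs. This is a sketch on hypothetical data: `idxmax` returns the row label of the highest Grit score, and that label is then used to read the name from the second table.

```python
import pandas as pd

# Hypothetical stand-ins for the quantitative and qualitative tables.
quant = pd.DataFrame({"Age": [30, 25, 27], "Grit": [430, 510, 405]}, index=[7, 8, 9])
qual = pd.DataFrame(
    {"First Name": ["Dee", "Ray", "Sam"], "Last Name": ["Boyd", "Gray", "Hill"]},
    index=[7, 8, 9],
)

# Row label of the highest Grit score, then the matching name and age.
top_label = quant["Grit"].idxmax()
top_name = qual.at[top_label, "First Name"] + " " + qual.at[top_label, "Last Name"]
top_age = quant.at[top_label, "Age"]

print(top_name, top_age)
```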

Data originally taken from http://www.hockeyabstract.com/testimonials

