| Stat Category | Point Value |
|---|---|
| Rushing Yards | 1 point for every 10 yards |
| Rushing TDs | 6 points |
| Receiving Yards | 1 point for every 10 yards |
| Receiving TDs | 6 points |
| Fumbles Lost | -2 points |
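As a worked example, the scoring table can be written as a small function. This is a minimal sketch: `fantasy_points` is a hypothetical helper, not part of the notebook. Like the notebook's own calculation, it awards fractional points for yardage (95 yards = 9.5 points) rather than rounding down to whole 10-yard increments. Unlike the notebook's calculation, it also applies the Fumbles Lost penalty.

```python
# Hypothetical helper: scores one running-back game line using the table above.
def fantasy_points(rush_yds, rush_td, rec_yds, rec_td, fumbles_lost=0):
    """Return fantasy points for one game under the scoring table."""
    yard_points = (rush_yds + rec_yds) / 10   # 1 point per 10 rushing + receiving yards (fractional)
    td_points = (rush_td + rec_td) * 6        # 6 points per rushing or receiving TD
    fumble_points = fumbles_lost * -2         # -2 points per fumble lost
    return yard_points + td_points + fumble_points

# 95 rushing yards, 1 rushing TD, 25 receiving yards, 0 receiving TDs, 1 fumble lost
print(fantasy_points(95, 1, 25, 0, fumbles_lost=1))  # → 16.0
```

The example game scores 12.0 points for 120 total yards, plus 6 for the touchdown, minus 2 for the fumble.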
In [1]:
%matplotlib inline
import pandas as pd
import matplotlib as mp
import numpy as np
import matplotlib.pyplot as plt
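import matplotlib.patches as mpatches
import seaborn as sns  # needed by the distribution and bar plots below
rb_games = pd.read_csv('rb_games.csv')
# Show the available column names
rb_games.columns.values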
Out[1]:
In [2]:
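# Fantasy points per game: 1 point per 10 rushing + receiving yards, 6 points per rushing or receiving TD.
# Note: the Fumbles Lost penalty (-2) from the scoring table is not applied here.
rb_games['Fantasy Points'] = ((rb_games['Rush Yds'] + rb_games['Rec Yds']) / 10) + ((rb_games['Rush TD'] + rb_games['Rec TD']) * 6)
rb_fantasy = rb_games[['Name', 'Career Year', 'Year', 'Game Count', 'Career Games', 'Date',
                       'Rec Rec', 'Rec Yds', 'Rec TD', 'Rush Att', 'Rush Yds', 'Rush TD', 'Fantasy Points']]
rb_fantasy.head(10)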
Out[2]:
In [37]:
x = rb_fantasy['Fantasy Points']
sns.set_context('poster')
sns.set_style("ticks")
# Histogram plus kernel density estimate (KDE).
# sns.distplot is deprecated since seaborn 0.11; histplot/displot are its replacements.
g = sns.distplot(x,
                 kde_kws={"color": "g", "lw": 4, "label": "KDE Estim", "alpha": 0.5},
                 hist_kws={"color": "r", "alpha": 0.3, "label": "Freq"})
# Remove the top and right spines
sns.despine()
# Figure size in inches (width, height)
g.figure.set_size_inches(12, 7)
# Title, axis labels, and tick labels
g.set_title('RB Fantasy Point Distribution', fontsize=34, color="b", alpha=0.3)
g.set_xlabel("Fantasy Points", size=67, color="g", alpha=0.5)
g.set_ylabel("Density", size=67, color="r", alpha=0.5)
g.tick_params(labelsize=14, labelcolor="black")
In [24]:
rush_att = rb_fantasy['Rush Att'].mean()
print('Keeping only games with more than %d rushing attempts (the current mean)' % rush_att)
rb_mid_level = rb_fantasy.loc[rb_fantasy['Rush Att'] > rush_att]
x = rb_mid_level['Fantasy Points']
sns.set_context('poster')
sns.set_style("ticks")
# Histogram plus kernel density estimate (KDE)
g = sns.distplot(x,
                 kde_kws={"color": "g", "lw": 4, "label": "KDE Estim", "alpha": 0.5},
                 hist_kws={"color": "r", "alpha": 0.3, "label": "Freq"})
# Remove the top and right spines
sns.despine()
# Figure size in inches (width, height)
g.figure.set_size_inches(12, 7)
# Title, axis labels, and tick labels
g.set_title('RB Fantasy Point Distribution\nShifted by Average Rushes', fontsize=34, color="b", alpha=0.3)
g.set_xlabel("Fantasy Points", size=67, color="g", alpha=0.5)
g.set_ylabel("Density", size=67, color="r", alpha=0.5)
g.tick_params(labelsize=14, labelcolor="black")
In [25]:
rush_att = rb_mid_level['Rush Att'].mean()
print('Keeping only games with more than %d rushing attempts (the current mean)' % rush_att)
# Build the mask from rb_mid_level itself so it aligns with the frame being filtered
rb_high_level = rb_mid_level.loc[rb_mid_level['Rush Att'] > rush_att]
x = rb_high_level['Fantasy Points']
sns.set_context('poster')
sns.set_style("ticks")
# Histogram plus kernel density estimate (KDE)
g = sns.distplot(x,
                 kde_kws={"color": "g", "lw": 4, "label": "KDE Estim", "alpha": 0.5},
                 hist_kws={"color": "r", "alpha": 0.3, "label": "Freq"})
# Remove the top and right spines
sns.despine()
# Figure size in inches (width, height)
g.figure.set_size_inches(12, 7)
# Title, axis labels, and tick labels
g.set_title('RB Fantasy Point Distribution\nShifted by Average Rushes', fontsize=34, color="b", alpha=0.3)
g.set_xlabel("Fantasy Points", size=67, color="g", alpha=0.5)
g.set_ylabel("Density", size=67, color="r", alpha=0.5)
g.tick_params(labelsize=14, labelcolor="black")
Raising the threshold again, so that only games in which a running back had more than 16 rushing attempts are kept, produces a distribution that looks closer to a normal (bell-shaped) curve. The reasoning is that this removes games in which a running back was used sparingly and generated few fantasy points, giving a truer view of what a running back can contribute when he is a featured part of the offense.
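Each of the filtering cells repeats the same step: compute the mean of `Rush Att` for the current subset, then keep only the games above it. A minimal sketch of that loop follows. `iterative_mean_filter` is a hypothetical helper, and the toy data is illustrative only, not taken from `rb_games.csv`.

```python
import pandas as pd

def iterative_mean_filter(df, col="Rush Att", rounds=3):
    """Repeatedly keep only rows where `col` exceeds the mean of the current subset.

    Returns a list of (threshold, filtered_frame) pairs, one per round.
    """
    steps = []
    current = df
    for _ in range(rounds):
        threshold = current[col].mean()
        # Build the mask from `current` itself so its index always matches.
        current = current.loc[current[col] > threshold]
        steps.append((threshold, current))
    return steps

# Illustrative toy data only, not the notebook's rb_games.csv
toy = pd.DataFrame({"Rush Att": [2, 5, 8, 12, 15, 18, 20, 22, 25, 30]})
for threshold, subset in iterative_mean_filter(toy, rounds=3):
    print(f"threshold={threshold:.1f}, games kept={len(subset)}")
# threshold=15.7, games kept=5
# threshold=23.0, games kept=2
# threshold=27.5, games kept=1
```

Each round raises the threshold and shrinks the sample, so later distributions rest on far fewer games. That is worth keeping in mind when comparing their shapes.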
In [26]:
rush_att = rb_high_level['Rush Att'].mean()
print('Keeping only games with more than %d rushing attempts (the current mean)' % rush_att)
# Build the mask from rb_high_level itself so it aligns with the frame being filtered
rb_higher_level = rb_high_level.loc[rb_high_level['Rush Att'] > rush_att]
x = rb_higher_level['Fantasy Points']
sns.set_context('poster')
sns.set_style("ticks")
# Histogram plus kernel density estimate (KDE)
g = sns.distplot(x,
                 kde_kws={"color": "g", "lw": 4, "label": "KDE Estim", "alpha": 0.5},
                 hist_kws={"color": "r", "alpha": 0.3, "label": "Freq"})
# Remove the top and right spines
sns.despine()
# Figure size in inches (width, height)
g.figure.set_size_inches(12, 7)
# Title, axis labels, and tick labels
g.set_title('RB Fantasy Point Distribution by Rushes\nShifted by Average Rushes', fontsize=34, color="b", alpha=0.3)
g.set_xlabel("Fantasy Points", size=67, color="g", alpha=0.5)
g.set_ylabel("Density", size=67, color="r", alpha=0.5)
g.tick_params(labelsize=14, labelcolor="black")
Filtering again, so that only games with more than 21 carries are kept, produces a distribution that looks even closer to normal. One point to consider: a player averaging 21 carries a game over a 16-game season would total 336 carries. Anecdotally, more than 300 carries in a season is generally considered a warning sign that a player will have a shorter career.
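The filtering cells work on individual games. A threshold of 21 therefore keeps any game with 22 or more carries, regardless of the player's usual workload. If the goal is instead to isolate players who *average* more than 21 carries a game, one option is to compute each player's mean first. The sketch below contrasts the two approaches on illustrative toy data. The player-level filter is a hypothetical alternative, not what the notebook does.

```python
import pandas as pd

# Illustrative toy data only: one row per game, like rb_games.csv
games = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Rush Att": [25, 24, 20, 22, 8, 9],
})
threshold = 21

# Per-game filter (what the cells above do): keeps any single game above the threshold
per_game = games.loc[games["Rush Att"] > threshold]

# Per-player filter (hypothetical alternative): keeps every game of a player
# whose average carries per game exceed the threshold
player_avg = games.groupby("Name")["Rush Att"].transform("mean")
per_player = games.loc[player_avg > threshold]

print(sorted(per_game["Name"].unique()))    # ['A', 'B']
print(sorted(per_player["Name"].unique()))  # ['A']

# Season projection from the paragraph above: 21 carries x 16 games
print(threshold * 16)  # 336
```

Player B has one 22-carry game, so B survives the per-game filter. B's average of 13 carries, however, excludes B from the per-player filter.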
In [28]:
rush_att = rb_higher_level['Rush Att'].mean()
print('Keeping only games with more than %d rushing attempts (the current mean)' % rush_att)
# Build the mask from rb_higher_level itself so it aligns with the frame being filtered
rb_highest_level = rb_higher_level.loc[rb_higher_level['Rush Att'] > rush_att]
x = rb_highest_level['Fantasy Points']
sns.set_context('poster')
sns.set_style("ticks")
# Histogram plus kernel density estimate (KDE)
g = sns.distplot(x,
                 kde_kws={"color": "g", "lw": 4, "label": "KDE Estim", "alpha": 0.5},
                 hist_kws={"color": "r", "alpha": 0.3, "label": "Freq"})
# Remove the top and right spines
sns.despine()
# Figure size in inches (width, height)
g.figure.set_size_inches(12, 7)
# Title, axis labels, and tick labels
g.set_title('RB Fantasy Point Distribution by Rushes\nShifted by Average Rushes', fontsize=34, color="b", alpha=0.3)
g.set_xlabel("Fantasy Points", size=67, color="g", alpha=0.5)
g.set_ylabel("Density", size=67, color="r", alpha=0.5)
g.tick_params(labelsize=14, labelcolor="black")
In [3]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
print(len(rb_fantasy))
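# Average per-game stats for each career year. numeric_only=True skips text columns
# such as Name and Date, which make mean() raise a TypeError in pandas 2.x.
yearly_fantasy_points = rb_fantasy.groupby(['Career Year'], as_index=False).mean(numeric_only=True)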
yearly_fantasy_points[['Career Year', 'Rush Att', 'Rush Yds', 'Rush TD', 'Rec Yds', 'Rec TD', 'Fantasy Points']]
Out[3]:
In [29]:
# Average fantasy points per game for each career year
ax = sns.barplot(x=yearly_fantasy_points['Career Year'], y=yearly_fantasy_points['Fantasy Points'], color='red')
sns.despine()
# Figure size in inches (width, height)
ax.figure.set_size_inches(12, 7)
ax.set_title('Fantasy Points by Year', fontsize=34, color="b", alpha=0.3)
ax.set_xlabel("Career Year", size=67, color="g", alpha=0.5)
ax.set_ylabel("Fantasy Points", size=67, color="r", alpha=0.5)
ax.tick_params(labelsize=14, labelcolor="black")
An unfiltered view of average fantasy points per game by career year shows running backs to be far less valuable than quarterbacks in terms of the fantasy points they generate. It also shows improvement over the first 6 years of a career, followed by a 3-year plateau and then a decline. The sharpest increase in performance is between years 1 and 2.
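The "sharpest increase" can also be checked numerically rather than read off the chart. The sketch below takes year-over-year differences of the per-year averages and returns the largest rise. `largest_yearly_increase` is a hypothetical helper, the toy averages are illustrative only, and the function assumes the rows cover consecutive career years.

```python
import pandas as pd

def largest_yearly_increase(yearly, col="Fantasy Points"):
    """Return (from_year, to_year, change) for the biggest rise in `col`
    between adjacent rows (assumes consecutive career years)."""
    ordered = yearly.sort_values("Career Year").reset_index(drop=True)
    change = ordered[col].diff()   # value[i] - value[i-1]; NaN for the first year
    i = change.idxmax()            # row label of the largest increase (NaN skipped)
    return (int(ordered.loc[i - 1, "Career Year"]),
            int(ordered.loc[i, "Career Year"]),
            float(change[i]))

# Illustrative toy averages only, not the notebook's results
toy_yearly = pd.DataFrame({"Career Year": [1, 2, 3, 4],
                           "Fantasy Points": [6.0, 9.5, 10.0, 9.0]})
print(largest_yearly_increase(toy_yearly))  # (1, 2, 3.5)
```

In the notebook, this would be called as `largest_yearly_increase(yearly_fantasy_points)`, or with `col="Rush Yds"` or `col="Rec Yds"` for the charts that follow.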
In [35]:
# Average rushing yards per game for each career year
ax = sns.barplot(x=yearly_fantasy_points['Career Year'], y=yearly_fantasy_points['Rush Yds'], color='blue')
sns.despine()
# Figure size in inches (width, height)
ax.figure.set_size_inches(12, 7)
ax.set_title('Rush Yards by Year', fontsize=34, color="b", alpha=0.3)
ax.set_xlabel("Career Year", size=67, color="g", alpha=0.5)
ax.set_ylabel("Rush Yds", size=67, color="r", alpha=0.5)
ax.tick_params(labelsize=14, labelcolor="black")
Similar to fantasy point production, the sharpest increase in rushing yardage happens between the first and second years. After that there is a slight increase through year 6, a plateau between years 6 and 8, and then a sharp decrease after year 8, with an anomaly in year 12. This needs further investigation to determine whether year 12 reflects a single outlier performer rather than a general trend.
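One way to start that investigation is to count how many distinct players and games make up a given career year, and to see who produced the yardage. `year_contributors` is a hypothetical helper and the toy data is illustrative only. A year built from only a handful of players, with one clearly dominant leader, would support the outlier explanation.

```python
import pandas as pd

def year_contributors(games, year, col="Rush Yds", top=3):
    """Summarize who produced `col` in a given career year.

    Returns (distinct players, games, top players by total `col`).
    """
    subset = games.loc[games["Career Year"] == year]
    n_players = subset["Name"].nunique()
    n_games = len(subset)
    # Total of `col` per player, largest first
    leaders = subset.groupby("Name")[col].sum().sort_values(ascending=False).head(top)
    return n_players, n_games, leaders

# Illustrative toy data only, not the notebook's rb_games.csv
toy_games = pd.DataFrame({
    "Name": ["X", "X", "Y", "Z"],
    "Career Year": [12, 12, 12, 11],
    "Rush Yds": [120, 95, 30, 60],
})
n_players, n_games, leaders = year_contributors(toy_games, 12)
print(n_players, n_games)  # 2 3
print(leaders)
```

Against the notebook data, this would be called as `year_contributors(rb_fantasy, 12)`.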
In [39]:
# Average receiving yards per game for each career year
ax = sns.barplot(x=yearly_fantasy_points['Career Year'], y=yearly_fantasy_points['Rec Yds'], color='green')
sns.despine()
# Figure size in inches (width, height)
ax.figure.set_size_inches(12, 7)
ax.set_title('Rec Yards by Year', fontsize=34, color="b", alpha=0.3)
ax.set_xlabel("Career Year", size=67, color="g", alpha=0.5)
ax.set_ylabel("Rec Yds", size=67, color="r", alpha=0.5)
ax.tick_params(labelsize=14, labelcolor="black")
The receiving yards generated by running backs are modest, but there are a couple of interesting traits. The initial growth period extends from year 1 through year 3, which is longer than for rushing, and the plateau appears to run from year 3 through year 8. Following a steep decline after year 8, there is something like a secondary growth period. This might be worth examining alongside the general decline in rushing yardage after year 8.
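To compare the two trends directly, one option is the share of a running back's scrimmage yards that came from receiving in each career year. A rising share after year 8 would fit the idea that receiving work partly replaces carries late in a career. `receiving_share` is a hypothetical helper and the toy averages are illustrative only.

```python
import pandas as pd

def receiving_share(yearly):
    """Add the fraction of scrimmage yards (rushing + receiving) that came from receiving."""
    out = yearly[["Career Year", "Rush Yds", "Rec Yds"]].copy()
    out["Rec Share"] = out["Rec Yds"] / (out["Rush Yds"] + out["Rec Yds"])
    return out

# Illustrative toy averages only, not the notebook's results
toy_yearly = pd.DataFrame({"Career Year": [8, 9, 10],
                           "Rush Yds": [60.0, 40.0, 30.0],
                           "Rec Yds": [20.0, 20.0, 30.0]})
print(receiving_share(toy_yearly)["Rec Share"].round(2).tolist())  # [0.25, 0.33, 0.5]
```

In the notebook, this would be called as `receiving_share(yearly_fantasy_points)`.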