Which variable leads to more game wins? Team Assists or Three Pointers?
Spring 2017 Data Bootcamp
By: Leland Sutton, Danielle Bennett and Michael Hou
We initially wanted to look at the relationship between NBA team profitability and team wins/losses. We wanted to see whether the number of wins in a season affects the amount of money a team brings in through ticket and merchandise sales. However, after finding a website that housed the revenue data and trying to scrape it, we decided to change course: the site's content appeared to be generated by script (JavaScript, as far as we could tell) rather than served as plain HTML, which made it hard to scrape.
Below is the code that scrapes the data we use to compare the success of NBA teams over an eleven-season period, from the 2006-2007 season to the 2016-2017 season. We define “success” as the percentage of games a team wins, which takes into account the shortened season caused by the 2011 NBA lockout. We compare this measure to the number of three-point shots attempted (3PA), the three-point percentage (3P%), and assists (AST). We expect to see an increase in 3PA for “successful” teams in recent years, but do not expect to see big variations in the number of assists over the time frame.
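To show why we use a percentage rather than a raw win count, here is a minimal sketch. The helper name `win_pct` and the two records are hypothetical and only for illustration; the one fact assumed is that the lockout season was shortened to 66 games instead of the usual 82.

```python
def win_pct(wins, losses):
    """Return the share of games won, as a percentage of games played."""
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    return wins / games * 100

# Hypothetical records: a full 82-game season and a 66-game lockout season
full_season = win_pct(41, 41)
lockout_season = win_pct(33, 33)
print(full_season, lockout_season)
```

A team that wins half of its games gets the same score in both seasons, even though 33 wins would look much worse than 41 if we compared raw win totals.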
In [24]:
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics
import datetime as dt # date tools, used to note current date
import seaborn as sns
from IPython.display import display, HTML
pd.options.mode.chained_assignment = None # default='warn'
%matplotlib inline
#Abbreviations for teams we are using; Golden State Warriors, Los Angeles Lakers, Los Angeles Clippers, Phoenix Suns,
#Sacramento Kings
teams = ["GSW", "LAL", "LAC", "PHO", "SAC"]
urls = []
for team in teams:
    urls.append("http://www.basketball-reference.com/teams/" + team + "/stats_basic_totals.html")
In [121]:
from bs4 import BeautifulSoup
from requests import get
dfs = []
for url in urls:
    team_data = get(url)
    soup = BeautifulSoup(team_data.content, 'html.parser')
    # Load site, find table
    table = soup.findAll('table')[0]
    # Find columns
    columns = table.findAll('thead')[0].findAll('th')
    headers = []
    # Only use columns that have a header longer than 0
    for column in columns:
        header = column.getText().strip()
        if (len(header) > 0):
            headers.append(header)
    # Finds the rows that we want for the project
    rows = table.findAll('tbody')[0].findAll('tr')
    team = []
    for row in rows[0:11]:
        season = row.findAll('th')[0].getText()
        cells = row.findAll('td')
        data = [season]
        # Extracts the specific data from each row
        for cell in cells:
            c = cell.getText().strip()
            if (len(c) > 0):
                data.append(c)
        team.append(data)
    df = pd.DataFrame(columns=headers, data=team)
    dfs.append(df)
    print('Scraped ' + df['Tm'][0] + ' data')
print('Headers: ' + str(list(dfs[0])))
In [122]:
# Only keep the data that we need
scrubbed = []
for df in dfs:
    # .copy() so the assignments below modify a new frame, not a view of the original
    df = df[['Season', 'Tm', 'W', 'L', '3PA', '3P%', 'AST']].copy()
    df['Season'] = df['Season'].str[:4]
    # Scraped cells are strings, so convert the numeric columns before doing math or plotting
    for col in ['W', 'L', '3PA', '3P%', 'AST']:
        df[col] = df[col].astype(float)
    # Add a column for the win/loss percentage to account for the lockout
    df['WLP'] = df['W'] / (df['W'] + df['L']) * 100
    print(df)
    scrubbed.append(df)
We wanted to compare both three-point attempts (3PA) and three-point percentage (3P%) to win rates over an eleven-year period to see how much the rising popularity of the three-pointer affects a team's success rate.
In the past few years, especially with the rising super-stardom of Stephen Curry and the coaching style of Steve Kerr, many teams have been focusing on improving three-point shooting. We gathered data from the Pacific Division because it has one of the largest ranges of success in the league, from the Golden State Warriors, who are peaking in terms of team “success”, to the Sacramento Kings and Phoenix Suns, who are on the complete opposite end of the spectrum. The success of the individual teams has also varied greatly over the years, yet almost all five teams have shown an increase in 3PA.
In [86]:
# PLOT...LY
from plotly.offline import iplot, iplot_mpl # plotting functions
import plotly.graph_objs as go # ditto
import plotly # just to print version and init notebook
# import plotly.plotly as py # unused in this notebook; moved to the chart_studio package in plotly 4+
import cufflinks as cf # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)
# these lines make our graphics show up in the notebook
%matplotlib inline
plotly.offline.init_notebook_mode(connected=True)
#we took some of this code from the Plotly website that we found recently
#comparing 3PA to W/L%
for df in scrubbed:
    data = []
    data.append(go.Scatter(
        x = df['Season'],
        y = df['WLP'],
        name= 'Win Loss Percentage'
    ))
    data.append(go.Scatter(
        x = df['Season'],
        y = df['3PA'],
        name= '3 Point Attempts'
    ))
    layout = dict(title= df['Tm'][0],xaxis = dict(title = 'Years'), yaxis = dict(title ='Percentage'))
    fig = dict(data=data, layout=layout)
    iplot(fig, filename='basic-line')
In [87]:
#next we compare 3P% to W/L%
for df in scrubbed:
    data = []
    data.append(go.Scatter(
        x = df['Season'],
        y = df['WLP'],
        name= 'Win Loss Percentage'
    ))
    data.append(go.Scatter(
        x = df['Season'],
        y = df['3P%'],
        name= '3 Point Percentage'
    ))
    layout = dict(title= df['Tm'][0],xaxis = dict(title = 'Years'), yaxis = dict(title ='Percentage'))
    fig = dict(data=data, layout=layout)
    iplot(fig, filename='basic-line')
In [124]:
for df in scrubbed:
    # Note: each `data = []` reset below discards the traces built before it,
    # so only the last two bars (WLP and 3PA) end up in the figure.
    data = []
    data.append(go.Bar(
        x = df['Season'],
        y = df['AST'],
        name= '# of Assists'
    ))
    data = []
    data.append(go.Bar(
        x = df['Season'],
        y = df['3P%'],
        name= '3 Point Percentage'
    ))
    data = []
    data.append(go.Bar(
        x = df['Season'],
        y = df['WLP'],
        name= 'Win Loss Percentage'
    ))
    data.append(go.Bar(
        x = df['Season'],
        y = df['3PA'],
        name= '3 Point Attempts'
    ))
    layout = dict(title= 'Comparison for '+ df['Tm'][0],xaxis = dict(title = 'Years'))
    fig = dict(data=data, layout=layout)
    iplot(fig, filename='basic-line')
Above we looked at three-point attempts and then at three-point percentage. We looked at both variables because, although the percentage of three-point shots made should be indicative of the strength of a team and potentially directly correlated with its "success", we believed that three-point attempts would also increase over time. Because there are more attempts than made shots, 3PA also reflects a larger volume of shots each season, giving us a fuller picture of the effect three-point shooting has on a team's success.
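One caveat when reading these charts: 3PA is a season total (typically over a thousand) while the win/loss percentage runs from 0 to 100, so on a shared axis the W/L series can look nearly flat. A minimal sketch of one way to compare the shapes is min-max rescaling. The helper name `min_max_scale` and the sample numbers are hypothetical, not our scraped data, and this is not what the plots above do.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no shape to preserve; map it to zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical values for illustration only
attempts = [1500, 1750, 2000]
win_loss_pct = [40.0, 55.0, 50.0]
scaled_attempts = min_max_scale(attempts)
scaled_win_loss = min_max_scale(win_loss_pct)
print(scaled_attempts)
print(scaled_win_loss)
```

After rescaling, both series run from 0 to 1, so their rise and fall can be compared on one axis. The trade-off is that the axis no longer shows the original units.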
In [94]:
for df in scrubbed:
    data = []
    data.append(go.Scatter(
        x = df['Season'],
        y = df['WLP'],
        name= 'Win Loss Percentage'
    ))
    layout = dict(title= df['Tm'][0] + "'s Win Loss Percentage",xaxis = dict(title = 'Years'), yaxis = dict(title ='Percentage'))
    fig = dict(data=data, layout=layout)
    iplot(fig, filename='basic-line')
We found that while 3PA generally increases across most teams (except the Phoenix Suns), there does not seem to be a clear relationship between three-point attempts and team success, as measured by the W/L percentage. Next, we turn to 3P%, which we plotted against the win/loss percentage in separate graphs so that the variation in 3P% over time is easier to see. On average, 3P% followed roughly the same curve as the win/loss percentage, which suggests a correlation between the two, although we judged this by eye rather than measuring it.
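To put a number on that visual impression, one option is the Pearson correlation coefficient. The sketch below is a plain-Python version under stated assumptions: the helper name `pearson_r` and the sample values are hypothetical and are not our scraped data.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (not our scraped data)
three_pt_pct = [0.33, 0.35, 0.34, 0.38, 0.40]
win_loss_pct = [35.0, 45.0, 40.0, 60.0, 75.0]
r = pearson_r(three_pt_pct, win_loss_pct)
print(round(r, 3))
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means no linear relationship. With the per-team DataFrames, `df['3P%'].astype(float).corr(df['WLP'])` should compute the same quantity. With only eleven seasons per team, any such number is a rough indication at best.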
We wanted to add another variable, so we looked at assists. We used the scraped data for the same five teams over the same period of time (2006-2016). Below you will find the tables of scraped data for each of the five teams in the NBA's Pacific Division.
In [119]:
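scrubbed = []
for df in dfs:
    # .copy() so the assignments below modify a new frame, not a view of the original
    df = df[['Season', 'Tm', 'AST']].copy()
    df['Season'] = df['Season'].str[:4]
    # Scraped cells are strings; convert assists to numbers for plotting
    df['AST'] = df['AST'].astype(float)
    print(df)
    scrubbed.append(df)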
In [99]:
for df in scrubbed:
    data = []
    data.append(go.Scatter(
        x = df['Season'],
        y = df['AST'],
        name= 'Assists'
    ))
    layout = dict(title= df['Tm'][0] + "'s No. of Assists",xaxis = dict(title = 'Years'), yaxis = dict(title ='Number'))
    fig = dict(data=data, layout=layout)
    iplot(fig, filename='basic-line')
To add even more depth to our analysis, we wanted to analyze some more assist-related stats. We took four stats from nbaminer.com:
1) Assist/Turnover ratio of each team --- used to measure a team's ball control
2) Opposing team's Assist/Turnover ratio for each team --- used to measure the opposing team's performance
3) Assisted Field Goals Made % of each team --- a rough measure of teamwork
4) Opposing team's Assisted Field Goals Made % for each team --- the same, but for opposing teams
We took this data for each team in the NBA over five years so we could see whether a particular focus in coaching style led to varying levels of success, i.e. W/L ratio.
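For readers unfamiliar with these stats, the sketch below shows how the two ratios are conventionally computed from raw season totals. We took the finished values from nbaminer.com rather than computing them ourselves, so the function names, the formulas as written here, and the sample totals are assumptions for illustration only.

```python
def assist_to_turnover_ratio(assists, turnovers):
    """Assists per turnover; a higher value suggests better ball control."""
    if turnovers == 0:
        raise ValueError("ratio is undefined with zero turnovers")
    return assists / turnovers

def assisted_fgm_pct(assisted_field_goals, field_goals_made):
    """Share of made field goals that came off an assist, in percent."""
    if field_goals_made == 0:
        raise ValueError("percentage is undefined with zero field goals made")
    return assisted_field_goals / field_goals_made * 100

# Hypothetical season totals for illustration only
ratio = assist_to_turnover_ratio(2000, 1250)
pct = assisted_fgm_pct(1800, 3000)
print(ratio, pct)
```

The "opposing team" versions apply the same formulas to the totals a team allows its opponents, which is why lower values are better there.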
In [100]:
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics
import datetime as dt # date tools
import numpy as np
import seaborn as sns
import statistics
import csv
In [101]:
import requests
In [135]:
url = 'http://www.nbaminer.com/assist-details/'
In [103]:
miner = requests.get(url)
miner
In [104]:
miner.status_code
In [105]:
miner.content[:500]
In [158]:
af = pd.read_csv('NBA_Asst.csv')
In [211]:
af.describe()
In [160]:
af.drop(af.columns[8:], axis=1, inplace=True)
In [161]:
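af.drop(af.index[59:], axis=0) # note: this only displays a trimmed copy; af itself is unchanged without inplace=True or reassignment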
In [110]:
af['Opp. Assisted FGM Pct.'] = pd.to_numeric(af['Opp. Assisted FGM Pct.'])
In [111]:
af['Assisted FGM Pct.'] = pd.to_numeric(af['Assisted FGM Pct.'])
In [112]:
af['Opp. Assist/TO Ratio'] = pd.to_numeric(af['Opp. Assist/TO Ratio'])
In [113]:
af['Assist/TO Ratio'] = pd.to_numeric(af['Assist/TO Ratio'])
In [163]:
af['W/L'] = pd.to_numeric(af['W/L'])
To group the data we received from NBAMiner.com by team name, we used the groupby function. We were then able to take a wider look at the assist-related attributes of all 30 teams in comparison to one another.
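As a small illustration of what the groupby step produces, here is a sketch on a made-up table. The team codes and values are hypothetical; only the column names `Team`, `Assist/TO Ratio` and `W/L` match our CSV.

```python
import pandas as pd

# Hypothetical rows for illustration only (two seasons for two teams)
sample = pd.DataFrame({
    'Team': ['AAA', 'AAA', 'BBB', 'BBB'],
    'Assist/TO Ratio': [1.5, 1.7, 1.2, 1.4],
    'W/L': [0.6, 0.7, 0.4, 0.5],
})

# One row per team, with the mean and median of each stat across seasons
per_team = sample.groupby('Team').agg(['mean', 'median'])
print(per_team)

# Pull out a single summary column, e.g. each team's mean W/L
mean_wl = per_team[('W/L', 'mean')]
```

Each team's seasons collapse into one row, which is what lets us line up a bar chart of an assist stat against a bar chart of W/L with one bar per team.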
In [184]:
afgb = af.groupby('Team')
# Passing the aggregation names as strings yields the same 'sum', 'mean' and 'median' columns
afgb_atr = afgb['Assist/TO Ratio'].agg(['sum', 'mean', 'median'])
afgb_oatr = afgb['Opp. Assist/TO Ratio'].agg(['sum', 'mean', 'median'])
afgb_afgm = afgb['Assisted FGM Pct.'].agg(['sum', 'mean', 'median'])
afgb_oafgm = afgb['Opp. Assisted FGM Pct.'].agg(['sum', 'mean', 'median'])
afgb_wl = afgb['W/L'].agg(['sum', 'mean', 'median'])
In [191]:
fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(15,10))
fig.suptitle('Assist/TO Ratio compared to W/L', fontsize=18, fontweight='bold')
afgb_atr['mean'].plot(kind='bar', ax=ax[0], color='orchid')
afgb_wl['mean'].plot(kind='bar', ax=ax[1], color='green')
ax[0].set_ylabel('Assist/TO Ratio', fontsize=15)
ax[1].set_ylabel('W/L Ratio', fontsize=15)
In [204]:
fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(15,10))
fig.suptitle('Opposing Team Assist/TO ratio compared to W/L', fontsize=18, fontweight='bold')
afgb_oatr['mean'].plot(kind='bar', ax=ax[0], color='mediumslateblue')
afgb_wl['mean'].plot(kind='bar', ax=ax[1], color='green')
ax[0].set_ylabel('Opp. Assist/TO Ratio', fontsize=15)
ax[1].set_ylabel('W/L Ratio', fontsize=15)
In [210]:
fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(15,10))
fig.suptitle('Assisted Field Goals Made % compared to W/L', fontsize=18, fontweight='bold')
afgb_afgm['mean'].plot(kind='bar', ax=ax[0], color='magenta')
afgb_wl['mean'].plot(kind='bar', ax=ax[1], color='green')
ax[0].set_ylabel('Assisted FGM Pct.', fontsize=15)
ax[1].set_ylabel('W/L Ratio', fontsize=15)
In [214]:
fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(15,10))
fig.suptitle('Opposing Assisted Field Goals Made % compared to W/L', fontsize=18, fontweight='bold')
afgb_oafgm['mean'].plot(kind='bar', ax=ax[0], color='pink')
afgb_wl['mean'].plot(kind='bar', ax=ax[1], color='green')
ax[0].set_ylabel('Opp. Assisted FGM Pct.', fontsize=15)
ax[1].set_ylabel('W/L Ratio', fontsize=15)
After analyzing these assist-related stats, as well as raw assists, we found that none of these attributes has a strong connection to a team's success, which we define by the team's W/L ratio. However, certain well-performing teams, such as the Golden State Warriors and the San Antonio Spurs, consistently ranked near the top for statistics such as 3PA or 3P%, and ranked lower in the tables that measure the opposing team's performance. Attributes such as opp. assist/TO ratio and opp. assisted FGM% measure the opposing team's performance, so it is actually good when these statistics are low.
While three-point percentage appears to track W/L ratios more closely, this in-depth look at assists and other statistics that measure teamwork and ball handling suggests that teams must be strong on multiple fronts in order to be successful.
While three-pointers have likely contributed to the success of teams in recent years, and have placed pressure on other teams to increase three-point attempts, this tactic must be supplemented by strength in other statistics and by factors that the data cannot show. Some major "soft" factors include coaching styles, playing environment, and even a team's chemistry.
In the future, we would like to find a way to analyze the relationship between the financial value of a team and its performance. We originally planned on this, but the website we found proved too challenging to scrape because its content appeared to be generated by script (JavaScript, as far as we could tell) rather than served as plain HTML/CSS. It would be interesting to see whether or not teams can "buy" their way to success, and how the success of a team is reflected in its financial bottom line.