An Introduction to py-Goldsberry

py-Goldsberry is a Python package that makes it easy to interface with http://stats.nba.com and retrieve its data in a more analyzable format.

This is the first in a series of tutorials that walk through the different modules of the package and how to use each to get different types of data.

If you've made it this far, you're probably less interested in reading about the package and more interested in actually using it.

Installation

If you don't have the package installed, use pip install to get the latest version. If you already have it, add --upgrade to update it.

pip install py-goldsberry
pip install --upgrade py-goldsberry

When you have py-goldsberry installed, you can load the package and check the version number


In [1]:
import goldsberry
import pandas as pd
goldsberry.__version__


Out[1]:
'1.0.1'
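If you would rather check the installed version without importing the package, the standard library can read it from the installed distribution metadata. This is only a minimal sketch: installed_version is a hypothetical helper, and it assumes the distribution name matches the name used with pip above (py-goldsberry). It needs Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if py-goldsberry is installed, otherwise None.
print(installed_version("py-goldsberry"))
```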

py-goldsberry is designed to work in conjunction with Pandas. Each function within the package returns data in a format that is easily converted to a Pandas DataFrame.
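To make "easily converted" concrete, the sketch below assumes the package hands back a list of dictionaries, one per row, keyed by column name. That shape is an assumption inferred from the output later in this notebook, not something stated here, and the two records are typed in by hand from that output.

```python
import pandas as pd

# Hand-typed stand-in for what a call such as players.players() might return
# (assumed shape: a list of dicts, one dict per player).
records = [
    {"DISPLAY_FIRST_LAST": "Quincy Acy", "PERSON_ID": 203112, "TEAM_ABBREVIATION": "SAC"},
    {"DISPLAY_FIRST_LAST": "Jordan Adams", "PERSON_ID": 203919, "TEAM_ABBREVIATION": "MEM"},
]

# pandas builds one row per dict and one column per key.
df = pd.DataFrame(records)
print(df)
```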

To get started, let's get a list of all of the players who were on an NBA roster during the 2015-16 season.

PlayerIDs

Currently, the PlayerList() class defaults to the current season. We start by creating an object, players, that we will use to scrape player data.


In [2]:
players = goldsberry.PlayerList()
players2015 = pd.DataFrame(players.players())
players2015.head()


Out[2]:
   DISPLAY_FIRST_LAST  DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE     ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY      TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
0  Quincy Acy          Acy, Quincy               2012       Y                  203112     quincy_acy     1             SAC                Sacramento     kings      1610612758  Kings      2015
1  Jordan Adams        Adams, Jordan             2014       Y                  203919     jordan_adams   1             MEM                Memphis        grizzlies  1610612763  Grizzlies  2015
2  Steven Adams        Adams, Steven             2013       Y                  203500     steven_adams   1             OKC                Oklahoma City  thunder    1610612760  Thunder    2015
3  Arron Afflalo       Afflalo, Arron            2007       Y                  201167     arron_afflalo  1             NYK                New York       knicks     1610612752  Knicks     2015
4  Alexis Ajinca       Ajinca, Alexis            2008       Y                  201582     alexis_ajinca  1             NOP                New Orleans    pelicans   1610612740  Pelicans   2015
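A common next step is looking up a player's PERSON_ID by name or pulling out one team's players. The sketch below rebuilds three rows of the output above by hand so that it runs without a network call; with the real data you would apply the same steps to players2015. The names sample, id_by_name, and okc are illustrative only.

```python
import pandas as pd

# Three rows copied by hand from the output above (subset of columns).
sample = pd.DataFrame(
    {
        "DISPLAY_FIRST_LAST": ["Quincy Acy", "Jordan Adams", "Steven Adams"],
        "PERSON_ID": [203112, 203919, 203500],
        "TEAM_ABBREVIATION": ["SAC", "MEM", "OKC"],
    }
)

# Index PERSON_ID by display name for quick lookups.
id_by_name = sample.set_index("DISPLAY_FIRST_LAST")["PERSON_ID"]
print(id_by_name["Steven Adams"])

# Boolean indexing selects every player on a given team.
okc = sample[sample["TEAM_ABBREVIATION"] == "OKC"]
print(okc["DISPLAY_FIRST_LAST"].tolist())
```

Keep in mind that display names may not be unique across a longer player list, so PERSON_ID is the safer key once you have it.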

We can manipulate the players object to get data from different seasons by changing the API parameters and then re-running the query against the website. For example, if we want a list of players who were on an NBA roster during the 1990-91 season, we set the Season parameter to '1990-91' using the .get_new_data() method of the players object, as follows.


In [3]:
players.get_new_data(Season = '1990-91')

Once we get the raw data from the website, we need to save it as a DataFrame in a new object.


In [4]:
players1990 = pd.DataFrame(players.players())
players1990.head()


Out[4]:
   DISPLAY_FIRST_LAST  DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE              ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY    TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
0  Mark Alarie         Alarie, Mark              1986       Y                  76019      HISTADD_mark_alarie     1             WAS                Washington   wizards    1610612764  Bullets    1990
1  Steve Alford        Alford, Steve             1987       Y                  76024      HISTADD_steve_alford    1             DAL                Dallas       mavericks  1610612742  Mavericks  1990
2  Cedric Ball         Ball, Cedric              1990       Y                  76090      HISTADD_cedric_ball     1             LAC                Los Angeles  clippers   1610612746  Clippers   1990
3  Ken Bannister       Bannister, Ken            1984       Y                  76094      HISTADD_ken_bannister   1             LAC                Los Angeles  clippers   1610612746  Clippers   1990
4  Greg Butler         Butler, Greg              1988       Y                  76320      HISTADD_gregory_butler  1             LAC                Los Angeles  clippers   1610612746  Clippers   1990
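The Season value used above is written as a start year, a dash, and a two-digit end year. If you plan to loop over many seasons, a small helper can build that string for you. season_label is a hypothetical helper, not part of py-Goldsberry, and it assumes every season the API accepts is written the same way as '1990-91'.

```python
def season_label(start_year):
    """Format a season starting in start_year the way '1990-91' is written."""
    end_two_digits = (start_year + 1) % 100
    return "{}-{:02d}".format(start_year, end_two_digits)


print(season_label(1990))
print(season_label(1999))
```

With a live connection, the result would be passed along as players.get_new_data(Season=season_label(1990)).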

Each class in py-Goldsberry works in a similar fashion. When you instantiate a class, it makes some assumptions about the parameters to use when querying the NBA website and executes the query. If you want to change the query after instantiation, you can change the query parameters and re-query the website with .get_new_data(). Under the hood, the .get_new_data() method takes any number of keyword arguments, which it translates to API parameters. As a sanity check, it raises an exception if you try to set a parameter that the specific query does not take.
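The pattern described above can be sketched without touching the network. QuerySketch below is a toy stand-in, not py-Goldsberry's actual implementation: the class name, the ValueError, and the stored default values are assumptions made for illustration, and the real package performs a web request where this sketch only updates a dictionary.

```python
class QuerySketch:
    """Toy model of a query object with default parameters (illustrative only)."""

    # Assumed defaults, using the parameter names that appear in this notebook.
    _defaults = {"IsOnlyCurrentSeason": "1", "LeagueID": "00", "Season": "2015-16"}

    def __init__(self, **kwargs):
        self.params = dict(self._defaults)
        self.get_new_data(**kwargs)

    def get_new_data(self, **kwargs):
        # Reject any keyword that this query does not define.
        unknown = sorted(set(kwargs) - set(self.params))
        if unknown:
            raise ValueError("unknown parameter(s): " + ", ".join(unknown))
        # Store values as strings, as they would appear in a query string.
        self.params.update({key: str(value) for key, value in kwargs.items()})
        # A real implementation would re-run the web request here.


query = QuerySketch()
query.get_new_data(Season="1990-91")
print(query.params)

try:
    query.get_new_data(Sesaon="1990-91")  # deliberate typo in the keyword
except ValueError as err:
    print(err)
```

Validating keywords against the known parameter names is what turns a silent typo into an immediate error, which is the sanity check described above.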

Each class takes a specific set of parameters. py-Goldsberry includes a list of each class's parameters along with a default value for each. I'm working on a dictionary of parameters and the possible values each can take; look for it to be posted in the near future. Until then, you can access the raw parameter dictionary by calling the .get_parameter_items() method of each class. This shows the parameters the query accepts along with their current values.

As you saw above, you change a parameter from its default by passing a keyword argument: the keyword is the parameter name and the argument is the desired value.


In [5]:
players.get_parameter_items()


Out[5]:
{'IsOnlyCurrentSeason': '1', 'LeagueID': '00', 'Season': '1990-91'}
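For intuition about what a parameter dictionary like this becomes when it is sent to a website, the standard library can turn it into a URL query string. This only illustrates the encoding step. How py-Goldsberry actually builds its requests, and which endpoint it calls, is not shown in this notebook and is not assumed here.

```python
from urllib.parse import urlencode

# The dictionary shown in Out[5] above.
params = {"IsOnlyCurrentSeason": "1", "LeagueID": "00", "Season": "1990-91"}

# urlencode joins key=value pairs with '&' and percent-escapes unsafe characters.
query_string = urlencode(params)
print(query_string)
```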

In the case of the PlayerList() class, you can get a historical list of players by changing the value of 'IsOnlyCurrentSeason' from 1 to 0.


In [6]:
players.get_new_data(IsOnlyCurrentSeason = 0)
playersAllTime = pd.DataFrame(players.players())
playersAllTime.head()


Out[6]:
   DISPLAY_FIRST_LAST   DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE                   ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY  TEAM_CODE  TEAM_ID     TEAM_NAME      TO_YEAR
0  Alaa Abdelnaby       Abdelnaby, Alaa           1990       Y                  76001      HISTADD_alaa_abdelnaby       1             POR                Portland   blazers    1610612757  Trail Blazers  1994
1  Zaid Abdul-Aziz      Abdul-Aziz, Zaid          1968       Y                  76002      HISTADD_zaid_abdul-aziz      0                                                      0                          1977
2  Kareem Abdul-Jabbar  Abdul-Jabbar, Kareem      1969       Y                  76003      HISTADD_kareem_abdul-jabbar  0                                                      0                          1988
3  Mahmoud Abdul-Rauf   Abdul-Rauf, Mahmoud       1990       Y                  51         HISTADD_mahmoud_abdul-rauf   1             DEN                Denver     nuggets    1610612743  Nuggets        2000
4  Tariq Abdul-Wahad    Abdul-Wahad, Tariq        1997       Y                  1505       tariq_abdul-wahad            0                                                      0                          2003

By default, py-Goldsberry pulls data for the current season. If you want different data from the get-go, you can set the parameters to your desired values when you instantiate the class. Let's check out an example of getting the all-time player list from a brand new object.


In [7]:
new_playersAllTime = pd.DataFrame(goldsberry.PlayerList(IsOnlyCurrentSeason=0).players())
new_playersAllTime.head()


Out[7]:
   DISPLAY_FIRST_LAST   DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE                   ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY  TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
0  Alaa Abdelnaby       Abdelnaby, Alaa           1990       Y                  76001      HISTADD_alaa_abdelnaby       0                                                      0                      1994
1  Zaid Abdul-Aziz      Abdul-Aziz, Zaid          1968       Y                  76002      HISTADD_zaid_abdul-aziz      0                                                      0                      1977
2  Kareem Abdul-Jabbar  Abdul-Jabbar, Kareem      1969       Y                  76003      HISTADD_kareem_abdul-jabbar  0                                                      0                      1988
3  Mahmoud Abdul-Rauf   Abdul-Rauf, Mahmoud       1990       Y                  51         HISTADD_mahmoud_abdul-rauf   0                                                      0                      2000
4  Tariq Abdul-Wahad    Abdul-Wahad, Tariq        1997       Y                  1505       tariq_abdul-wahad            0                                                      0                      2003

In [8]:
playersAllTime.equals(new_playersAllTime)


Out[8]:
False

Well, it looks like these data frames aren't quite identical. Why is that?

Take a look at the ROSTERSTATUS column. When we first asked for the all-time players, remember that we had set the base season to 1990-91? Alaa Abdelnaby was on a roster during that season (Portland, to be specific), so he has a value of 1 in the ROSTERSTATUS column and his team columns are filled in. Since he was not in the league during the current season, he has a 0 in that column for the second pull. Let's compare just the names and see if we get an exact match. That will further reinforce that we have the same data, but we are looking at it from different points in time.


In [9]:
playersAllTime.loc[:, 'DISPLAY_FIRST_LAST'].equals(new_playersAllTime.loc[:, 'DISPLAY_FIRST_LAST'])


Out[9]:
True

Success!
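To see which columns account for a mismatch like the one above, you can compare two frames element by element. The sketch below uses two hand-built frames holding the Alaa Abdelnaby and Kareem Abdul-Jabbar rows from the two all-time pulls (a subset of columns). first_pull and second_pull are illustrative names, blank team cells are represented as empty strings (an assumption), and the approach assumes both frames share the same index and columns.

```python
import pandas as pd

# Rows as they appear when the base season is 1990-91.
first_pull = pd.DataFrame(
    {
        "DISPLAY_FIRST_LAST": ["Alaa Abdelnaby", "Kareem Abdul-Jabbar"],
        "ROSTERSTATUS": [1, 0],
        "TEAM_ABBREVIATION": ["POR", ""],  # blank cell assumed to be ""
    }
)
# The same rows as they appear when the base season is the current one.
second_pull = pd.DataFrame(
    {
        "DISPLAY_FIRST_LAST": ["Alaa Abdelnaby", "Kareem Abdul-Jabbar"],
        "ROSTERSTATUS": [0, 0],
        "TEAM_ABBREVIATION": ["", ""],
    }
)

# Element-wise inequality yields a frame of booleans; any() collapses it per column.
differs = (first_pull != second_pull).any()
changed_columns = differs[differs].index.tolist()
print(changed_columns)

# Rows where at least one column differs.
changed_rows = first_pull[(first_pull != second_pull).any(axis=1)]
print(changed_rows["DISPLAY_FIRST_LAST"].tolist())
```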

This notebook outlines the general workflow for working with py-Goldsberry. I'll post additional notebooks outlining other data pulls and illustrating some of the other features of the package and possibilities with the data.

