An Introduction to py-Goldsberry

py-Goldsberry is a Python package that makes it easy to interface with http://stats.nba.com and retrieve its data in a format that is easier to analyze.

This is the first in a series of tutorials that walk through the different modules of the package and how to use each one to get different types of data.

If you've made it this far, you're probably less interested in reading about the package and more interested in actually using it.

Installation

If you don't have the package installed, use pip to install it. To upgrade an existing installation to the latest version, add the --upgrade flag:

pip install py-goldsberry
pip install --upgrade py-goldsberry

Once py-Goldsberry is installed, you can load the package and check the version number:



In [1]:

import goldsberry
import pandas as pd
goldsberry.__version__

Out[1]:

'1.0.1'

py-Goldsberry is designed to work in conjunction with Pandas. Each function within the package returns data in a format that is easily converted to a Pandas DataFrame.
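Here is a minimal illustration of that conversion. It assumes a query method returns its rows as a list of dictionaries keyed by column name (an assumption based on how the result of players.players() is handed straight to pd.DataFrame below). The rows are hard-coded from the 2015-16 output below rather than fetched from the website:

```python
import pandas as pd

# Hard-coded sample rows copied from the 2015-16 player list below.
# Assumption: a query result is a list of dicts, one dict per player.
records = [
    {"PERSON_ID": 203112, "DISPLAY_FIRST_LAST": "Quincy Acy", "TEAM_ABBREVIATION": "SAC"},
    {"PERSON_ID": 203919, "DISPLAY_FIRST_LAST": "Jordan Adams", "TEAM_ABBREVIATION": "MEM"},
    {"PERSON_ID": 203500, "DISPLAY_FIRST_LAST": "Steven Adams", "TEAM_ABBREVIATION": "OKC"},
]

# pandas turns a list of dicts into a DataFrame: the dict keys become columns,
# and each dict becomes one row with a default integer index (0, 1, 2, ...).
df = pd.DataFrame(records)
print(df)
```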

To get started, let's get a list of all of the players who were on an NBA roster during the 2015-16 season.

PlayerIDs

Currently, the PlayerList() class defaults to the current season. We start by creating an object, players, that we will use to scrape player data.



In [2]:

players = goldsberry.PlayerList()
players2015 = pd.DataFrame(players.players())
players2015.head()

Out[2]:

	DISPLAY_FIRST_LAST	DISPLAY_LAST_COMMA_FIRST	FROM_YEAR	GAMES_PLAYED_FLAG	PERSON_ID	PLAYERCODE	ROSTERSTATUS	TEAM_ABBREVIATION	TEAM_CITY	TEAM_CODE	TEAM_ID	TEAM_NAME	TO_YEAR
0	Quincy Acy	Acy, Quincy	2012	Y	203112	quincy_acy	1	SAC	Sacramento	kings	1610612758	Kings	2015
1	Jordan Adams	Adams, Jordan	2014	Y	203919	jordan_adams	1	MEM	Memphis	grizzlies	1610612763	Grizzlies	2015
2	Steven Adams	Adams, Steven	2013	Y	203500	steven_adams	1	OKC	Oklahoma City	thunder	1610612760	Thunder	2015
3	Arron Afflalo	Afflalo, Arron	2007	Y	201167	arron_afflalo	1	NYK	New York	knicks	1610612752	Knicks	2015
4	Alexis Ajinca	Ajinca, Alexis	2008	Y	201582	alexis_ajinca	1	NOP	New Orleans	pelicans	1610612740	Pelicans	2015

We can reuse the players object to get data from different seasons by changing the API parameters and re-querying the website. For example, to get a list of players who were on an NBA roster during the 1990-91 season, we set the Season parameter to '1990-91' using the .get_new_data() method of the players object:



In [3]:

players.get_new_data(Season='1990-91')
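The Season parameter uses the 'YYYY-YY' season labels you see throughout this tutorial ('1990-91', '2015-16'). If you want to loop over several seasons, a small helper can build the label from the year a season starts. season_label below is a hypothetical helper written for this tutorial, not part of py-Goldsberry:

```python
def season_label(start_year: int) -> str:
    """Return an NBA season label such as '1990-91' for a season starting in start_year.

    Hypothetical helper for this tutorial; not part of py-Goldsberry.
    """
    end_year = (start_year + 1) % 100  # last two digits of the following year
    return f"{start_year}-{end_year:02d}"  # zero-pad so 2000 becomes '00'


print(season_label(1990))  # 1990-91
print(season_label(1999))  # 1999-00
print(season_label(2015))  # 2015-16
```

With it, players.get_new_data(Season=season_label(1990)) would make the same request as the cell above.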

Once the new raw data has been pulled from the website, we convert it to a DataFrame and save it to a new object:



In [4]:

players1990 = pd.DataFrame(players.players())
players1990.head()

Out[4]:

	DISPLAY_FIRST_LAST	DISPLAY_LAST_COMMA_FIRST	FROM_YEAR	GAMES_PLAYED_FLAG	PERSON_ID	PLAYERCODE	ROSTERSTATUS	TEAM_ABBREVIATION	TEAM_CITY	TEAM_CODE	TEAM_ID	TEAM_NAME	TO_YEAR
0	Mark Alarie	Alarie, Mark	1986	Y	76019	HISTADD_mark_alarie	1	WAS	Washington	wizards	1610612764	Bullets	1990
1	Steve Alford	Alford, Steve	1987	Y	76024	HISTADD_steve_alford	1	DAL	Dallas	mavericks	1610612742	Mavericks	1990
2	Cedric Ball	Ball, Cedric	1990	Y	76090	HISTADD_cedric_ball	1	LAC	Los Angeles	clippers	1610612746	Clippers	1990
3	Ken Bannister	Bannister, Ken	1984	Y	76094	HISTADD_ken_bannister	1	LAC	Los Angeles	clippers	1610612746	Clippers	1990
4	Greg Butler	Butler, Greg	1988	Y	76320	HISTADD_gregory_butler	1	LAC	Los Angeles	clippers	1610612746	Clippers	1990

Each class in py-Goldsberry works in a similar fashion. When you instantiate a class, it makes some assumptions about which parameters to use to query the NBA website and executes the query. If you want to change the query after instantiation, you can change the query parameters and then re-query the website with .get_new_data(). Under the hood, .get_new_data() takes any number of keyword arguments and translates them into API parameters. As a sanity check, it will raise an exception if you try to set a parameter that the specific query does not take.

Each class takes a specific set of parameters, and py-Goldsberry stores each parameter along with a default value. I'm working on a dictionary of parameters and the possible values each can take; look for it to be posted in the near future. Until then, you can view the raw parameter dictionary by calling the .get_parameter_items() method of each class. It returns the name of each parameter the query accepts along with its current value.

As you saw above, you change a parameter from its default by passing a keyword argument to .get_new_data(), with the parameter name as the keyword and the desired value as the argument.



In [5]:

players.get_parameter_items()

Out[5]:

{'IsOnlyCurrentSeason': '1', 'LeagueID': '00', 'Season': '1990-91'}
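To make the parameter handling described above concrete, here is a minimal sketch of the pattern. QuerySketch is a hypothetical class written for this tutorial, not py-Goldsberry's actual implementation, and it never touches the network: it only models defaults set at instantiation, keyword overrides through .get_new_data(), rejection of unknown parameters, and .get_parameter_items(). Storing every value as a string is an assumption based on the output above.

```python
class QuerySketch:
    """Hypothetical stand-in for a py-Goldsberry query class (not the real code).

    Models only the parameter handling: defaults at instantiation, keyword
    overrides, and a sanity check that rejects unknown parameter names.
    """

    def __init__(self, defaults, **kwargs):
        self._params = dict(defaults)  # copy so the caller's dict is not mutated
        self._set_params(**kwargs)     # allow overrides at instantiation

    def _set_params(self, **kwargs):
        for name, value in kwargs.items():
            if name not in self._params:
                # Sanity check: refuse parameters this query does not accept.
                raise ValueError(f"Unknown parameter: {name!r}")
            # Assumption: API parameters are stored as strings.
            self._params[name] = str(value)

    def get_new_data(self, **kwargs):
        self._set_params(**kwargs)
        # A real class would now re-query the website using self._params.

    def get_parameter_items(self):
        return dict(self._params)  # return a copy of the current parameters


q = QuerySketch({"IsOnlyCurrentSeason": "1", "LeagueID": "00", "Season": "2015-16"})
q.get_new_data(Season="1990-91")
print(q.get_parameter_items())
# {'IsOnlyCurrentSeason': '1', 'LeagueID': '00', 'Season': '1990-91'}

try:
    q.get_new_data(Year="1990")
except ValueError as err:
    print(err)  # Unknown parameter: 'Year'
```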

In the case of the PlayerList() class, you can get a historical list of players by changing the value of 'IsOnlyCurrentSeason' from 1 to 0.



In [6]:

players.get_new_data(IsOnlyCurrentSeason=0)
playersAllTime = pd.DataFrame(players.players())
playersAllTime.head()

Out[6]:

	DISPLAY_FIRST_LAST	DISPLAY_LAST_COMMA_FIRST	FROM_YEAR	GAMES_PLAYED_FLAG	PERSON_ID	PLAYERCODE	ROSTERSTATUS	TEAM_ABBREVIATION	TEAM_CITY	TEAM_CODE	TEAM_ID	TEAM_NAME	TO_YEAR
0	Alaa Abdelnaby	Abdelnaby, Alaa	1990	Y	76001	HISTADD_alaa_abdelnaby	1	POR	Portland	blazers	1610612757	Trail Blazers	1994
1	Zaid Abdul-Aziz	Abdul-Aziz, Zaid	1968	Y	76002	HISTADD_zaid_abdul-aziz	0				0		1977
2	Kareem Abdul-Jabbar	Abdul-Jabbar, Kareem	1969	Y	76003	HISTADD_kareem_abdul-jabbar	0				0		1988
3	Mahmoud Abdul-Rauf	Abdul-Rauf, Mahmoud	1990	Y	51	HISTADD_mahmoud_abdul-rauf	1	DEN	Denver	nuggets	1610612743	Nuggets	2000
4	Tariq Abdul-Wahad	Abdul-Wahad, Tariq	1997	Y	1505	tariq_abdul-wahad	0				0		2003
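Because the all-time list includes each player's first and last season (FROM_YEAR and TO_YEAR, which in these tables are the years the seasons start, e.g. 2015 for 2015-16), you can also approximate a single season's roster from it with Pandas boolean indexing. The sketch below hard-codes the five rows shown above. It is only an approximation: a player whose career span covers a season may still have missed that season entirely. If the year columns come back as strings, convert them first with pd.to_numeric.

```python
import pandas as pd

# Five rows copied from the all-time list above (only the columns needed here).
all_time = pd.DataFrame({
    "DISPLAY_FIRST_LAST": ["Alaa Abdelnaby", "Zaid Abdul-Aziz", "Kareem Abdul-Jabbar",
                           "Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad"],
    "FROM_YEAR": [1990, 1968, 1969, 1990, 1997],
    "TO_YEAR": [1994, 1977, 1988, 2000, 2003],
})

season_start = 1990  # the 1990-91 season; years in this data are season start years

# Keep players whose first season is on or before 1990 and last season on or after it.
mask = (all_time["FROM_YEAR"] <= season_start) & (all_time["TO_YEAR"] >= season_start)
print(all_time.loc[mask, "DISPLAY_FIRST_LAST"].tolist())
# ['Alaa Abdelnaby', 'Mahmoud Abdul-Rauf']
```

For these five rows the result matches the players with a ROSTERSTATUS of 1 in the output above, which was pulled with Season set to 1990-91.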

By default, py-Goldsberry pulls data from the current season. If you want different data from the get-go, you can set the parameters to your desired values when you instantiate the class. Let's check out an example of getting the all-time player list from a brand-new object:



In [7]:

new_playersAllTime = pd.DataFrame(goldsberry.PlayerList(IsOnlyCurrentSeason=0).players())
new_playersAllTime.head()

Out[7]:

	DISPLAY_FIRST_LAST	DISPLAY_LAST_COMMA_FIRST	FROM_YEAR	GAMES_PLAYED_FLAG	PERSON_ID	PLAYERCODE	ROSTERSTATUS	TEAM_ABBREVIATION	TEAM_CITY	TEAM_CODE	TEAM_ID	TEAM_NAME	TO_YEAR
0	Alaa Abdelnaby	Abdelnaby, Alaa	1990	Y	76001	HISTADD_alaa_abdelnaby	0				0		1994
1	Zaid Abdul-Aziz	Abdul-Aziz, Zaid	1968	Y	76002	HISTADD_zaid_abdul-aziz	0				0		1977
2	Kareem Abdul-Jabbar	Abdul-Jabbar, Kareem	1969	Y	76003	HISTADD_kareem_abdul-jabbar	0				0		1988
3	Mahmoud Abdul-Rauf	Abdul-Rauf, Mahmoud	1990	Y	51	HISTADD_mahmoud_abdul-rauf	0				0		2000
4	Tariq Abdul-Wahad	Abdul-Wahad, Tariq	1997	Y	1505	tariq_abdul-wahad	0				0		2003



In [8]:

playersAllTime.equals(new_playersAllTime)

Out[8]:

False

Well, it looks like these data frames aren't quite identical. Why is that?

Take a look at the ROSTERSTATUS column. When we first asked for the all-time players, remember that we had set the Season parameter to 1990-91? Alaa Abdelnaby was actually on a roster during that season (Portland, to be specific), so in the first pull he has a value of 1 in the ROSTERSTATUS column and Portland's details in the team columns. Since he was not in the league during the current season, the second pull has a 0 in that column and blank team fields. The same is true of Mahmoud Abdul-Rauf, who was with Denver in 1990-91. Let's compare just the names and see if we get an exact match. That will further reinforce that we have the same data, just viewed from different points in time.



In [9]:

playersAllTime.loc[:, 'DISPLAY_FIRST_LAST'].equals(new_playersAllTime.loc[:, 'DISPLAY_FIRST_LAST'])

Out[9]:

True

Success!
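If you want to see exactly which columns and rows account for a mismatch, rather than a single True/False from .equals(), you can compare two frames element-wise with DataFrame.ne() and collapse the result with .any(). The sketch below uses a few rows and columns copied from the two all-time outputs above as hard-coded sample data, not a live query:

```python
import pandas as pd

columns = ["DISPLAY_FIRST_LAST", "ROSTERSTATUS", "TEAM_ABBREVIATION", "TEAM_ID"]

# Sample rows copied from the two all-time pulls above (subset of columns):
# the first pull used Season='1990-91', the second used the default season.
pull_1990 = pd.DataFrame(
    [["Alaa Abdelnaby", 1, "POR", 1610612757],
     ["Zaid Abdul-Aziz", 0, "", 0],
     ["Mahmoud Abdul-Rauf", 1, "DEN", 1610612743]],
    columns=columns,
)
pull_current = pd.DataFrame(
    [["Alaa Abdelnaby", 0, "", 0],
     ["Zaid Abdul-Aziz", 0, "", 0],
     ["Mahmoud Abdul-Rauf", 0, "", 0]],
    columns=columns,
)

print(pull_1990.equals(pull_current))  # False

# Element-wise "not equal": a DataFrame of booleans, True where the pulls disagree.
diff = pull_1990.ne(pull_current)

# Columns with at least one disagreement.
changed_cols = diff.any()
print(changed_cols[changed_cols].index.tolist())
# ['ROSTERSTATUS', 'TEAM_ABBREVIATION', 'TEAM_ID']

# Players whose rows changed between the two pulls.
print(pull_1990.loc[diff.any(axis=1), "DISPLAY_FIRST_LAST"].tolist())
# ['Alaa Abdelnaby', 'Mahmoud Abdul-Rauf']
```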

This notebook outlines the general workflow for working with py-Goldsberry. I'll post additional notebooks outlining other data pulls and illustrating some of the package's other features and what's possible with the data.


