An Introduction to pandas

Pandas! They are adorable animals. You might think they are the worst animal ever, but that is not true. You might sometimes think pandas is the worst library ever, and that is only kind of true.

The important thing is to use the right tool for the job. pandas is good for some stuff, SQL is good for some stuff, and writing raw Python is good for some stuff. You'll figure it out as you go along.

Now let's start coding. Hopefully you ran pip install pandas before you started up this notebook.


In [1]:
# import pandas, but call it pd. Why? Because that's What People Do.
import pandas as pd  # so you don't have to type pandas later; most people use pd



When you import pandas, you use import pandas as pd. That means instead of typing pandas in your code you'll type pd.

You don't have to, but every other person on the planet will be doing it, so you might as well.

Now we're going to read in a file. Our file is called NBA-Census-10.14.2013.csv because we're sports moguls. pandas can read_ different types of files, so try to figure it out by typing pd.read_ and hitting tab for autocomplete.


In [2]:
# We're going to call this df, which means "data frame"
# The file isn't in UTF-8 (I saved it on my Mac!), so we need to set the encoding
# to mac_roman. It will not open if the encoding is not set :(
df = pd.read_csv('NBA-Census-10.14.2013.csv', encoding='mac_roman')

# The most common encodings are mac_roman (saved on a Mac), latin-1 (saved on a PC), and UTF-8
# 'pd.read_csv?' will give you more info about how read_csv works

A dataframe is basically a spreadsheet, except it lives in the world of Python or the statistical programming language R. They can't call it a spreadsheet because then people would think those programmers used Excel, which would make them boring and normal and they'd have to wear a tie every day.
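
To see why the encoding argument matters, here is a minimal sketch that does not need the NBA file. The two-row CSV is made up for illustration, and it is built in memory with io.BytesIO instead of being read from disk.

```python
import io

import pandas as pd

# Hypothetical two-row CSV. The "é" becomes a single mac_roman byte
# that is not valid UTF-8 on its own.
raw_bytes = "Name,Team\nJosé,Spurs\nAnna,Heat\n".encode("mac_roman")

# Reading it as UTF-8 fails on that byte...
try:
    pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")
    utf8_worked = True
except UnicodeDecodeError:
    utf8_worked = False

# ...while naming the encoding it was saved in reads it cleanly.
toy_df = pd.read_csv(io.BytesIO(raw_bytes), encoding="mac_roman")

print(utf8_worked)
print(toy_df)
```

If a file of your own refuses to open, trying the encodings mentioned above one at a time is a reasonable first move.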

Selecting rows

Now let's look at our data, since that's what data is for.


In [3]:
# Let's look at all of it

print(df)


                   Name  Age           Team  POS   #       2013 $  Ht (In.)  \
0           Gee, Alonzo   26      Cavaliers    F  33   $3,250,000        78   
1       Wallace, Gerald   31        Celtics    F  45  $10,105,855        79   
2          Williams, Mo   30  Trail Blazers    G  25   $2,652,000        73   
3     Gladness, Mickell   27          Magic    C  40     $762,195        83   
4    Jefferson, Richard   33           Jazz    F  44  $11,046,000        79   
5         Hill, Solomon   22         Pacers    F   9   $1,246,680        79   
6       Budinger, Chase   25   Timberwolves    F  10   $5,000,000        79   
7     Williams, Derrick   22   Timberwolves    F   7   $5,016,960        80   
8          Hill, Jordan   26         Lakers  F/C  27   $3,563,600        82   
9        Frye, Channing   30           Suns  F/C   8   $6,500,000        83   
10      Bayless, Jerryd   25      Grizzlies    G   7   $3,135,000        75   
11         Terry, Jason   36           Nets    G  31   $5,625,313        74   
12           Fogg, Kyle   23        Nuggets    G   6          n/a        75   
13      Iguodala, Andre   29       Warriors  G/F   9  $12,868,632        78   
14        Boateng, Eric   27         Lakers    C  12          n/a        82   
15           Diogu, Ike   29         Knicks  F/C  50     $792,377        80   
16          Ayres, Jeff   26          Spurs  F/C  11   $1,750,000        81   
17        Harden, James   24        Rockets    G  13  $13,701,250        77   
18       Felix, Carrick   23      Cavaliers  G/F  30     $510,000        78   
19       Pargo, Jannero   33        Bobcats    G   5     $884,293        73   
20    Beverley, Patrick   25        Rockets    G   2     $788,872        73   
21         Johnson, Joe   32           Nets  G/F   7  $21,466,718        79   
22       Brewer, Ronnie   28        Rockets  G/F  10   $1,186,459        79   
23        Fisher, Derek   39        Thunder    G   6     $884,293        73   
24       Miller, Quincy   20        Nuggets    F  30     $788,872        81   
25          Acy, Quincy   23        Raptors    F   4     $788,872        79   
26         Jones, Perry   22        Thunder    F   3   $1,082,520        83   
27           Udoh, Ekpe   26          Bucks  F/C   5   $4,469,548        82   
28           Clark, Ian   22           Jazz    G  21     $490,180        75   
29      Andersen, Chris   35           Heat  F/C  11   $1,399,507        82   
..                  ...  ...            ...  ...  ..          ...       ...   
498         Paul, Chris   28       Clippers    G   3  $18,668,431        72   
499        Teague, Jeff   25          Hawks    G   0   $8,000,000        74   
500          Smith, Ish   25           Suns    G  30     $951,463        72   
501         Duncan, Tim   37          Spurs  F/C  21  $10,361,446        83   
502      Hawes, Spencer   25          76ers    C   0   $6,500,000        85   
503        Wroten, Tony   20          76ers    G   8   $1,160,040        78   
504        Gaddy, Abdul   21        Bobcats    G  10          n/a        75   
505      Thomas, Isaiah   24          Kings    G  22     $884,293        69   
506      Robinson, Nate   29        Nuggets    G  10   $2,016,000        69   
507      Ross, Terrence   22        Raptors    G  31   $2,678,640        78   
508   Pondexter, Quincy   25      Grizzlies  G/F  20     $225,479        78   
509     Holiday, Justin   24           Jazz  G/F  22     $788,872        78   
510        Baynes, Aron   26          Spurs  F/C  16     $788,872        82   
511      Thompson, Klay   23       Warriors  G/F  11   $2,317,920        79   
512     Lillard, Damian   23  Trail Blazers    G   0   $3,202,920        75   
513      Alexander, Joe   26       Warriors    F  25     $854,389        80   
514       Fischer, D'or   32        Wizards    C  21          n/a        83   
515       Ebanks, Devin   23      Mavericks    F  37     $884,293        81   
516       Johnson, Amir   26        Raptors  F/C  15   $6,500,000        81   
517       Martin, Kevin   30   Timberwolves    G  23   $6,500,000        79   
518       Evans, Jeremy   25           Jazz    F  40   $1,660,257        81   
519       Lee, Courtney   28        Celtics  G/F  11   $5,225,000        77   
520          Mekel, Gal   25      Mavericks    G  33     $490,180        75   
521       Murry, Toure'   23         Knicks  G/F  23     $490,180        77   
522      Stiemsma, Greg   28       Pelicans    C  34   $2,676,000        83   
523          Leuer, Jon   24      Grizzlies    F  30     $900,000        82   
524      Landry, Marcus   27         Lakers    F  14     $788,872        79   
525       Harris, Devin   30      Mavericks    G  20     $854,389        75   
526         West, David   33         Pacers    F  21  $12,000,000        81   
527    Crawford, Jordan   24        Celtics    G  27   $2,162,419        76   

      WT  EXP  1st Year         DOB                School                City  \
0    219    4      2009   5/29/1987               Alabama   Riviera Beach, FL   
1    220   12      2001   7/23/1982               Alabama       Sylacauga, AL   
2    195   10      2003  12/19/1982               Alabama         Jackson, MS   
3    220    2      2011   7/26/1986           Alabama A&M      Birmingham, AL   
4    230   12      2001   6/21/1980               Arizona     Los Angeles, CA   
5    220    0      2013   3/18/1991               Arizona     Los Angeles, CA   
6    218    4      2009   5/22/1988               Arizona       Encinitas, CA   
7    241    2      2011   5/25/1991               Arizona       La Mirada, CA   
8    235    1      2012   7/27/1987               Arizona        Newberry, SC   
9    245    8      2005   5/17/1983               Arizona    White Plains, NY   
10   200    5      2008   8/20/1988               Arizona         Phoenix, AZ   
11   180   14      1999   9/15/1977               Arizona         Seattle, WA   
12   183    0      2013   1/27/1990               Arizona            Brea, CA   
13   207    9      2004   1/28/1984               Arizona     Springfield, IL   
14   257   17      1996  11/20/1985         Arizona State         London, ENG   
15   255    8      2005   11/9/1983         Arizona State         Buffalo, NY   
16   250    4      2009   4/29/1987         Arizona State         Ontario, CA   
17   220    4      2009   8/26/1989         Arizona State     Los Angeles, CA   
18   210    0      2013   8/17/1990         Arizona State        Goodyear, AZ   
19   185   11      2002  10/22/1979              Arkansas         Chicago, IL   
20   185    5      2008   7/12/1988              Arkansas         Chicago, IL   
21   240   12      2001   6/29/1981              Arkansas     Little Rock, AR   
22   235    7      2006   3/20/1985              Arkansas        Portland, OR   
23   210   17      1996    8/9/1974  Arkansas-Little Rock     Little Rock, AR   
24   210    1      2012  11/18/1992                Baylor  North Carolina, IL   
25   225    1      2012   10/6/1990                Baylor           Tyler, TX   
26   235    1      2012   9/24/1991                Baylor       Winnsboro, LA   
27   245    3      2010   5/20/1987                Baylor          Edmond, OK   
28   175    0      2013    3/7/1991               Belmont         Memphis, TN   
29   228   12      2001    7/7/1978         Blinn College      Long Beach, CA   
..   ...  ...       ...         ...                   ...                 ...   
498  175    8      2005    5/6/1985           Wake Forest  Forsyth County, NC   
499  181    4      2009   6/10/1988           Wake Forest    Indianapolis, IN   
500  175    3      2010    7/5/1988           Wake Forest       Charlotte, NC   
501  255   16      1997   4/25/1976           Wake Forest   Christiansted, VI   
502  245    6      2007   4/28/1988            Washington         Seattle, WA   
503  205    1      2012   4/13/1993            Washington          Renton, WA   
504  185    0      2013   1/26/1992            Washington          Tacoma, WA   
505  185    2      2011    2/7/1989            Washington          Tacoma, WA   
506  180    8      2005   5/31/1984            Washington         Seattle, WA   
507  195    1      2012    2/5/1991            Washington        Portland, OR   
508  225    3      2010   3/10/1988            Washington          Fresno, CA   
509  185    0      2013    4/5/1989            Washington   Mission Hills, CA   
510  260    0      2013   12/9/1986      Washington State        Gisborne, NZ   
511  205    2      2011    2/8/1990      Washington State     Los Angeles, CA   
512  195    1      2012   7/15/1990           Weber State         Oakland, CA   
513  230    5      2008  12/26/1986         West Virginia       Kaohsiung, TA   
514  255    0      2013  10/12/1981         West Virginia    Philadelphia, PA   
515  215    3      2010  10/28/1989         West Virginia   New York City, NY   
516  210    8      2005    5/1/1987   Westchester HS (CA)     Los Angeles, CA   
517  185    9      2004    2/1/1983      Western Carolina      Zanesville, OH   
518  194    3      2010  10/24/1987      Western Kentucky        Crossett, AR   
519  200    5      2008   10/3/1985      Western Kentucky    Indianapolis, IN   
520  191    5      2008    3/4/1988         Wichita State         Petah Tikva   
521  195    0      2013   11/8/1989         Wichita State         Houston, TX   
522  260    2      2011   9/26/1985             Wisconsin        Randolph, WI   
523  228    2      2011   5/14/1989             Wisconsin       Long Lake, MN   
524  225   17      1996   11/1/1985             Wisconsin       Milwaukee, WI   
525  192    9      2004   2/27/1983             Wisconsin       Milwaukee, WI   
526  250   10      2003   8/29/1980                Xavier         Teaneck, NJ   
527  195    3      2010  10/23/1988                Xavier         Detroit, MI   

    State (Province, Territory, Etc..)         Country   Race HS Only  
0                              Florida              US  Black      No  
1                              Alabama              US  Black      No  
2                          Mississippi              US  Black      No  
3                              Alabama              US  Black      No  
4                           California              US  Black      No  
5                           California              US  Black      No  
6                           California              US  White      No  
7                           California              US  Black      No  
8                       South Carolina              US  Black      No  
9                             New York              US  Black      No  
10                             Arizona              US  Black      No  
11                          Washington              US  Black      No  
12                          California              US  Black      No  
13                            Illinois              US  Black      No  
14                                 n/a         England  Black      No  
15                            New York              US  Black      No  
16                          California              US  Black      No  
17                          California              US  Black      No  
18                             Arizona              US  Black      No  
19                            Illinois              US  Black      No  
20                            Illinois              US  Black      No  
21                            Arkansas              US  Black      No  
22                              Oregon              US  Black      No  
23                            Arkansas              US  Black      No  
24                            Illinois              US  Black      No  
25                               Texas              US  Black      No  
26                           Louisiana              US  Black      No  
27                            Oklahoma              US  Black      No  
28                           Tennessee              US  Black      No  
29                          California              US  White      No  
..                                 ...             ...    ...     ...  
498                     North Carolina              US  Black      No  
499                            Indiana              US  Black      No  
500                     North Carolina              US  Black      No  
501                     Virgin Islands  Virgin Islands  Black      No  
502                         Washington              US  White      No  
503                         Washington              US  Black      No  
504                         Washington              US  Black      No  
505                         Washington              US  Black      No  
506                         Washington              US  Black      No  
507                             Oregon              US  Black      No  
508                         California              US  Black      No  
509                         California              US  Black      No  
510                                n/a     New Zealand  White      No  
511                         California              US  Mixed      No  
512                         California              US  Black      No  
513                                n/a          Taiwan  White      No  
514                       Pennsylvania              US  Black      No  
515                           New York              US  Black      No  
516                         California              US  Black     Yes  
517                               Ohio              US  Mixed      No  
518                           Arkansas              US  Black      No  
519                            Indiana              US  Black      No  
520                                n/a          Israel  White      No  
521                              Texas              US  Black      No  
522                          Wisconsin              US  White      No  
523                          Minnesota              US  White      No  
524                          Wisconsin              US  Black      No  
525                          Wisconsin              US  Black      No  
526                         New Jersey              US  Black      No  
527                           Michigan              US  Black      No  

[528 rows x 17 columns]

If we scroll we can see all of it. But maybe we don't want to see all of it. Maybe we hate scrolling?


In [4]:
# Look at the first few rows

df.head() # shows header + first 5 rows!


Out[4]:
                   Name  Age           Team  POS   #       2013 $  Ht (In.)  \
0           Gee, Alonzo   26      Cavaliers    F  33   $3,250,000        78
1       Wallace, Gerald   31        Celtics    F  45  $10,105,855        79
2          Williams, Mo   30  Trail Blazers    G  25   $2,652,000        73
3     Gladness, Mickell   27          Magic    C  40     $762,195        83
4    Jefferson, Richard   33           Jazz    F  44  $11,046,000        79

      WT  EXP  1st Year         DOB                School                City  \
0    219    4      2009   5/29/1987               Alabama   Riviera Beach, FL
1    220   12      2001   7/23/1982               Alabama       Sylacauga, AL
2    195   10      2003  12/19/1982               Alabama         Jackson, MS
3    220    2      2011   7/26/1986           Alabama A&M      Birmingham, AL
4    230   12      2001   6/21/1980               Arizona     Los Angeles, CA

    State (Province, Territory, Etc..)         Country   Race HS Only
0                              Florida              US  Black      No
1                              Alabama              US  Black      No
2                          Mississippi              US  Black      No
3                              Alabama              US  Black      No
4                           California              US  Black      No

...but maybe we want to see more than a measly five results?


In [5]:
# Let's look at MORE of the first few rows
df.head(10) # shows the first 10 rows of the data frame


Out[5]:
                   Name  Age           Team  POS   #       2013 $  Ht (In.)  \
0           Gee, Alonzo   26      Cavaliers    F  33   $3,250,000        78
1       Wallace, Gerald   31        Celtics    F  45  $10,105,855        79
2          Williams, Mo   30  Trail Blazers    G  25   $2,652,000        73
3     Gladness, Mickell   27          Magic    C  40     $762,195        83
4    Jefferson, Richard   33           Jazz    F  44  $11,046,000        79
5         Hill, Solomon   22         Pacers    F   9   $1,246,680        79
6       Budinger, Chase   25   Timberwolves    F  10   $5,000,000        79
7     Williams, Derrick   22   Timberwolves    F   7   $5,016,960        80
8          Hill, Jordan   26         Lakers  F/C  27   $3,563,600        82
9        Frye, Channing   30           Suns  F/C   8   $6,500,000        83

      WT  EXP  1st Year         DOB                School                City  \
0    219    4      2009   5/29/1987               Alabama   Riviera Beach, FL
1    220   12      2001   7/23/1982               Alabama       Sylacauga, AL
2    195   10      2003  12/19/1982               Alabama         Jackson, MS
3    220    2      2011   7/26/1986           Alabama A&M      Birmingham, AL
4    230   12      2001   6/21/1980               Arizona     Los Angeles, CA
5    220    0      2013   3/18/1991               Arizona     Los Angeles, CA
6    218    4      2009   5/22/1988               Arizona       Encinitas, CA
7    241    2      2011   5/25/1991               Arizona       La Mirada, CA
8    235    1      2012   7/27/1987               Arizona        Newberry, SC
9    245    8      2005   5/17/1983               Arizona    White Plains, NY

    State (Province, Territory, Etc..)         Country   Race HS Only
0                              Florida              US  Black      No
1                              Alabama              US  Black      No
2                          Mississippi              US  Black      No
3                              Alabama              US  Black      No
4                           California              US  Black      No
5                           California              US  Black      No
6                           California              US  White      No
7                           California              US  Black      No
8                       South Carolina              US  Black      No
9                             New York              US  Black      No

But maybe we want to make a basketball joke and see the final four?


In [6]:
# Let's look at the final few rows

df.tail(4) # shows the final four


Out[6]:
                   Name  Age           Team  POS   #       2013 $  Ht (In.)  \
524      Landry, Marcus   27         Lakers    F  14     $788,872        79
525       Harris, Devin   30      Mavericks    G  20     $854,389        75
526         West, David   33         Pacers    F  21  $12,000,000        81
527    Crawford, Jordan   24        Celtics    G  27   $2,162,419        76

      WT  EXP  1st Year         DOB                School                City  \
524  225   17      1996   11/1/1985             Wisconsin       Milwaukee, WI
525  192    9      2004   2/27/1983             Wisconsin       Milwaukee, WI
526  250   10      2003   8/29/1980                Xavier         Teaneck, NJ
527  195    3      2010  10/23/1988                Xavier         Detroit, MI

    State (Province, Territory, Etc..)         Country   Race HS Only
524                          Wisconsin              US  Black      No
525                          Wisconsin              US  Black      No
526                         New Jersey              US  Black      No
527                           Michigan              US  Black      No

So yes, head and tail work kind of like the terminal commands. That's nice, I guess.

But maybe we're incredibly demanding (which we are) and we want, say, rows 6 through 8, counting from zero the way pandas does (which we do). Don't worry (which I know you were), we can do that, too.


In [7]:
# Show rows 6 through 8 (the end of the slice, 9, is not included)

df[6:9]


Out[7]:
                   Name  Age           Team  POS   #       2013 $  Ht (In.)  \
6       Budinger, Chase   25   Timberwolves    F  10   $5,000,000        79
7     Williams, Derrick   22   Timberwolves    F   7   $5,016,960        80
8          Hill, Jordan   26         Lakers  F/C  27   $3,563,600        82

      WT  EXP  1st Year         DOB                School                City  \
6    218    4      2009   5/22/1988               Arizona       Encinitas, CA
7    241    2      2011   5/25/1991               Arizona       La Mirada, CA
8    235    1      2012   7/27/1987               Arizona        Newberry, SC

    State (Province, Territory, Etc..)         Country   Race HS Only
6                           California              US  White      No
7                           California              US  Black      No
8                       South Carolina              US  Black      No

It's kind of like an array, right? Except where in an array we'd say array[0] to get one item, this time we need to give it two numbers, the start and the end. The end itself is left out, so 6:9 gives us rows 6, 7, and 8.
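
Here is a small sketch of head, tail, and slicing side by side. It uses a made-up six-row data frame (toy_df and its player labels are not part of the NBA data) so the positions are easy to follow.

```python
import pandas as pd

# Hypothetical mini-roster; each label matches its row position.
toy_df = pd.DataFrame({"player": ["p0", "p1", "p2", "p3", "p4", "p5"]})

first_two = toy_df.head(2)      # rows at positions 0 and 1
last_two = toy_df.tail(2)       # rows at positions 4 and 5
middle = toy_df[2:5]            # positions 2, 3, 4 (5 is excluded)
same_middle = toy_df.iloc[2:5]  # the same slice, spelled out as position-based

print(middle)
```

The .iloc spelling is longer, but it makes it obvious that you are asking for positions and not column names.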

Selecting columns

But jeez, my eyes don't want to go that far over the data. I only want to see, uh, name and age.


In [8]:
# Get the names of the columns, just because

df.columns # the names of the columns; the capitalization must match exactly when you use them


Out[8]:
Index(['Name', 'Age', 'Team', 'POS', '#', '2013 $', 'Ht (In.)', 'WT', 'EXP',
       '1st Year', 'DOB', 'School', 'City',
       'State (Province, Territory, Etc..)', 'Country', 'Race', 'HS Only'],
      dtype='object')

In [9]:
# If we want to be "correct" we add .values on the end of it
df.columns.values


Out[9]:
array(['Name', 'Age', 'Team', 'POS', '#', '2013 $', 'Ht (In.)', 'WT',
       'EXP', '1st Year', 'DOB', 'School', 'City',
       'State (Province, Territory, Etc..)', 'Country', 'Race', 'HS Only'], dtype=object)

In [10]:
# Select only name and age

columns_we_want = ['Name', 'Age']

# passing the list of columns we want to the data frame
df[columns_we_want]


Out[10]:
Name Age
0 Gee, Alonzo 26
1 Wallace, Gerald 31
2 Williams, Mo 30
3 Gladness, Mickell 27
4 Jefferson, Richard 33
5 Hill, Solomon 22
6 Budinger, Chase 25
7 Williams, Derrick 22
8 Hill, Jordan 26
9 Frye, Channing 30
10 Bayless, Jerryd 25
11 Terry, Jason 36
12 Fogg, Kyle 23
13 Iguodala, Andre 29
14 Boateng, Eric 27
15 Diogu, Ike 29
16 Ayres, Jeff 26
17 Harden, James 24
18 Felix, Carrick 23
19 Pargo, Jannero 33
20 Beverley, Patrick 25
21 Johnson, Joe 32
22 Brewer, Ronnie 28
23 Fisher, Derek 39
24 Miller, Quincy 20
25 Acy, Quincy 23
26 Jones, Perry 22
27 Udoh, Ekpe 26
28 Clark, Ian 22
29 Andersen, Chris 35
... ... ...
498 Paul, Chris 28
499 Teague, Jeff 25
500 Smith, Ish 25
501 Duncan, Tim 37
502 Hawes, Spencer 25
503 Wroten, Tony 20
504 Gaddy, Abdul 21
505 Thomas, Isaiah 24
506 Robinson, Nate 29
507 Ross, Terrence 22
508 Pondexter, Quincy 25
509 Holiday, Justin 24
510 Baynes, Aron 26
511 Thompson, Klay 23
512 Lillard, Damian 23
513 Alexander, Joe 26
514 Fischer, D'or 32
515 Ebanks, Devin 23
516 Johnson, Amir 26
517 Martin, Kevin 30
518 Evans, Jeremy 25
519 Lee, Courtney 28
520 Mekel, Gal 25
521 Murry, Toure' 23
522 Stiemsma, Greg 28
523 Leuer, Jon 24
524 Landry, Marcus 27
525 Harris, Devin 30
526 West, David 33
527 Crawford, Jordan 24

528 rows × 2 columns


In [11]:
# Combining that with .head() to see not-so-many rows

In [12]:
# We can also do this all in one line, even though it starts looking ugly
# (unlike the cute bears, pandas looks ugly pretty often)

df[['Name', 'Age']] # brackets brackets


Out[12]:
Name Age
0 Gee, Alonzo 26
1 Wallace, Gerald 31
2 Williams, Mo 30
3 Gladness, Mickell 27
4 Jefferson, Richard 33
5 Hill, Solomon 22
6 Budinger, Chase 25
7 Williams, Derrick 22
8 Hill, Jordan 26
9 Frye, Channing 30
10 Bayless, Jerryd 25
11 Terry, Jason 36
12 Fogg, Kyle 23
13 Iguodala, Andre 29
14 Boateng, Eric 27
15 Diogu, Ike 29
16 Ayres, Jeff 26
17 Harden, James 24
18 Felix, Carrick 23
19 Pargo, Jannero 33
20 Beverley, Patrick 25
21 Johnson, Joe 32
22 Brewer, Ronnie 28
23 Fisher, Derek 39
24 Miller, Quincy 20
25 Acy, Quincy 23
26 Jones, Perry 22
27 Udoh, Ekpe 26
28 Clark, Ian 22
29 Andersen, Chris 35
... ... ...
498 Paul, Chris 28
499 Teague, Jeff 25
500 Smith, Ish 25
501 Duncan, Tim 37
502 Hawes, Spencer 25
503 Wroten, Tony 20
504 Gaddy, Abdul 21
505 Thomas, Isaiah 24
506 Robinson, Nate 29
507 Ross, Terrence 22
508 Pondexter, Quincy 25
509 Holiday, Justin 24
510 Baynes, Aron 26
511 Thompson, Klay 23
512 Lillard, Damian 23
513 Alexander, Joe 26
514 Fischer, D'or 32
515 Ebanks, Devin 23
516 Johnson, Amir 26
517 Martin, Kevin 30
518 Evans, Jeremy 25
519 Lee, Courtney 28
520 Mekel, Gal 25
521 Murry, Toure' 23
522 Stiemsma, Greg 28
523 Leuer, Jon 24
524 Landry, Marcus 27
525 Harris, Devin 30
526 West, David 33
527 Crawford, Jordan 24

528 rows × 2 columns

NOTE: That was not df['Name', 'Age'], it was df[['Name', 'Age']]. You'll definitely type it wrong all of the time. When things break with pandas it's probably because you forgot to put in a million brackets.
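
A quick sketch of the bracket situation, using a made-up three-row data frame (toy_df is hypothetical). It also tacks .head() onto the selection to keep the output short.

```python
import pandas as pd

# Hypothetical stand-in for the real data.
toy_df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [26, 31, 30],
    "Team": ["X", "Y", "Z"],
})

# A list inside the brackets selects several columns; .head(2) trims the rows.
name_and_age = toy_df[["Name", "Age"]].head(2)
print(name_and_age)

# Without the inner list, pandas looks for ONE column labelled ('Name', 'Age').
try:
    toy_df["Name", "Age"]
    single_brackets_worked = True
except KeyError:
    single_brackets_worked = False

print(single_brackets_worked)
```

So when you see a KeyError that mentions a pair of column names in parentheses, missing brackets are a likely suspect.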

Describing your data

A powerful tool of pandas is being able to select a portion of your data, because who ordered all that data anyway.


In [13]:
df['POS'] # shows you each position


Out[13]:
0        F
1        F
2        G
3        C
4        F
5        F
6        F
7        F
8      F/C
9      F/C
10       G
11       G
12       G
13     G/F
14       C
15     F/C
16     F/C
17       G
18     G/F
19       G
20       G
21     G/F
22     G/F
23       G
24       F
25       F
26       F
27     F/C
28       G
29     F/C
      ... 
498      G
499      G
500      G
501    F/C
502      C
503      G
504      G
505      G
506      G
507      G
508    G/F
509    G/F
510    F/C
511    G/F
512      G
513      F
514      C
515      F
516    F/C
517      G
518      F
519    G/F
520      G
521    G/F
522      C
523      F
524      F
525      G
526      F
527      G
Name: POS, dtype: object

I want to know how many people are in each position. Luckily, pandas can tell me!


In [14]:
# Grab the POS column, and count the different values in it.
df['POS'].value_counts() # counts the number of values that match each position


Out[14]:
G      175
F      142
F/C     74
G/F     70
C       67
Name: POS, dtype: int64

In [15]:
df['Race'].value_counts() # race of players


Out[15]:
Black       399
White        95
Mixed        16
Hispanic     16
Asian         1
Name: Race, dtype: int64

Now that was a little weird, yes: we used df['POS'] instead of df[['POS']] when viewing the data's details. Single brackets with one column name hand back just that column, which pandas calls a Series.
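
To make the single-versus-double bracket difference concrete, here is a sketch on a made-up column of positions (toy_df is hypothetical, not the NBA data).

```python
import pandas as pd

# Hypothetical positions for five players.
toy_df = pd.DataFrame({"POS": ["G", "F", "G", "C", "G"]})

one_column = toy_df["POS"]          # single brackets: a Series
one_column_frame = toy_df[["POS"]]  # double brackets: a DataFrame with one column

print(type(one_column).__name__)
print(type(one_column_frame).__name__)

# value_counts tallies how often each value shows up in the Series.
counts = one_column.value_counts()
print(counts)
```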

But now I'm curious about numbers: how old is everyone? Maybe we could, I don't know, get some statistics about age? Some statistics to describe age?


In [16]:
# How many players are there at each age?
df['Age'].value_counts() # counts the players at each age, most common first


Out[16]:
23    56
25    54
24    53
22    51
27    49
26    41
28    37
21    27
33    24
30    24
29    23
31    21
20    20
32    14
35     9
36     6
34     5
37     5
19     3
39     3
38     2
18     1
Name: Age, dtype: int64

In [17]:
df['Age'].describe() #statistics about NBA players and their ages


Out[17]:
count    528.000000
mean      26.242424
std        4.178868
min       18.000000
25%       23.000000
50%       25.000000
75%       29.000000
max       39.000000
Name: Age, dtype: float64

In [18]:
# That's pretty good. Does it work for everything? How about the money?
df.describe() # shows info for all of the numerical data

# EEEK minimum weight = 20 lbs -- seems incorrect


Out[18]:
              Age    Ht (In.)          WT         EXP     1st Year
count  528.000000  528.000000  528.000000  528.000000   528.000000
mean    26.242424   79.119318  221.206439    4.772727  2008.227273
std      4.178868    3.431488   27.943169    4.325628     4.325628
min     18.000000   69.000000   20.000000    0.000000  1995.000000
25%     23.000000   77.000000  200.000000    1.000000  2005.000000
50%     25.000000   80.000000  220.000000    4.000000  2009.000000
75%     29.000000   82.000000  240.000000    8.000000  2012.000000
max     39.000000   87.000000  290.000000   18.000000  2013.000000
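
That minimum weight of 20 is the kind of thing worth chasing down. One way to find the row behind a suspicious minimum is idxmin, sketched here on a made-up data frame (toy_df and its values are hypothetical; 20 stands in for the odd value).

```python
import pandas as pd

# Hypothetical weights; the 20 plays the role of the suspicious entry.
toy_df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "WT": [219, 20, 195],
})

# idxmin returns the index label of the smallest value in the column.
lightest_label = toy_df["WT"].idxmin()

# .loc then pulls up the whole row for that label.
print(toy_df.loc[lightest_label])
```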

In [19]:
df['Ht (In.)'].describe()


Out[19]:
count    528.000000
mean      79.119318
std        3.431488
min       69.000000
25%       77.000000
50%       80.000000
75%       82.000000
max       87.000000
Name: Ht (In.), dtype: float64

In [20]:
#df.columns # look at column names again

df['2013 $'].describe() # this column holds strings, not numbers, so we only get counts :(


Out[20]:
count     528
unique    308
top       n/a
freq       43
Name: 2013 $, dtype: object

Unfortunately because that has dollar signs and commas it's thought of as a string. We'll fix it in a second, but let's try describing one more thing.
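
In the meantime, here is one possible way to turn salary strings like these into numbers. It is only a sketch on three made-up values in the same format, and it is not necessarily the fix this notebook ends up using.

```python
import pandas as pd

# Hypothetical salaries in the same format as the '2013 $' column.
salaries = pd.Series(["$3,250,000", "$762,195", "n/a"], name="2013 $")

# Strip the dollar signs and commas as plain text (regex=False means no pattern matching).
cleaned = salaries.str.replace("$", "", regex=False).str.replace(",", "", regex=False)

# Convert to numbers; anything that still isn't numeric (like 'n/a') becomes NaN.
salary_numbers = pd.to_numeric(cleaned, errors="coerce")

print(salary_numbers.describe())
```

With errors="coerce" the missing salaries turn into NaN instead of stopping the conversion, and describe() then skips them.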


In [21]:
# Doing more describing

That's stupid, though. What does an inch even look like? What's 80 inches? I don't have a clue. If only there were some way to manipulate our data.

Manipulating data

Oh wait there is, HA HA HA.


In [22]:
# Take another look at our inches, but only the first few

df['Ht (In.)'].head()


Out[22]:
0    78
1    79
2    73
3    83
4    79
Name: Ht (In.), dtype: int64

In [23]:
# Divide those inches by 12
df['Ht (In.)'].head()/12 # divides each of those first five values by 12


Out[23]:
0    6.500000
1    6.583333
2    6.083333
3    6.916667
4    6.583333
Name: Ht (In.), dtype: float64

In [24]:
# Let's divide ALL of them by 12
df['Ht (In.)']/12


Out[24]:
0      6.500000
1      6.583333
2      6.083333
3      6.916667
4      6.583333
5      6.583333
6      6.583333
7      6.666667
8      6.833333
9      6.916667
10     6.250000
11     6.166667
12     6.250000
13     6.500000
14     6.833333
15     6.666667
16     6.750000
17     6.416667
18     6.500000
19     6.083333
20     6.083333
21     6.583333
22     6.583333
23     6.083333
24     6.750000
25     6.583333
26     6.916667
27     6.833333
28     6.250000
29     6.833333
         ...   
498    6.000000
499    6.166667
500    6.000000
501    6.916667
502    7.083333
503    6.500000
504    6.250000
505    5.750000
506    5.750000
507    6.500000
508    6.500000
509    6.500000
510    6.833333
511    6.583333
512    6.250000
513    6.666667
514    6.916667
515    6.750000
516    6.750000
517    6.583333
518    6.750000
519    6.416667
520    6.250000
521    6.416667
522    6.916667
523    6.833333
524    6.583333
525    6.250000
526    6.750000
527    6.333333
Name: Ht (In.), dtype: float64

In [25]:
# Can we get statistics on those?
height_in_feet = df['Ht (In.)']/12
height_in_feet.describe()


Out[25]:
count    528.000000
mean       6.593277
std        0.285957
min        5.750000
25%        6.416667
50%        6.666667
75%        6.833333
max        7.250000
Name: Ht (In.), dtype: float64

In [26]:
# Let's look at our original data again
df.head()


Out[26]:
                   Name  Age           Team  POS   #       2013 $  Ht (In.)  \
0           Gee, Alonzo   26      Cavaliers    F  33   $3,250,000        78
1       Wallace, Gerald   31        Celtics    F  45  $10,105,855        79
2          Williams, Mo   30  Trail Blazers    G  25   $2,652,000        73
3     Gladness, Mickell   27          Magic    C  40     $762,195        83
4    Jefferson, Richard   33           Jazz    F  44  $11,046,000        79

      WT  EXP  1st Year         DOB                School                City  \
0    219    4      2009   5/29/1987               Alabama   Riviera Beach, FL
1    220   12      2001   7/23/1982               Alabama       Sylacauga, AL
2    195   10      2003  12/19/1982               Alabama         Jackson, MS
3    220    2      2011   7/26/1986           Alabama A&M      Birmingham, AL
4    230   12      2001   6/21/1980               Arizona     Los Angeles, CA

    State (Province, Territory, Etc..)         Country   Race HS Only
0                              Florida              US  Black      No
1                              Alabama              US  Black      No
2                          Mississippi              US  Black      No
3                              Alabama              US  Black      No
4                           California              US  Black      No

Okay that was nice but unfortunately we can't do anything with it. It's just sitting there, separate from our data. If this were normal code we could do blahblah['feet'] = blahblah['Ht (In.)'] / 12, but since this is pandas, we can't. Right? Right?


In [27]:
# Store a new column
df['Ht (Ft.)'] = df['Ht (In.)']/12 # adds a new column with the height as feet

df.head()


Out[27]:
                   Name  Age           Team  POS   #       2013 $  Ht (In.)  \
0           Gee, Alonzo   26      Cavaliers    F  33   $3,250,000        78
1       Wallace, Gerald   31        Celtics    F  45  $10,105,855        79
2          Williams, Mo   30  Trail Blazers    G  25   $2,652,000        73
3     Gladness, Mickell   27          Magic    C  40     $762,195        83
4    Jefferson, Richard   33           Jazz    F  44  $11,046,000        79

      WT  EXP  1st Year         DOB                School                City  \
0    219    4      2009   5/29/1987               Alabama   Riviera Beach, FL
1    220   12      2001   7/23/1982               Alabama       Sylacauga, AL
2    195   10      2003  12/19/1982               Alabama         Jackson, MS
3    220    2      2011   7/26/1986           Alabama A&M      Birmingham, AL
4    230   12      2001   6/21/1980               Arizona     Los Angeles, CA

    State (Province, Territory, Etc..)         Country   Race HS Only  Ht (Ft.)
0                              Florida              US  Black      No  6.500000
1                              Alabama              US  Black      No  6.583333
2                          Mississippi              US  Black      No  6.083333
3                              Alabama              US  Black      No  6.916667
4                           California              US  Black      No  6.583333
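
The same column arithmetic works with other operators too. As a sketch on a made-up data frame, this splits a height into whole feet and leftover inches; toy_df and the "feet" and "inches" column names are made up for illustration and are not in the NBA data.

```python
import pandas as pd

# Hypothetical heights in inches.
toy_df = pd.DataFrame({"Name": ["A", "B", "C"], "Ht (In.)": [78, 73, 84]})

# Same idea as above: the operation applies to every row at once.
toy_df["Ht (Ft.)"] = toy_df["Ht (In.)"] / 12

# Floor division and remainder give "6 feet 1 inch" style pieces.
toy_df["feet"] = toy_df["Ht (In.)"] // 12
toy_df["inches"] = toy_df["Ht (In.)"] % 12

print(toy_df)
```

Assigning to a column name that doesn't exist yet creates it; assigning to one that already exists overwrites it.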

In [28]:
df.sort_values('Ht (Ft.)') # automatically sorts from lowest to highest - ascending value


Out[28]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only Ht (Ft.)
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No 5.750000
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No 5.750000
235 Larkin, Shane 21 Mavericks G 3 $1,536,960 71 176 0 2013 10/2/1992 Miami (FL) Cincinnati, OH Ohio US Black No 5.916667
362 Lucas III, John 30 Jazz G 5 $1,600,000 71 165 8 2005 11/21/1982 Oklahoma State Washington, DC DC US Black No 5.916667
256 Pressey, Phil 22 Celtics G 26 $490,180 71 175 0 2013 2/17/1991 Missouri Dallas, TX Texas US Black No 5.916667
336 Lawson, Ty 25 Nuggets G 3 $10,786,517 71 195 4 2009 11/3/1987 North Carolina Clinton, MD Maryland US Black No 5.916667
388 McConnell, Mickey 24 Mavericks G 32 $490,180 72 189 2 2011 4/14/1989 St. Mary's (CA) Mesa, AZ Arizona US White No 6.000000
498 Paul, Chris 28 Clippers G 3 $18,668,431 72 175 8 2005 5/6/1985 Wake Forest Forsyth County, NC North Carolina US Black No 6.000000
133 Bynum, Will 30 Pistons G 12 $2,790,343 72 185 8 2005 1/4/1983 Georgia Tech Chicago, IL Illinois US Black No 6.000000
500 Smith, Ish 25 Suns G 30 $951,463 72 175 3 2010 7/5/1988 Wake Forest Charlotte, NC North Carolina US Black No 6.000000
387 Mills, Patrick 25 Spurs G 8 $1,133,950 72 185 4 2009 8/11/1988 St. Mary's (CA) Canberra New South Wales Australia Black No 6.000000
489 Lowry, Kyle 27 Raptors G 7 $6,210,000 72 205 7 2006 3/25/1986 Villanova Philadelphia, PA Pennsylvania US Black No 6.000000
258 Canaan, Isaiah 22 Rockets G 1 $570,515 72 188 0 2013 5/2/1991 Murray State Biloxi, MS Mississippi US Black No 6.000000
450 Augustin, D. J. 25 Raptors G 14 $1,267,000 72 183 5 2008 11/10/1987 Texas New Orleans, LA Louisiana US Black No 6.000000
368 Brooks, Aaron 28 Rockets G 0 $884,293 72 161 6 2007 1/14/1985 Oregon Seattle, WA Washington US Black No 6.000000
202 Siva, Peyton 22 Pistons G 34 $490,180 72 185 0 2013 10/24/1990 Louisville Seattle, WA Washington US Hispanic No 6.000000
347 Barea, José Juan 29 Timberwolves G 11 $4,687,000 72 175 7 2006 6/26/1984 Northeastern Mayaguez n/a Puerto Rico Hispanic No 6.000000
464 Collison, Darren 26 Clippers G 2 $1,900,000 72 175 4 2009 8/23/1987 UCLA Rancho Cucamonga, CA California US Black No 6.000000
385 Nelson, Jameer 31 Magic G 14 $8,600,000 72 190 9 2004 2/9/1982 Saint Joseph's Chester, PA Pennsylvania US Black No 6.000000
240 Burke, Trey 20 Jazz G 3 $2,438,760 73 185 0 2013 11/12/1992 Michigan Columbus, OH Ohio US Black No 6.083333
62 Walker, Kemba 23 Bobcats G 15 $2,568,360 73 184 2 2011 5/8/1990 Connecticut New York City, NY New York US Black No 6.083333
152 Machado, Scott 23 Jazz G 30 $788,872 73 205 1 2012 6/8/1990 Iona New York City, NY New York US Black No 6.083333
470 Watson, Earl 34 Trail Blazers G 17 $884,293 73 199 12 2001 6/12/1979 UCLA Kansas City, KA Kansas US Black No 6.083333
71 Roberts, Brian 27 Pelicans G 22 $788,872 73 173 1 2012 12/3/1985 Dayton Toledo, OH Ohio US Black No 6.083333
487 Wayns, Maalik 22 Clippers G 5 $788,872 73 195 1 2012 5/2/1991 Villanova Philadelphia, PA Pennsylvania US Black No 6.083333
351 Jennings, Brandon 24 Pistons G 7 $7,655,503 73 169 4 2009 9/23/1989 Oak Hill Academy (VA) Lakewood, CA California US Black Yes 6.083333
356 Conley, Mike 26 Grizzlies G 11 $8,600,001 73 185 6 2007 10/11/1987 Ohio State Fayetteville, AR Arkansas US Black No 6.083333
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No 6.083333
284 Schröder, Dennis 20 Hawks G 17 $1,348,200 73 165 0 2013 9/15/1993 n/a Braunschweig, LS Lower Saxony Germany Black No 6.083333
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
405 Dedmon, Dewayne 24 Warriors C -- n/a 84 255 0 2013 8/12/1989 Southern California Lancaster, CA California US Black No 7.000000
52 Smith, Jason 27 Pelicans F/C 14 $2,500,000 84 240 6 2007 3/2/1986 Colorado State Greeley, CO Colorado US White No 7.000000
461 Hollins, Ryan 29 Clippers C 15 $884,293 84 240 7 2006 10/10/1984 UCLA Pasadena, CA California US Black No 7.000000
140 Sacre, Robert 24 Lakers C 50 $788,872 84 260 6 2007 6/6/1989 Gonzaga Baton Rouge, LA Louisiana US Mixed No 7.000000
282 Nowitzki, Dirk 35 Mavericks F 41 $22,721,381 84 245 15 1998 6/19/1978 n/a Wurzburg, BA Bavaria Germany White No 7.000000
41 Kaman, Chris 31 Lakers C 9 $3,183,000 84 265 4 2009 4/28/1982 Central Michigan Grand Rapids, MI Michigan US White No 7.000000
426 Melo, Fab 23 Mavericks C 51 $788,872 84 255 1 2012 1/20/1990 Syracuse Juiz de For a, MG Minas Gerais Brazil Black No 7.000000
292 Motiejūnas, Donatas 23 Rockets F/C 20 $1,422,720 84 222 1 2012 9/20/1990 n/a Kaunas n/a Lithuania White No 7.000000
149 Zeller, Cody 21 Bobcats F/C 40 $3,857,040 84 240 0 2013 10/5/1992 Indiana Washington, IN Indiana US White No 7.000000
421 Lopez, Robin 25 Trail Blazers C 42 $5,904,261 84 255 5 2008 4/8/1988 Stanford Los Angeles, CA California US Mixed No 7.000000
404 Vučević, Nikola 22 Magic C 9 $1,793,520 84 240 2 2011 10/24/1990 Southern Califoria Morges, VA Vaud Switzerland White No 7.000000
420 Lopez, Brook 25 Nets C 11 $14,693,906 84 265 5 2008 4/1/1988 Stanford Los Angeles, CA California US Mixed No 7.000000
479 Bogut, Andrew 28 Warriors C 12 $14,000,000 84 260 8 2005 11/28/1984 Utah Melbourne, VI Victoria Australia White No 7.000000
159 Withey, Jeff 23 Pelicans C 5 $490,180 84 235 0 2013 3/7/1990 Kansas San Diego, CA California US White No 7.000000
305 Gasol, Pau 33 Lakers F/C 16 $19,285,850 84 250 3 2010 7/6/1980 n/a Barcelona, Spain n/a Spain Hispanic No 7.000000
296 Biedriņš, Andris 27 Jazz C 11 $9,000,000 84 242 9 2004 4/2/1986 n/a Riga n/a Russia White No 7.000000
32 O'Bryant, Patrick 27 Bobcats C 18 n/a 84 250 7 2006 6/20/1986 Bradley Oskaloosa, IA Iowa US Black No 7.000000
137 Olynyk, Kelly 22 Celtics F/C 41 $1,986,360 84 238 0 2013 4/19/1991 Gonzaga Toronto, ON Ontario Canada White No 7.000000
295 Bargnani, Andrea 28 Knicks F/C 77 $11,862,500 84 250 11 2002 8/26/1985 n/a Rome, LZ Lazio Rome White No 7.000000
416 Bynum, Andrew 25 Cavaliers C 21 $12,250,000 84 285 8 2005 10/27/1987 St. Joseph HS (NJ) Plainsboro, NJ New Jersey US Black Yes 7.000000
297 Mozgov, Timofey 27 Nuggets C 25 $4,400,000 85 250 3 2010 7/16/1986 n/a St. Petersburg n/a Russia White No 7.083333
145 Leonard, Meyers 21 Trail Blazers C 11 $2,222,160 85 245 1 2012 2/27/1992 Illinois Robinson, IIL Illinois US White No 7.083333
76 Chandler, Tyson 31 Knicks C 6 $14,100,538 85 240 12 2001 10/2/1982 Dominguez HS (CA) Hanford, CA California US Black Yes 7.083333
303 Gasol, Marc 28 Grizzlies C 33 $14,860,524 85 265 5 2008 1/29/1985 n/a Barcelona n/a Spain Hispanic No 7.083333
274 Gobert, Rudy 21 Jazz C 27 $1,078,800 85 235 0 2013 6/26/1992 n/a Saint-Quentin Aisne France Mixed No 7.083333
221 Len, Alex 20 Suns C 21 $3,492,720 85 255 0 2013 6/16/1993 Maryland Antratsy n/a Ukraine White No 7.083333
316 Kuzmić, Ognjen 23 Warriors C 1 $490,180 85 231 0 2013 5/16/1990 n/a Doboj n/a Yugoslavia White No 7.083333
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No 7.083333
120 Hibbert, Roy 26 Pacers C 55 $14,283,844 86 280 5 2008 12/11/1986 Georgetown New York City, NY New York US Black No 7.166667
54 Thabeet, Hasheem 26 Thunder C 34 $1,200,000 87 263 4 2009 2/16/1987 Connecticut Dar es Salaam n/a Tanzania Black No 7.250000

528 rows × 18 columns


In [29]:
# Shows the tallest players by height in feet
df.sort_values('Ht (Ft.)', ascending=False).head() # ascending=False sorts from highest to lowest


Out[29]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only Ht (Ft.)
54 Thabeet, Hasheem 26 Thunder C 34 $1,200,000 87 263 4 2009 2/16/1987 Connecticut Dar es Salaam n/a Tanzania Black No 7.250000
120 Hibbert, Roy 26 Pacers C 55 $14,283,844 86 280 5 2008 12/11/1986 Georgetown New York City, NY New York US Black No 7.166667
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No 7.083333
145 Leonard, Meyers 21 Trail Blazers C 11 $2,222,160 85 245 1 2012 2/27/1992 Illinois Robinson, IIL Illinois US White No 7.083333
303 Gasol, Marc 28 Grizzlies C 33 $14,860,524 85 265 5 2008 1/29/1985 n/a Barcelona n/a Spain Hispanic No 7.083333

In [30]:
# Shows who is/isn't taller than 6 feet
taller_than_six_feet = df['Ht (Ft.)'] > 6
taller_than_six_feet.value_counts() # returns how many players are or are not taller than 6 feet


Out[30]:
True     509
False     19
Name: Ht (Ft.), dtype: int64

That's cool, maybe we could do the same thing with their salary? Take out the $ and the , and convert it to an integer?


In [31]:
# Can't just use .replace

In [32]:
# Need to use this weird .str thing

In [33]:
# Can't just immediately replace the , either

In [34]:
# Need to use the .str thing before EVERY string method
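
The comment cells above describe the salary cleanup without showing it. Here is a minimal sketch of the idea, assuming a tiny made-up Series instead of the real `2013 $` column (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical sample, shaped like the '2013 $' column
salaries = pd.Series(['$3,250,000', '$10,105,855', '$762,195'])

# String methods live under .str, and each call in the chain needs its own .str
# regex=False treats '$' as a literal character instead of a regex anchor
cleaned = salaries.str.replace('$', '', regex=False).str.replace(',', '', regex=False)

print(cleaned.tolist())
```

Plain `salaries.replace('$', '')` doesn't help here because it only swaps values that are exactly `'$'`, not the `$` inside a longer string.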

In [35]:
# Describe still doesn't work.

In [36]:
# Let's convert it to an integer using .astype(int) before we describe it

In [37]:
# Maybe we can just make them millions?

In [38]:
# Unfortunately one is "n/a" which is going to break our code, so we can make n/a be 0
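
A rough sketch of the conversion steps described in the last few cells, again on a made-up sample rather than the real column. Turning `n/a` into `0` is the approach the comment suggests; the variable names are hypothetical:

```python
import pandas as pd

# Hypothetical sample with one missing salary, like the real column
salaries = pd.Series(['$3,250,000', 'n/a', '$762,195'])

# Swap 'n/a' for '0' first so the integer conversion doesn't fail
as_text = salaries.replace('n/a', '0')
as_text = as_text.str.replace('$', '', regex=False).str.replace(',', '', regex=False)

# Now every value is a string of digits, so astype(int) works
dollars = as_text.astype(int)

# Millions are easier to read than raw dollars
millions = dollars / 1000000

print(dollars.tolist())
print(millions.tolist())
```

Counting a missing salary as 0 pulls the average down. If that matters, `pd.to_numeric(..., errors='coerce')` turns values it can't parse into NaN instead.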

In [39]:
# Remove the .head() piece and save it back into the dataframe

The average player in this dataset makes about 3.8 million dollars and is a little over six and a half feet tall.

But who cares about those guys? I don't care about those guys. They're boring. I want the real rich guys!

Sorting and sub-selecting


In [40]:
# This is just the first few guys in the dataset. Can we order it?

In [41]:
# Let's try to sort them

Those guys are making nothing! If only there were a way to sort from high to low, a.k.a. descending instead of ascending.


In [42]:
# It isn't descending = True, unfortunately
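
To make the point concrete, here is a small sketch on a made-up dataframe (not the real dataset). It assumes nothing beyond `sort_values` and its `ascending` keyword:

```python
import pandas as pd

# Hypothetical sample, not the real dataset
players = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Age': [26, 39, 19],
})

# There is no 'descending' keyword, so this raises a TypeError
try:
    players.sort_values('Age', descending=True)
except TypeError as error:
    print('TypeError:', error)

# The supported spelling is ascending=False
oldest_first = players.sort_values('Age', ascending=False)
print(oldest_first['Name'].tolist())
```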

In [43]:
# We can use this to find the oldest guys in the league
# Shows the oldest players
df.sort_values('Age', ascending=False).head() # ascending=False sorts from highest to lowest


Out[43]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only Ht (Ft.)
392 Nash, Steve 39 Lakers G 10 $9,300,500 75 178 7 2006 2/7/1974 Santa Clara Johannesburg, SA n/a South Africa White No 6.250000
225 Camby, Marcus 39 Rockets F/C 21 $884,293 83 240 17 1996 3/22/1974 Massachusetts Hartford, CT Connecticut US Black No 6.916667
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No 6.083333
63 Allen, Ray 38 Heat G 34 $3,229,050 77 205 17 1996 7/20/1975 Connecticut Merced, CA California US Black No 6.416667
94 James, Mike 38 Bulls G 8 n/a 74 188 15 1998 6/23/1975 Duquesne Copiague, NY New York US Black No 6.166667

In [44]:
# Or the youngest, by taking out 'ascending=False'

#shows the youngest players 
df.sort_values('Age').head() # automatically sorts from lowest to highest


Out[44]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only Ht (Ft.)
285 Antetokounmpo, Giannis 18 Bucks G/F 34 $1,792,560 81 205 1 2012 12/16/1994 n/a Athens n/a Greece Black No 6.750000
174 Noel, Nerlens 19 76ers C 4 $3,171,320 83 228 0 2013 4/10/1994 Kentucky Malden, MA Massachussetts US Black No 6.916667
191 Goodwin, Archie 19 Suns G 20 $1,064,400 77 198 0 2013 8/17/1994 Kentucky Little Rock, AR Arkansas US Black No 6.416667
300 Karasev, Sergey 19 Cavaliers G/F 10 $1,467,840 79 203 0 2013 10/26/1993 n/a Saint Petersburg n/a Russia NaN No 6.583333
503 Wroten, Tony 20 76ers G 8 $1,160,040 78 205 1 2012 4/13/1993 Washington Renton, WA Washington US Black No 6.500000

But sometimes instead of just looking at them, I want to do stuff with them. Play some games with them! Dunk on them! Well, describe them, at least. And we don't want to dunk on everyone, only the players above 7 feet tall.

First, we need to check out boolean things.


In [45]:
# Get a big long list of True and False for every single row.
# Shows who is/isn't taller than 6 feet
taller_than_six_feet = df['Ht (Ft.)'] > 6
# print(taller_than_six_feet)

In [46]:
# We could use value counts if we wanted
taller_than_six_feet.value_counts() # returns how many players are or are not taller than 6 feet


Out[46]:
True     509
False     19
Name: Ht (Ft.), dtype: int64

In [47]:
# But we can also apply this to every single row to say whether YES we want it or NO we don't
taller_than_seven_feet = df['Ht (Ft.)'] > 7

In [48]:
df[df['Race'] == 'Asian']


Out[48]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only Ht (Ft.)
143 Lin, Jeremy 25 Rockets G 7 $8,374,646 75 200 3 2010 8/23/1988 Harvard Los Angeles, CA California US Asian No 6.25

In [49]:
# Instead of putting column names inside of the brackets, we instead
# put the True/False statements. It will only return the players above 
# seven feet tall

In [50]:
# Only the players taller than seven feet
df[df['Ht (Ft.)'] > 7]


Out[50]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only Ht (Ft.)
54 Thabeet, Hasheem 26 Thunder C 34 $1,200,000 87 263 4 2009 2/16/1987 Connecticut Dar es Salaam n/a Tanzania Black No 7.250000
76 Chandler, Tyson 31 Knicks C 6 $14,100,538 85 240 12 2001 10/2/1982 Dominguez HS (CA) Hanford, CA California US Black Yes 7.083333
120 Hibbert, Roy 26 Pacers C 55 $14,283,844 86 280 5 2008 12/11/1986 Georgetown New York City, NY New York US Black No 7.166667
145 Leonard, Meyers 21 Trail Blazers C 11 $2,222,160 85 245 1 2012 2/27/1992 Illinois Robinson, IIL Illinois US White No 7.083333
221 Len, Alex 20 Suns C 21 $3,492,720 85 255 0 2013 6/16/1993 Maryland Antratsy n/a Ukraine White No 7.083333
274 Gobert, Rudy 21 Jazz C 27 $1,078,800 85 235 0 2013 6/26/1992 n/a Saint-Quentin Aisne France Mixed No 7.083333
297 Mozgov, Timofey 27 Nuggets C 25 $4,400,000 85 250 3 2010 7/16/1986 n/a St. Petersburg n/a Russia White No 7.083333
303 Gasol, Marc 28 Grizzlies C 33 $14,860,524 85 265 5 2008 1/29/1985 n/a Barcelona n/a Spain Hispanic No 7.083333
316 Kuzmić, Ognjen 23 Warriors C 1 $490,180 85 231 0 2013 5/16/1990 n/a Doboj n/a Yugoslavia White No 7.083333
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No 7.083333

In [51]:
# Or only the guards who are under 6 feet tall

# are you a guard? AND are below 6 feet tall?
df[(df['POS'] == 'G') & (df['Ht (Ft.)'] < 6)]


Out[51]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only Ht (Ft.)
235 Larkin, Shane 21 Mavericks G 3 $1,536,960 71 176 0 2013 10/2/1992 Miami (FL) Cincinnati, OH Ohio US Black No 5.916667
256 Pressey, Phil 22 Celtics G 26 $490,180 71 175 0 2013 2/17/1991 Missouri Dallas, TX Texas US Black No 5.916667
336 Lawson, Ty 25 Nuggets G 3 $10,786,517 71 195 4 2009 11/3/1987 North Carolina Clinton, MD Maryland US Black No 5.916667
362 Lucas III, John 30 Jazz G 5 $1,600,000 71 165 8 2005 11/21/1982 Oklahoma State Washington, DC DC US Black No 5.916667
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No 5.750000
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No 5.750000

In [52]:
# It might be easier to break down the booleans into separate variables
is_a_guard = df['POS'] == 'G'
is_below_six_feet = df['Ht (Ft.)'] < 6
df[is_a_guard & is_below_six_feet]


Out[52]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only Ht (Ft.)
235 Larkin, Shane 21 Mavericks G 3 $1,536,960 71 176 0 2013 10/2/1992 Miami (FL) Cincinnati, OH Ohio US Black No 5.916667
256 Pressey, Phil 22 Celtics G 26 $490,180 71 175 0 2013 2/17/1991 Missouri Dallas, TX Texas US Black No 5.916667
336 Lawson, Ty 25 Nuggets G 3 $10,786,517 71 195 4 2009 11/3/1987 North Carolina Clinton, MD Maryland US Black No 5.916667
362 Lucas III, John 30 Jazz G 5 $1,600,000 71 165 8 2005 11/21/1982 Oklahoma State Washington, DC DC US Black No 5.916667
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No 5.750000
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No 5.750000
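
The same pattern works for OR and NOT as well as AND. A small sketch on a made-up dataframe (the rows are hypothetical, the column names match the real data):

```python
import pandas as pd

# Hypothetical sample, not the real dataset
players = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'POS': ['G', 'G', 'C', 'F'],
    'Ht (Ft.)': [5.75, 6.25, 7.0, 6.5],
})

is_a_guard = players['POS'] == 'G'
is_below_six_feet = players['Ht (Ft.)'] < 6

# & means AND, | means OR, ~ means NOT
short_guards = players[is_a_guard & is_below_six_feet]
guards_or_short = players[is_a_guard | is_below_six_feet]
not_guards = players[~is_a_guard]

print(short_guards['Name'].tolist())
print(guards_or_short['Name'].tolist())
print(not_guards['Name'].tolist())
```

When the conditions are written inline instead of saved to variables, each one needs its own parentheses, because `&` and `|` bind more tightly than `==` and `<`.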

In [53]:
centers = df[df['POS'] == 'C']
guards = df[df['POS'] == 'G']

In [54]:
# We can save this stuff

centers['Ht (Ft.)'].describe()


Out[54]:
count    67.000000
mean      6.962687
std       0.087381
min       6.750000
25%       6.916667
50%       7.000000
75%       7.000000
max       7.250000
Name: Ht (Ft.), dtype: float64

In [55]:
guards['Ht (Ft.)'].describe()


Out[55]:
count    175.000000
mean       6.263810
std        0.165729
min        5.750000
25%        6.166667
50%        6.250000
75%        6.416667
max        6.583333
Name: Ht (Ft.), dtype: float64

In [56]:
# Maybe we can compare them to taller players?
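
One way to read that comment is to put the two summaries side by side. This is only a sketch on made-up rows, and the `comparison` dataframe is a hypothetical helper, not something from the original notebook:

```python
import pandas as pd

# Hypothetical sample, not the real dataset
players = pd.DataFrame({
    'POS': ['G', 'G', 'C', 'C', 'F'],
    'Ht (Ft.)': [6.0, 6.25, 7.0, 7.25, 6.5],
})

centers = players[players['POS'] == 'C']
guards = players[players['POS'] == 'G']

# Each describe() returns a Series, so two of them fit into one dataframe,
# one column per group
comparison = pd.DataFrame({
    'centers': centers['Ht (Ft.)'].describe(),
    'guards': guards['Ht (Ft.)'].describe(),
})

print(comparison)
```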

Drawing pictures

Okay okay enough code and enough stupid numbers. I'm visual. I want graphics. Okay????? Okay.


In [57]:
!pip install matplotlib


Requirement already satisfied (use --upgrade to upgrade): matplotlib in /Users/Monica/.virtualenvs/dataanalysis/lib/python3.5/site-packages
Requirement already satisfied (use --upgrade to upgrade): cycler in /Users/Monica/.virtualenvs/dataanalysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.6 in /Users/Monica/.virtualenvs/dataanalysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pyparsing!=2.0.0,!=2.0.4,>=1.5.6 in /Users/Monica/.virtualenvs/dataanalysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Users/Monica/.virtualenvs/dataanalysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pytz in /Users/Monica/.virtualenvs/dataanalysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): six in /Users/Monica/.virtualenvs/dataanalysis/lib/python3.5/site-packages (from cycler->matplotlib)

In [63]:
import matplotlib.pyplot as plt
%matplotlib inline
# pandas plotting needs matplotlib; if it's missing, the import above fails with an ImportError
df['Ht (Ft.)'].hist()


Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x10b439c50>

matplotlib is a graphing library. It's the Python way to make graphs!


In [64]:
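%matplotlib inline
# Draw the plot in the same cell that saves it. With the inline backend,
# each cell starts with a fresh figure, so savefig on its own writes a blank image.
# Save things as .png and not .jpeg
df['Ht (Ft.)'].hist()
plt.savefig('heights.png')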


<matplotlib.figure.Figure at 0x10b3e2080>

In [105]:
# this will open up a weird window that won't do anything

In [106]:
# So instead you run this code

But that's ugly. There's a thing called ggplot for R that looks nice. We want to look nice. We want to look like ggplot.


In [107]:
# Import matplotlib
# What's available?

In [108]:
# Use ggplot

In [109]:
# Make a histogram

In [110]:
# Try some other styles
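
A minimal sketch of the style steps named in the cells above, using a made-up Series of heights. The `Agg` backend line is an assumption so the code also runs as a plain script; in a notebook you would leave it out:

```python
import matplotlib
matplotlib.use('Agg')  # only for running as a plain script; skip this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# What's available? This is a list of style names
print(plt.style.available)

# Use the ggplot look for every plot drawn after this line
plt.style.use('ggplot')

# Hypothetical heights in feet, not the real dataset
heights = pd.Series([5.75, 6.0, 6.25, 6.5, 6.5, 6.75, 7.0, 7.25])
ax = heights.hist()
```

Any other name from `plt.style.available` can be passed to `plt.style.use` the same way.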

That might look better with a little more customization. So let's customize it.


In [111]:
# Pass in all sorts of stuff!
# Most from http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.hist.html
# range= is a matplotlib argument that pandas passes through
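
As a sketch of what "all sorts of stuff" might look like, here is a customized histogram on a made-up Series. The specific bin count, range, size, color and labels are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use('Agg')  # only for running as a plain script; skip this in a notebook
import pandas as pd

# Hypothetical heights in feet, not the real dataset
heights = pd.Series([5.75, 6.0, 6.25, 6.5, 6.5, 6.75, 7.0, 7.25])

# bins and figsize are named parameters of the pandas hist method;
# range and color are handed through to matplotlib
ax = heights.hist(bins=6, range=(5.5, 7.5), figsize=(8, 4), color='steelblue')

# hist returns a matplotlib Axes, so the usual labelling methods work
ax.set_title('Player heights')
ax.set_xlabel('Height (feet)')
ax.set_ylabel('Number of players')
```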

I want more graphics! Do tall people make more money?!?!


In [112]:
# How does experience relate with the amount of money they're making?

In [113]:
# At least we can assume height and weight are related

In [114]:
# At least we can assume height and weight are related
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
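
A scatter plot is one way to eyeball whether two columns are related. A minimal sketch with `DataFrame.plot`, assuming made-up rows that reuse the real column names `Ht (In.)` and `WT`:

```python
import matplotlib
matplotlib.use('Agg')  # only for running as a plain script; skip this in a notebook
import pandas as pd

# Hypothetical sample, not the real dataset
players = pd.DataFrame({
    'Ht (In.)': [69, 73, 78, 84, 87],
    'WT': [180, 195, 220, 245, 265],
})

# kind='scatter' needs to know which column goes on which axis
ax = players.plot(kind='scatter', x='Ht (In.)', y='WT')
```

Swapping `y='WT'` for another numeric column gives the same kind of picture for a different pair.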

In [115]:
# We can also use plt separately
# It's SIMILAR but TOTALLY DIFFERENT
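
For comparison, a sketch of a scatter plot made by calling `plt` directly instead of going through the dataframe. The rows are made up, and the axis labels are my own wording:

```python
import matplotlib
matplotlib.use('Agg')  # only for running as a plain script; skip this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample, not the real dataset
players = pd.DataFrame({
    'Ht (In.)': [69, 73, 78, 84, 87],
    'WT': [180, 195, 220, 245, 265],
})

# With plt you pass the columns themselves, not their names
plt.scatter(players['Ht (In.)'], players['WT'])

# plt doesn't know the column names, so the labels are set by hand
plt.xlabel('Height (inches)')
plt.ylabel('Weight')
```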

In [ ]: