An Introduction to pandas

Pandas! They are adorable animals. You might think they are the worst animal ever but that is not true. You might sometimes think pandas is the worst library ever, and that is only kind of true.

The important thing is to use the right tool for the job. pandas is good for some stuff, SQL is good for some stuff, writing raw Python is good for some stuff. You'll figure it out as you go along.

Now let's start coding. Hopefully you did pip install pandas before you started up this notebook.


In [1]:
# import pandas, but call it pd. Why? Because that's What People Do.

In [2]:
import pandas as pd

When you import pandas, you use import pandas as pd. That means instead of typing pandas in your code you'll type pd.

You don't have to, but every other person on the planet will be doing it, so you might as well.

Now we're going to read in a file. Our file is called NBA-Census-10.14.2013.csv because we're sports moguls. pandas can read_ different types of files, so try to figure it out by typing pd.read_ and hitting tab for autocomplete.


In [3]:
# We're going to call this df, which means "data frame"
# It isn't in UTF-8 (I saved it from my mac!) so we need to set the encoding
df = pd.read_csv("NBA-Census-10.14.2013.csv", encoding="mac_roman")
# this is a data frame (df)
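
Why the encoding bit matters can be seen without the CSV file at all. Here's a minimal sketch, assuming a made-up two-line CSV held in memory (the names `raw_bytes`, `right` and `wrong` are just for illustration, they aren't part of this notebook). Reading the bytes with the encoding they were saved in gives the name back intact; reading them with a different one does not.

```python
import io

import pandas as pd

# Hypothetical CSV content, encoded the way a Mac export might be
raw_bytes = "Name,Team\nJosé,Timberwolves\n".encode("mac_roman")

# Same bytes, read with the matching encoding...
right = pd.read_csv(io.BytesIO(raw_bytes), encoding="mac_roman")

# ...and with a mismatched one (latin-1 is only an example of a wrong guess)
wrong = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")

print(right.loc[0, "Name"])
print(wrong.loc[0, "Name"] == "José")
```

If names with accents come out looking strange, the encoding is the first thing to check.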

A dataframe is basically a spreadsheet, except it lives in the world of Python or the statistical programming language R. They can't call it a spreadsheet because then people would think those programmers used Excel, which would make them boring and normal and they'd have to wear a tie every day.

Selecting rows

Now let's look at our data, since that's what data is for.


In [4]:
# Let's look at all of it
df


Out[4]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No
5 Hill, Solomon 22 Pacers F 9 $1,246,680 79 220 0 2013 3/18/1991 Arizona Los Angeles, CA California US Black No
6 Budinger, Chase 25 Timberwolves F 10 $5,000,000 79 218 4 2009 5/22/1988 Arizona Encinitas, CA California US White No
7 Williams, Derrick 22 Timberwolves F 7 $5,016,960 80 241 2 2011 5/25/1991 Arizona La Mirada, CA California US Black No
8 Hill, Jordan 26 Lakers F/C 27 $3,563,600 82 235 1 2012 7/27/1987 Arizona Newberry, SC South Carolina US Black No
9 Frye, Channing 30 Suns F/C 8 $6,500,000 83 245 8 2005 5/17/1983 Arizona White Plains, NY New York US Black No
10 Bayless, Jerryd 25 Grizzlies G 7 $3,135,000 75 200 5 2008 8/20/1988 Arizona Phoenix, AZ Arizona US Black No
11 Terry, Jason 36 Nets G 31 $5,625,313 74 180 14 1999 9/15/1977 Arizona Seattle, WA Washington US Black No
12 Fogg, Kyle 23 Nuggets G 6 n/a 75 183 0 2013 1/27/1990 Arizona Brea, CA California US Black No
13 Iguodala, Andre 29 Warriors G/F 9 $12,868,632 78 207 9 2004 1/28/1984 Arizona Springfield, IL Illinois US Black No
14 Boateng, Eric 27 Lakers C 12 n/a 82 257 17 1996 11/20/1985 Arizona State London, ENG n/a England Black No
15 Diogu, Ike 29 Knicks F/C 50 $792,377 80 255 8 2005 11/9/1983 Arizona State Buffalo, NY New York US Black No
16 Ayres, Jeff 26 Spurs F/C 11 $1,750,000 81 250 4 2009 4/29/1987 Arizona State Ontario, CA California US Black No
17 Harden, James 24 Rockets G 13 $13,701,250 77 220 4 2009 8/26/1989 Arizona State Los Angeles, CA California US Black No
18 Felix, Carrick 23 Cavaliers G/F 30 $510,000 78 210 0 2013 8/17/1990 Arizona State Goodyear, AZ Arizona US Black No
19 Pargo, Jannero 33 Bobcats G 5 $884,293 73 185 11 2002 10/22/1979 Arkansas Chicago, IL Illinois US Black No
20 Beverley, Patrick 25 Rockets G 2 $788,872 73 185 5 2008 7/12/1988 Arkansas Chicago, IL Illinois US Black No
21 Johnson, Joe 32 Nets G/F 7 $21,466,718 79 240 12 2001 6/29/1981 Arkansas Little Rock, AR Arkansas US Black No
22 Brewer, Ronnie 28 Rockets G/F 10 $1,186,459 79 235 7 2006 3/20/1985 Arkansas Portland, OR Oregon US Black No
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No
24 Miller, Quincy 20 Nuggets F 30 $788,872 81 210 1 2012 11/18/1992 Baylor North Carolina, IL Illinois US Black No
25 Acy, Quincy 23 Raptors F 4 $788,872 79 225 1 2012 10/6/1990 Baylor Tyler, TX Texas US Black No
26 Jones, Perry 22 Thunder F 3 $1,082,520 83 235 1 2012 9/24/1991 Baylor Winnsboro, LA Louisiana US Black No
27 Udoh, Ekpe 26 Bucks F/C 5 $4,469,548 82 245 3 2010 5/20/1987 Baylor Edmond, OK Oklahoma US Black No
28 Clark, Ian 22 Jazz G 21 $490,180 75 175 0 2013 3/7/1991 Belmont Memphis, TN Tennessee US Black No
29 Andersen, Chris 35 Heat F/C 11 $1,399,507 82 228 12 2001 7/7/1978 Blinn College Long Beach, CA California US White No
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
498 Paul, Chris 28 Clippers G 3 $18,668,431 72 175 8 2005 5/6/1985 Wake Forest Forsyth County, NC North Carolina US Black No
499 Teague, Jeff 25 Hawks G 0 $8,000,000 74 181 4 2009 6/10/1988 Wake Forest Indianapolis, IN Indiana US Black No
500 Smith, Ish 25 Suns G 30 $951,463 72 175 3 2010 7/5/1988 Wake Forest Charlotte, NC North Carolina US Black No
501 Duncan, Tim 37 Spurs F/C 21 $10,361,446 83 255 16 1997 4/25/1976 Wake Forest Christiansted, VI Virgin Islands Virgin Islands Black No
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No
503 Wroten, Tony 20 76ers G 8 $1,160,040 78 205 1 2012 4/13/1993 Washington Renton, WA Washington US Black No
504 Gaddy, Abdul 21 Bobcats G 10 n/a 75 185 0 2013 1/26/1992 Washington Tacoma, WA Washington US Black No
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No
507 Ross, Terrence 22 Raptors G 31 $2,678,640 78 195 1 2012 2/5/1991 Washington Portland, OR Oregon US Black No
508 Pondexter, Quincy 25 Grizzlies G/F 20 $225,479 78 225 3 2010 3/10/1988 Washington Fresno, CA California US Black No
509 Holiday, Justin 24 Jazz G/F 22 $788,872 78 185 0 2013 4/5/1989 Washington Mission Hills, CA California US Black No
510 Baynes, Aron 26 Spurs F/C 16 $788,872 82 260 0 2013 12/9/1986 Washington State Gisborne, NZ n/a New Zealand White No
511 Thompson, Klay 23 Warriors G/F 11 $2,317,920 79 205 2 2011 2/8/1990 Washington State Los Angeles, CA California US Mixed No
512 Lillard, Damian 23 Trail Blazers G 0 $3,202,920 75 195 1 2012 7/15/1990 Weber State Oakland, CA California US Black No
513 Alexander, Joe 26 Warriors F 25 $854,389 80 230 5 2008 12/26/1986 West Virginia Kaohsiung, TA n/a Taiwan White No
514 Fischer, D'or 32 Wizards C 21 n/a 83 255 0 2013 10/12/1981 West Virginia Philadelphia, PA Pennsylvania US Black No
515 Ebanks, Devin 23 Mavericks F 37 $884,293 81 215 3 2010 10/28/1989 West Virginia New York City, NY New York US Black No
516 Johnson, Amir 26 Raptors F/C 15 $6,500,000 81 210 8 2005 5/1/1987 Westchester HS (CA) Los Angeles, CA California US Black Yes
517 Martin, Kevin 30 Timberwolves G 23 $6,500,000 79 185 9 2004 2/1/1983 Western Carolina Zanesville, OH Ohio US Mixed No
518 Evans, Jeremy 25 Jazz F 40 $1,660,257 81 194 3 2010 10/24/1987 Western Kentucky Crossett, AR Arkansas US Black No
519 Lee, Courtney 28 Celtics G/F 11 $5,225,000 77 200 5 2008 10/3/1985 Western Kentucky Indianapolis, IN Indiana US Black No
520 Mekel, Gal 25 Mavericks G 33 $490,180 75 191 5 2008 3/4/1988 Wichita State Petah Tikva n/a Israel White No
521 Murry, Toure' 23 Knicks G/F 23 $490,180 77 195 0 2013 11/8/1989 Wichita State Houston, TX Texas US Black No
522 Stiemsma, Greg 28 Pelicans C 34 $2,676,000 83 260 2 2011 9/26/1985 Wisconsin Randolph, WI Wisconsin US White No
523 Leuer, Jon 24 Grizzlies F 30 $900,000 82 228 2 2011 5/14/1989 Wisconsin Long Lake, MN Minnesota US White No
524 Landry, Marcus 27 Lakers F 14 $788,872 79 225 17 1996 11/1/1985 Wisconsin Milwaukee, WI Wisconsin US Black No
525 Harris, Devin 30 Mavericks G 20 $854,389 75 192 9 2004 2/27/1983 Wisconsin Milwaukee, WI Wisconsin US Black No
526 West, David 33 Pacers F 21 $12,000,000 81 250 10 2003 8/29/1980 Xavier Teaneck, NJ New Jersey US Black No
527 Crawford, Jordan 24 Celtics G 27 $2,162,419 76 195 3 2010 10/23/1988 Xavier Detroit, MI Michigan US Black No

528 rows × 17 columns

If we scroll we can see all of it. But maybe we don't want to see all of it. Maybe we hate scrolling?


In [5]:
# Look at the first few rows
df.head() #shows first 5 rows


Out[5]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No

...but maybe we want to see more than a measly five results?


In [6]:
# Let's look at MORE of the first few rows
df.head(10)


Out[6]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No
5 Hill, Solomon 22 Pacers F 9 $1,246,680 79 220 0 2013 3/18/1991 Arizona Los Angeles, CA California US Black No
6 Budinger, Chase 25 Timberwolves F 10 $5,000,000 79 218 4 2009 5/22/1988 Arizona Encinitas, CA California US White No
7 Williams, Derrick 22 Timberwolves F 7 $5,016,960 80 241 2 2011 5/25/1991 Arizona La Mirada, CA California US Black No
8 Hill, Jordan 26 Lakers F/C 27 $3,563,600 82 235 1 2012 7/27/1987 Arizona Newberry, SC South Carolina US Black No
9 Frye, Channing 30 Suns F/C 8 $6,500,000 83 245 8 2005 5/17/1983 Arizona White Plains, NY New York US Black No

But maybe we want to make a basketball joke and see the final four?


In [7]:
# Let's look at the final few rows
df.tail(4)


Out[7]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
524 Landry, Marcus 27 Lakers F 14 $788,872 79 225 17 1996 11/1/1985 Wisconsin Milwaukee, WI Wisconsin US Black No
525 Harris, Devin 30 Mavericks G 20 $854,389 75 192 9 2004 2/27/1983 Wisconsin Milwaukee, WI Wisconsin US Black No
526 West, David 33 Pacers F 21 $12,000,000 81 250 10 2003 8/29/1980 Xavier Teaneck, NJ New Jersey US Black No
527 Crawford, Jordan 24 Celtics G 27 $2,162,419 76 195 3 2010 10/23/1988 Xavier Detroit, MI Michigan US Black No

So yes, head and tail work kind of like the terminal commands. That's nice, I guess.

But maybe we're incredibly demanding (which we are) and we want, say, the 6th through the 8th row (which we do). Don't worry (which I know you were), we can do that, too.


In [8]:
# Show the 6th through the 8th rows
df[5:8]


Out[8]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
5 Hill, Solomon 22 Pacers F 9 $1,246,680 79 220 0 2013 3/18/1991 Arizona Los Angeles, CA California US Black No
6 Budinger, Chase 25 Timberwolves F 10 $5,000,000 79 218 4 2009 5/22/1988 Arizona Encinitas, CA California US White No
7 Williams, Derrick 22 Timberwolves F 7 $5,016,960 80 241 2 2011 5/25/1991 Arizona La Mirada, CA California US Black No

It's kind of like an array, right? Except where in an array we'd say df[0], this time we need to give it two numbers, the start and the end. Just like a Python slice, the start is included and the end is not, so 5:8 means rows 5, 6 and 7.
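
To convince yourself about which rows a slice returns, here's a small sketch on a made-up ten-row dataframe (`toy` is a hypothetical name, not our NBA data). It also shows `.iloc`, which is a more explicit way of saying "by position" and gives the same rows here.

```python
import pandas as pd

# Ten rows, numbered 0 through 9
toy = pd.DataFrame({"row_number": range(10)})

# Start is included, end is not: positions 5, 6 and 7
sixth_to_eighth = toy[5:8]

# The same rows, asked for by position with .iloc
same_thing = toy.iloc[5:8]

print(sixth_to_eighth)
```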

Selecting columns

But jeez, my eyes don't want to go that far over the data. I only want to see, uh, name and age.


In [9]:
# Get the names of the columns, just because
#columns_we_want = ['Name', 'Age']
#df[columns_we_want]

In [10]:
# If we want to be "correct" we add .values on the end of it
df.columns


Out[10]:
Index(['Name', 'Age', 'Team', 'POS', '#', '2013 $', 'Ht (In.)', 'WT', 'EXP',
       '1st Year', 'DOB', 'School', 'City',
       'State (Province, Territory, Etc..)', 'Country', 'Race', 'HS Only'],
      dtype='object')

In [11]:
# Select only name and age

In [12]:
# Combining that with .head() to see not-so-many rows
columns_we_want = ['Name', 'Age']
df[columns_we_want].head()


Out[12]:
Name Age
0 Gee, Alonzo 26
1 Wallace, Gerald 31
2 Williams, Mo 30
3 Gladness, Mickell 27
4 Jefferson, Richard 33

In [13]:
# We can also do this all in one line, even though it starts looking ugly
# (unlike the cute bears, pandas looks ugly pretty often)
df[['Name', 'Age']].head()


Out[13]:
Name Age
0 Gee, Alonzo 26
1 Wallace, Gerald 31
2 Williams, Mo 30
3 Gladness, Mickell 27
4 Jefferson, Richard 33

NOTE: That was not df['Name', 'Age'], it was df[['Name', 'Age']]. You'll definitely type it wrong all of the time. When things break with pandas it's probably because you forgot to put in a million brackets.
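
If you'd like to see the breakage before it bites you, here's a minimal sketch on a made-up two-row dataframe (`toy` is hypothetical). With single brackets, pandas goes looking for one column whose name is the pair ('Name', 'Age'), doesn't find it, and raises a KeyError.

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["Gee, Alonzo", "Wallace, Gerald"],
    "Age": [26, 31],
    "Team": ["Cavaliers", "Celtics"],
})

# Double brackets: a list of column names goes inside the selection brackets
two_columns = toy[["Name", "Age"]]

# Single brackets: pandas looks for ONE column called ('Name', 'Age')
try:
    toy["Name", "Age"]
    single_brackets_worked = True
except KeyError:
    single_brackets_worked = False

print(single_brackets_worked)
```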

Describing your data

A powerful tool of pandas is being able to select a portion of your data, because who ordered all that data anyway.


In [14]:
df.head()


Out[14]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No

I want to know how many people are in each position. Luckily, pandas can tell me!


In [15]:
# Grab the POS column, and count the different values in it.
df['POS'].value_counts()


Out[15]:
G      175
F      142
F/C     74
G/F     70
C       67
Name: POS, dtype: int64

Now that was a little weird, yes - we used df['POS'] instead of df[['POS']] when viewing the data's details.
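
The difference between the two is what you get back: single brackets hand you one column on its own (a Series), double brackets hand you a smaller dataframe. A quick sketch on a made-up dataframe (`toy` is a hypothetical name) to check that:

```python
import pandas as pd

toy = pd.DataFrame({"POS": ["G", "F", "G", "C"]})

one_column = toy["POS"]      # single brackets: a Series
small_frame = toy[["POS"]]   # double brackets: a DataFrame with one column

# Counting the values in the single column
counts = one_column.value_counts()

print(type(one_column).__name__, type(small_frame).__name__)
print(counts)
```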

But now I'm curious about numbers: how old is everyone? Maybe we could, I don't know, get some statistics about age? Some statistics to describe age?


In [16]:
# Count the different values in the Race column, too
race_counts = df['Race'].value_counts()
race_counts


Out[16]:
Black       399
White        95
Mixed        16
Hispanic     16
Asian         1
Name: Race, dtype: int64

In [17]:
# Summary statistics for Age
df['Age'].describe()


Out[17]:
count    528.000000
mean      26.242424
std        4.178868
min       18.000000
25%       23.000000
50%       25.000000
75%       29.000000
max       39.000000
Name: Age, dtype: float64

In [18]:
df.describe()


Out[18]:
              Age    Ht (In.)          WT         EXP     1st Year
count  528.000000  528.000000  528.000000  528.000000   528.000000
mean    26.242424   79.119318  221.206439    4.772727  2008.227273
std      4.178868    3.431488   27.943169    4.325628     4.325628
min     18.000000   69.000000   20.000000    0.000000  1995.000000
25%     23.000000   77.000000  200.000000    1.000000  2005.000000
50%     25.000000   80.000000  220.000000    4.000000  2009.000000
75%     29.000000   82.000000  240.000000    8.000000  2012.000000
max     39.000000   87.000000  290.000000   18.000000  2013.000000

In [19]:
# That's pretty good. Does it work for everything? How about the money?
df['2013 $'].describe()
# We get count/unique/top/freq instead of a mean, because the money is stored as strings.


Out[19]:
count     528
unique    308
top       n/a
freq       43
Name: 2013 $, dtype: object

Unfortunately because that has dollar signs and commas it's thought of as a string. We'll fix it in a second, but let's try describing one more thing.


In [20]:
# Doing more describing
df['Ht (In.)'].describe()


Out[20]:
count    528.000000
mean      79.119318
std        3.431488
min       69.000000
25%       77.000000
50%       80.000000
75%       82.000000
max       87.000000
Name: Ht (In.), dtype: float64

That's stupid, though, what does an inch even look like? What's 80 inches? I don't have a clue. If only there were some way to manipulate our data.

Manipulating data

Oh wait there is, HA HA HA.


In [21]:
# Take another look at our inches, but only the first few
df['Ht (In.)'].head()


Out[21]:
0    78
1    79
2    73
3    83
4    79
Name: Ht (In.), dtype: int64

In [22]:
# Divide those inches by 12
#number_of_inches = 300
#number_of_inches / 12
df['Ht (In.)'].head() / 12


Out[22]:
0    6.500000
1    6.583333
2    6.083333
3    6.916667
4    6.583333
Name: Ht (In.), dtype: float64

In [23]:
# Let's divide ALL of them by 12
df['Ht (In.)'] / 12


Out[23]:
0      6.500000
1      6.583333
2      6.083333
3      6.916667
4      6.583333
5      6.583333
6      6.583333
7      6.666667
8      6.833333
9      6.916667
10     6.250000
11     6.166667
12     6.250000
13     6.500000
14     6.833333
15     6.666667
16     6.750000
17     6.416667
18     6.500000
19     6.083333
20     6.083333
21     6.583333
22     6.583333
23     6.083333
24     6.750000
25     6.583333
26     6.916667
27     6.833333
28     6.250000
29     6.833333
         ...   
498    6.000000
499    6.166667
500    6.000000
501    6.916667
502    7.083333
503    6.500000
504    6.250000
505    5.750000
506    5.750000
507    6.500000
508    6.500000
509    6.500000
510    6.833333
511    6.583333
512    6.250000
513    6.666667
514    6.916667
515    6.750000
516    6.750000
517    6.583333
518    6.750000
519    6.416667
520    6.250000
521    6.416667
522    6.916667
523    6.833333
524    6.583333
525    6.250000
526    6.750000
527    6.333333
Name: Ht (In.), dtype: float64

In [24]:
# Can we get statistics on those?
height_in_feet = df['Ht (In.)'] / 12
height_in_feet.describe()


Out[24]:
count    528.000000
mean       6.593277
std        0.285957
min        5.750000
25%        6.416667
50%        6.666667
75%        6.833333
max        7.250000
Name: Ht (In.), dtype: float64

In [25]:
# Let's look at our original data again
df.head(3)


Out[25]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No

Okay that was nice but unfortunately we can't do anything with it. It's just sitting there, separate from our data. If this were normal code we could do blahblah['feet'] = blahblah['Ht (In.)'] / 12, but since this is pandas, we can't. Right? Right?


In [26]:
# Store a new column
df['feet'] = df['Ht (In.)'] / 12
df.head()


Out[26]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet
0 Gee, Alonzo 26 Cavaliers F 33 $3,250,000 78 219 4 2009 5/29/1987 Alabama Riviera Beach, FL Florida US Black No 6.500000
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No 6.583333
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No 6.916667
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No 6.583333

That's cool, maybe we could do the same thing with their salary? Take out the $ and the , and convert it to an integer?


In [27]:
# Can't just use .replace

In [28]:
# Need to use this weird .str thing

In [29]:
# Can't just immediately replace the , either

In [30]:
# Need to use the .str thing before EVERY string method

In [31]:
# Describe still doesn't work.

In [32]:
# Let's convert it to an integer using .astype(int) before we describe it

In [33]:
# Maybe we can just make them millions?

In [34]:
# Unfortunately some are "n/a", which is going to break our code, so we can make n/a be 0

In [35]:
# Remove the .head() piece and save it back into the dataframe
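
Put together, the steps described in the cells above could look something like this. It's a minimal sketch under a few assumptions, not the notebook's own code: it runs on a made-up three-row sample (the salary strings are copied from rows shown earlier), and the names `sample`, `salary` and `millions` are invented for illustration.

```python
import pandas as pd

# Hypothetical sample with the same kind of strings as the '2013 $' column
sample = pd.DataFrame({"2013 $": ["$3,250,000", "n/a", "$762,195"]})

salary = (
    sample["2013 $"]
    .str.replace("$", "", regex=False)  # .str goes before EVERY string method
    .str.replace(",", "", regex=False)
    .replace("n/a", "0")                # plain .replace swaps whole values
    .astype(int)                        # now every value looks like a number
)

# Save it back into the dataframe, plus a version in millions
sample["salary"] = salary
sample["millions"] = sample["salary"] / 1_000_000

print(sample)
```

Turning n/a into 0 keeps .astype(int) happy, but those zeros also pull the average down, so keep that in mind when you describe the column.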

The average basketball player makes 3.8 million dollars and is a little over six and a half feet tall.

But who cares about those guys? I don't care about those guys. They're boring. I want the real rich guys!

Sorting and sub-selecting


In [36]:
# This is just the first few guys in the dataset. Can we order it?

In [37]:
# Let's try to sort them, ascending value

df.sort_values('feet')


Out[37]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No 5.750000
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No 5.750000
235 Larkin, Shane 21 Mavericks G 3 $1,536,960 71 176 0 2013 10/2/1992 Miami (FL) Cincinnati, OH Ohio US Black No 5.916667
362 Lucas III, John 30 Jazz G 5 $1,600,000 71 165 8 2005 11/21/1982 Oklahoma State Washington, DC DC US Black No 5.916667
256 Pressey, Phil 22 Celtics G 26 $490,180 71 175 0 2013 2/17/1991 Missouri Dallas, TX Texas US Black No 5.916667
336 Lawson, Ty 25 Nuggets G 3 $10,786,517 71 195 4 2009 11/3/1987 North Carolina Clinton, MD Maryland US Black No 5.916667
388 McConnell, Mickey 24 Mavericks G 32 $490,180 72 189 2 2011 4/14/1989 St. Mary's (CA) Mesa, AZ Arizona US White No 6.000000
498 Paul, Chris 28 Clippers G 3 $18,668,431 72 175 8 2005 5/6/1985 Wake Forest Forsyth County, NC North Carolina US Black No 6.000000
133 Bynum, Will 30 Pistons G 12 $2,790,343 72 185 8 2005 1/4/1983 Georgia Tech Chicago, IL Illinois US Black No 6.000000
500 Smith, Ish 25 Suns G 30 $951,463 72 175 3 2010 7/5/1988 Wake Forest Charlotte, NC North Carolina US Black No 6.000000
387 Mills, Patrick 25 Spurs G 8 $1,133,950 72 185 4 2009 8/11/1988 St. Mary's (CA) Canberra New South Wales Australia Black No 6.000000
489 Lowry, Kyle 27 Raptors G 7 $6,210,000 72 205 7 2006 3/25/1986 Villanova Philadelphia, PA Pennsylvania US Black No 6.000000
258 Canaan, Isaiah 22 Rockets G 1 $570,515 72 188 0 2013 5/2/1991 Murray State Biloxi, MS Mississippi US Black No 6.000000
450 Augustin, D. J. 25 Raptors G 14 $1,267,000 72 183 5 2008 11/10/1987 Texas New Orleans, LA Louisiana US Black No 6.000000
368 Brooks, Aaron 28 Rockets G 0 $884,293 72 161 6 2007 1/14/1985 Oregon Seattle, WA Washington US Black No 6.000000
202 Siva, Peyton 22 Pistons G 34 $490,180 72 185 0 2013 10/24/1990 Louisville Seattle, WA Washington US Hispanic No 6.000000
347 Barea, JosÈ Juan 29 Timberwolves G 11 $4,687,000 72 175 7 2006 6/26/1984 Northeastern Mayaguez n/a Puerto Rico Hispanic No 6.000000
464 Collison, Darren 26 Clippers G 2 $1,900,000 72 175 4 2009 8/23/1987 UCLA Rancho Cucamonga, CA California US Black No 6.000000
385 Nelson, Jameer 31 Magic G 14 $8,600,000 72 190 9 2004 2/9/1982 Saint Joseph's Chester, PA Pennsylvania US Black No 6.000000
240 Burke, Trey 20 Jazz G 3 $2,438,760 73 185 0 2013 11/12/1992 Michigan Columbus, OH Ohio US Black No 6.083333
62 Walker, Kemba 23 Bobcats G 15 $2,568,360 73 184 2 2011 5/8/1990 Connecticut New York City, NY New York US Black No 6.083333
152 Machado, Scott 23 Jazz G 30 $788,872 73 205 1 2012 6/8/1990 Iona New York City, NY New York US Black No 6.083333
470 Watson, Earl 34 Trail Blazers G 17 $884,293 73 199 12 2001 6/12/1979 UCLA Kansas City, KA Kansas US Black No 6.083333
71 Roberts, Brian 27 Pelicans G 22 $788,872 73 173 1 2012 12/3/1985 Dayton Toledo, OH Ohio US Black No 6.083333
487 Wayns, Maalik 22 Clippers G 5 $788,872 73 195 1 2012 5/2/1991 Villanova Philadelphia, PA Pennsylvania US Black No 6.083333
351 Jennings, Brandon 24 Pistons G 7 $7,655,503 73 169 4 2009 9/23/1989 Oak Hill Academy (VA) Lakewood, CA California US Black Yes 6.083333
356 Conley, Mike 26 Grizzlies G 11 $8,600,001 73 185 6 2007 10/11/1987 Ohio State Fayetteville, AR Arkansas US Black No 6.083333
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No 6.083333
284 Schrˆder, Dennis 20 Hawks G 17 $1,348,200 73 165 0 2013 9/15/1993 n/a Braunschweig, LS Lower Saxony Germany Black No 6.083333
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
405 Dedmon, Dewayne 24 Warriors C -- n/a 84 255 0 2013 8/12/1989 Southern California Lancaster, CA California US Black No 7.000000
52 Smith, Jason 27 Pelicans F/C 14 $2,500,000 84 240 6 2007 3/2/1986 Colorado State Greeley, CO Colorado US White No 7.000000
461 Hollins, Ryan 29 Clippers C 15 $884,293 84 240 7 2006 10/10/1984 UCLA Pasadena, CA California US Black No 7.000000
140 Sacre, Robert 24 Lakers C 50 $788,872 84 260 6 2007 6/6/1989 Gonzaga Baton Rouge, LA Louisiana US Mixed No 7.000000
282 Nowitzki, Dirk 35 Mavericks F 41 $22,721,381 84 245 15 1998 6/19/1978 n/a Wurzburg, BA Bavaria Germany White No 7.000000
41 Kaman, Chris 31 Lakers C 9 $3,183,000 84 265 4 2009 4/28/1982 Central Michigan Grand Rapids, MI Michigan US White No 7.000000
426 Melo, Fab 23 Mavericks C 51 $788,872 84 255 1 2012 1/20/1990 Syracuse Juiz de For a, MG Minas Gerais Brazil Black No 7.000000
292 Motiej?nas, Donatas 23 Rockets F/C 20 $1,422,720 84 222 1 2012 9/20/1990 n/a Kaunas n/a Lithuania White No 7.000000
149 Zeller, Cody 21 Bobcats F/C 40 $3,857,040 84 240 0 2013 10/5/1992 Indiana Washington, IN Indiana US White No 7.000000
421 Lopez, Robin 25 Trail Blazers C 42 $5,904,261 84 255 5 2008 4/8/1988 Stanford Los Angeles, CA California US Mixed No 7.000000
404 Vu?evi?, Nikola 22 Magic C 9 $1,793,520 84 240 2 2011 10/24/1990 Southern Califoria Morges, VA Vaud Switzerland White No 7.000000
420 Lopez, Brook 25 Nets C 11 $14,693,906 84 265 5 2008 4/1/1988 Stanford Los Angeles, CA California US Mixed No 7.000000
479 Bogut, Andrew 28 Warriors C 12 $14,000,000 84 260 8 2005 11/28/1984 Utah Melbourne, VI Victoria Australia White No 7.000000
159 Withey, Jeff 23 Pelicans C 5 $490,180 84 235 0 2013 3/7/1990 Kansas San Diego, CA California US White No 7.000000
305 Gasol, Pau 33 Lakers F/C 16 $19,285,850 84 250 3 2010 7/6/1980 n/a Barcelona, Spain n/a Spain Hispanic No 7.000000
296 Biedri?ö, Andris 27 Jazz C 11 $9,000,000 84 242 9 2004 4/2/1986 n/a Riga n/a Russia White No 7.000000
32 O'Bryant, Patrick 27 Bobcats C 18 n/a 84 250 7 2006 6/20/1986 Bradley Oskaloosa, IA Iowa US Black No 7.000000
137 Olynyk, Kelly 22 Celtics F/C 41 $1,986,360 84 238 0 2013 4/19/1991 Gonzaga Toronto, ON Ontario Canada White No 7.000000
295 Bargnani, Andrea 28 Knicks F/C 77 $11,862,500 84 250 11 2002 8/26/1985 n/a Rome, LZ Lazio Rome White No 7.000000
416 Bynum, Andrew 25 Cavaliers C 21 $12,250,000 84 285 8 2005 10/27/1987 St. Joseph HS (NJ) Plainsboro, NJ New Jersey US Black Yes 7.000000
297 Mozgov, Timofey 27 Nuggets C 25 $4,400,000 85 250 3 2010 7/16/1986 n/a St. Petersburg n/a Russia White No 7.083333
145 Leonard, Meyers 21 Trail Blazers C 11 $2,222,160 85 245 1 2012 2/27/1992 Illinois Robinson, IIL Illinois US White No 7.083333
76 Chandler, Tyson 31 Knicks C 6 $14,100,538 85 240 12 2001 10/2/1982 Dominguez HS (CA) Hanford, CA California US Black Yes 7.083333
303 Gasol, Marc 28 Grizzlies C 33 $14,860,524 85 265 5 2008 1/29/1985 n/a Barcelona n/a Spain Hispanic No 7.083333
274 Gobert, Rudy 21 Jazz C 27 $1,078,800 85 235 0 2013 6/26/1992 n/a Saint-Quentin Aisne France Mixed No 7.083333
221 Len, Alex 20 Suns C 21 $3,492,720 85 255 0 2013 6/16/1993 Maryland Antratsy n/a Ukraine White No 7.083333
316 Kuzmi?, Ognjen 23 Warriors C 1 $490,180 85 231 0 2013 5/16/1990 n/a Doboj n/a Yugoslavia White No 7.083333
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No 7.083333
120 Hibbert, Roy 26 Pacers C 55 $14,283,844 86 280 5 2008 12/11/1986 Georgetown New York City, NY New York US Black No 7.166667
54 Thabeet, Hasheem 26 Thunder C 34 $1,200,000 87 263 4 2009 2/16/1987 Connecticut Dar es Salaam n/a Tanzania Black No 7.250000

528 rows × 18 columns

Those are the shortest guys in the league up top! If only there were a way to sort from high to low, a.k.a. descending instead of ascending.


In [38]:
# It isn't descending = True, unfortunately
df.sort_values('feet', ascending=False).head()


Out[38]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet
54 Thabeet, Hasheem 26 Thunder C 34 $1,200,000 87 263 4 2009 2/16/1987 Connecticut Dar es Salaam n/a Tanzania Black No 7.250000
120 Hibbert, Roy 26 Pacers C 55 $14,283,844 86 280 5 2008 12/11/1986 Georgetown New York City, NY New York US Black No 7.166667
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No 7.083333
145 Leonard, Meyers 21 Trail Blazers C 11 $2,222,160 85 245 1 2012 2/27/1992 Illinois Robinson, IIL Illinois US White No 7.083333
303 Gasol, Marc 28 Grizzlies C 33 $14,860,524 85 265 5 2008 1/29/1985 n/a Barcelona n/a Spain Hispanic No 7.083333

In [39]:
# We can use this to find the oldest guys in the league
df.sort_values('Age', ascending=False).head()


Out[39]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet
392 Nash, Steve 39 Lakers G 10 $9,300,500 75 178 7 2006 2/7/1974 Santa Clara Johannesburg, SA n/a South Africa White No 6.250000
225 Camby, Marcus 39 Rockets F/C 21 $884,293 83 240 17 1996 3/22/1974 Massachusetts Hartford, CT Connecticut US Black No 6.916667
23 Fisher, Derek 39 Thunder G 6 $884,293 73 210 17 1996 8/9/1974 Arkansas-Little Rock Little Rock, AR Arkansas US Black No 6.083333
63 Allen, Ray 38 Heat G 34 $3,229,050 77 205 17 1996 7/20/1975 Connecticut Merced, CA California US Black No 6.416667
94 James, Mike 38 Bulls G 8 n/a 74 188 15 1998 6/23/1975 Duquesne Copiague, NY New York US Black No 6.166667

In [40]:
# Or the shortest, by sorting on 'feet' and taking out 'ascending=False'
df.sort_values('feet').head()


Out[40]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet
506 Robinson, Nate 29 Nuggets G 10 $2,016,000 69 180 8 2005 5/31/1984 Washington Seattle, WA Washington US Black No 5.750000
505 Thomas, Isaiah 24 Kings G 22 $884,293 69 185 2 2011 2/7/1989 Washington Tacoma, WA Washington US Black No 5.750000
235 Larkin, Shane 21 Mavericks G 3 $1,536,960 71 176 0 2013 10/2/1992 Miami (FL) Cincinnati, OH Ohio US Black No 5.916667
362 Lucas III, John 30 Jazz G 5 $1,600,000 71 165 8 2005 11/21/1982 Oklahoma State Washington, DC DC US Black No 5.916667
256 Pressey, Phil 22 Celtics G 26 $490,180 71 175 0 2013 2/17/1991 Missouri Dallas, TX Texas US Black No 5.916667
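To see how the sort column and `ascending` work together, here is a small sketch on made-up data (the names and numbers below are invented for illustration, they are not rows from the CSV). `sort_values` sorts ascending by default, so leaving out `ascending=False` flips the order.

```python
import pandas as pd

# Toy stand-in for the NBA data (hypothetical rows)
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [39, 21, 30, 25],
    "feet": [6.25, 5.75, 7.0, 6.5],
})

# Oldest first: descending sort on Age
oldest = players.sort_values("Age", ascending=False).head(2)

# Youngest first: ascending is the default, so no extra argument
youngest = players.sort_values("Age").head(2)

# Shortest first: same idea, different column
shortest = players.sort_values("feet").head(2)

print(oldest)
print(youngest)
print(shortest)
```

The column name decides what "first" means; `ascending` only decides the direction.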

But sometimes instead of just looking at them, I want to do stuff with them. Play some games with them! Dunk on them! Describe them! And we don't want to dunk on everyone, only the players above 6.5 feet tall.

First, we need to check out boolean things.


In [41]:
# Get a big long list of True and False for every single row.
df['feet'] > 6.5


Out[41]:
0      False
1       True
2      False
3       True
4       True
5       True
6       True
7       True
8       True
9       True
10     False
11     False
12     False
13     False
14      True
15      True
16      True
17     False
18     False
19     False
20     False
21      True
22      True
23     False
24      True
25      True
26      True
27      True
28     False
29      True
       ...  
498    False
499    False
500    False
501     True
502     True
503    False
504    False
505    False
506    False
507    False
508    False
509    False
510     True
511     True
512    False
513     True
514     True
515     True
516     True
517     True
518     True
519    False
520    False
521    False
522     True
523     True
524     True
525    False
526     True
527    False
Name: feet, dtype: bool

In [42]:
# We could use value counts if we wanted
above_or_below_six_five = df['feet'] > 6.5
above_or_below_six_five.value_counts()


Out[42]:
True     317
False    211
Name: feet, dtype: int64
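A True/False Series is handy for counting, because True behaves like 1 and False like 0. A minimal sketch, assuming a made-up list of heights rather than the real `df['feet']` column:

```python
import pandas as pd

# Hypothetical heights in feet
heights = pd.Series([6.0, 6.75, 7.0, 6.25, 6.9], name="feet")

is_tall = heights > 6.5           # one True/False per row
counts = is_tall.value_counts()   # how many True vs. False
n_tall = int(is_tall.sum())       # True counts as 1, so sum() counts the Trues
share_tall = is_tall.mean()       # fraction of rows that are True

print(counts)
print(n_tall, share_tall)
```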

In [43]:
# But we can also apply this to every single row to say whether YES we want it or NO we don't

In [44]:
# Instead of putting column names inside of the brackets, we instead
# put the True/False statements. It will only return the players above
# 6.5 feet tall

df[df['feet'] > 6.5]


Out[44]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet
1 Wallace, Gerald 31 Celtics F 45 $10,105,855 79 220 12 2001 7/23/1982 Alabama Sylacauga, AL Alabama US Black No 6.583333
3 Gladness, Mickell 27 Magic C 40 $762,195 83 220 2 2011 7/26/1986 Alabama A&M Birmingham, AL Alabama US Black No 6.916667
4 Jefferson, Richard 33 Jazz F 44 $11,046,000 79 230 12 2001 6/21/1980 Arizona Los Angeles, CA California US Black No 6.583333
5 Hill, Solomon 22 Pacers F 9 $1,246,680 79 220 0 2013 3/18/1991 Arizona Los Angeles, CA California US Black No 6.583333
6 Budinger, Chase 25 Timberwolves F 10 $5,000,000 79 218 4 2009 5/22/1988 Arizona Encinitas, CA California US White No 6.583333
7 Williams, Derrick 22 Timberwolves F 7 $5,016,960 80 241 2 2011 5/25/1991 Arizona La Mirada, CA California US Black No 6.666667
8 Hill, Jordan 26 Lakers F/C 27 $3,563,600 82 235 1 2012 7/27/1987 Arizona Newberry, SC South Carolina US Black No 6.833333
9 Frye, Channing 30 Suns F/C 8 $6,500,000 83 245 8 2005 5/17/1983 Arizona White Plains, NY New York US Black No 6.916667
14 Boateng, Eric 27 Lakers C 12 n/a 82 257 17 1996 11/20/1985 Arizona State London, ENG n/a England Black No 6.833333
15 Diogu, Ike 29 Knicks F/C 50 $792,377 80 255 8 2005 11/9/1983 Arizona State Buffalo, NY New York US Black No 6.666667
16 Ayres, Jeff 26 Spurs F/C 11 $1,750,000 81 250 4 2009 4/29/1987 Arizona State Ontario, CA California US Black No 6.750000
21 Johnson, Joe 32 Nets G/F 7 $21,466,718 79 240 12 2001 6/29/1981 Arkansas Little Rock, AR Arkansas US Black No 6.583333
22 Brewer, Ronnie 28 Rockets G/F 10 $1,186,459 79 235 7 2006 3/20/1985 Arkansas Portland, OR Oregon US Black No 6.583333
24 Miller, Quincy 20 Nuggets F 30 $788,872 81 210 1 2012 11/18/1992 Baylor North Carolina, IL Illinois US Black No 6.750000
25 Acy, Quincy 23 Raptors F 4 $788,872 79 225 1 2012 10/6/1990 Baylor Tyler, TX Texas US Black No 6.583333
26 Jones, Perry 22 Thunder F 3 $1,082,520 83 235 1 2012 9/24/1991 Baylor Winnsboro, LA Louisiana US Black No 6.916667
27 Udoh, Ekpe 26 Bucks F/C 5 $4,469,548 82 245 3 2010 5/20/1987 Baylor Edmond, OK Oklahoma US Black No 6.833333
29 Andersen, Chris 35 Heat F/C 11 $1,399,507 82 228 12 2001 7/7/1978 Blinn College Long Beach, CA California US White No 6.833333
31 Dudley, Jared 28 Clippers G/F 9 $4,250,000 79 225 6 2007 7/10/1985 Boston College San Diego, CA California US Black No 6.583333
32 O'Bryant, Patrick 27 Bobcats C 18 n/a 84 250 7 2006 6/20/1986 Bradley Oskaloosa, IA Iowa US Black No 7.000000
33 Davies, Brandon 22 Clippers F 23 n/a 81 235 0 2013 7/25/1991 Brigham Young Provo, UT Utah US Black No 6.750000
36 Hayward, Gordon 23 Jazz G/F 20 $3,452,183 80 215 3 2010 3/23/1990 Butler Indianapolis, IN Indiana US White No 6.666667
37 Anderson, Ryan 25 Pelicans F 33 $8,308,500 82 240 5 2008 5/6/1988 California Sacramento, CA California US White No 6.833333
39 Griffin, Eric 23 Heat F 17 $490,180 80 197 0 2013 5/26/1990 Campbell Orlando, FL Florida US Black No 6.666667
41 Kaman, Chris 31 Lakers C 9 $3,183,000 84 265 4 2009 4/28/1982 Central Michigan Grand Rapids, MI Michigan US White No 7.000000
42 Martin, Kenyon 35 Knicks F 3 $884,293 81 225 13 2000 12/30/1977 Cincinnati Saginaw, MI Michigan US Black No 6.750000
43 Maxiell, Jason 30 Magic F 54 $1,500,000 79 260 8 2005 2/18/1983 Cincinnati Chicago, IL Illinois US Black No 6.583333
45 Booker, Trevor 25 Wizards F 35 $2,350,820 80 240 3 2010 11/25/1987 Clemson Newberry, SC South Carolina US Black No 6.666667
47 Perkins, Kendrick 28 Thunder C 5 $8,727,437 82 270 10 2003 11/10/1984 Clifton J. Ozen HS (TX) Nederland, TX Texas US Black Yes 6.833333
48 Copeland, Chris 29 Pacers F 22 $300,000 80 225 1 2012 3/17/1984 Colorado Orange, NJ New Jersey US Black No 6.666667
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
462 Barnes, Matt 33 Clippers F 22 $3,250,000 79 226 9 2004 3/9/1980 UCLA Santa Clara, CA California US Mixed No 6.583333
463 Love, Kevin 25 Timberwolves F/C 42 $14,693,906 82 260 5 2008 9/7/1988 UCLA Santa Monica, CA California US White No 6.833333
473 Ariza, Trevor 28 Wizards G/F 1 $7,727,280 80 210 9 2004 6/30/1985 UCLA Miami, FL Florida US Black No 6.666667
474 Anthony, Joel 31 Heat C 50 $3,800,000 81 245 6 2007 8/9/1982 UNLV Montreal, QB Quebec Canada Black No 6.750000
475 Bennett, Anthony 20 Cavaliers F 15 $5,324,280 80 240 0 2013 3/14/1993 UNLV Toronto, ON Ontario Canada Black No 6.666667
476 Amundson, Lou 30 Clippers F 17 $185,955 81 225 6 2007 12/7/1982 UNLV Ventura, CA California US White No 6.750000
477 Marion, Shawn 35 Mavericks F 0 $9,316,796 79 228 14 1999 5/7/1978 UNLV Waukegan, IL Illinois US Black No 6.583333
478 AyÛn, Gustavo 28 Hawks F/C 14 $1,500,000 82 250 2 2011 4/1/1985 UPAEP (MEX) Tepic n/a Mexico Hispanic No 6.833333
479 Bogut, Andrew 28 Warriors C 12 $14,000,000 84 260 8 2005 11/28/1984 Utah Melbourne, VI Victoria Australia White No 7.000000
483 Ezeli, Festus 23 Warriors C 31 $1,066,920 83 255 1 2012 10/21/1989 Vanderbilt Benin City, NI n/a Nigeria Black No 6.916667
484 Taylor, Jeffery 24 Bobcats G/F 44 $788,872 79 225 1 2012 5/23/1989 Vanderbilt Norrkoping, OS Ostergotland Sweden Black No 6.583333
486 Cunningham, Dante 26 Timberwolves F 33 $2,180,000 80 230 4 2009 4/22/1987 Villanova Clinton, MD Maryland US Black No 6.666667
490 Scott, Mike 25 Hawks F 32 $788,872 80 237 1 2012 7/16/1988 Virginia Chesapeake, VA Virginia US Black No 6.666667
492 Sanders, Larry 24 Bucks F/C 8 $3,053,368 83 235 3 2010 11/21/1988 Virginia Commonwealth Fort Pierce, FL Florida US Black No 6.916667
496 Johnson, James 26 Hawks F 13 n/a 81 248 4 2009 2/20/1987 Wake Forest Cheyene, WY Wyoming US Black No 6.750000
497 Aminu, Al-Farouq 23 Pelicans F 0 $3,749,602 81 215 3 2010 9/21/1990 Wake Forest Atlanta, GA Georgia US Black No 6.750000
501 Duncan, Tim 37 Spurs F/C 21 $10,361,446 83 255 16 1997 4/25/1976 Wake Forest Christiansted, VI Virgin Islands Virgin Islands Black No 6.916667
502 Hawes, Spencer 25 76ers C 0 $6,500,000 85 245 6 2007 4/28/1988 Washington Seattle, WA Washington US White No 7.083333
510 Baynes, Aron 26 Spurs F/C 16 $788,872 82 260 0 2013 12/9/1986 Washington State Gisborne, NZ n/a New Zealand White No 6.833333
511 Thompson, Klay 23 Warriors G/F 11 $2,317,920 79 205 2 2011 2/8/1990 Washington State Los Angeles, CA California US Mixed No 6.583333
513 Alexander, Joe 26 Warriors F 25 $854,389 80 230 5 2008 12/26/1986 West Virginia Kaohsiung, TA n/a Taiwan White No 6.666667
514 Fischer, D'or 32 Wizards C 21 n/a 83 255 0 2013 10/12/1981 West Virginia Philadelphia, PA Pennsylvania US Black No 6.916667
515 Ebanks, Devin 23 Mavericks F 37 $884,293 81 215 3 2010 10/28/1989 West Virginia New York City, NY New York US Black No 6.750000
516 Johnson, Amir 26 Raptors F/C 15 $6,500,000 81 210 8 2005 5/1/1987 Westchester HS (CA) Los Angeles, CA California US Black Yes 6.750000
517 Martin, Kevin 30 Timberwolves G 23 $6,500,000 79 185 9 2004 2/1/1983 Western Carolina Zanesville, OH Ohio US Mixed No 6.583333
518 Evans, Jeremy 25 Jazz F 40 $1,660,257 81 194 3 2010 10/24/1987 Western Kentucky Crossett, AR Arkansas US Black No 6.750000
522 Stiemsma, Greg 28 Pelicans C 34 $2,676,000 83 260 2 2011 9/26/1985 Wisconsin Randolph, WI Wisconsin US White No 6.916667
523 Leuer, Jon 24 Grizzlies F 30 $900,000 82 228 2 2011 5/14/1989 Wisconsin Long Lake, MN Minnesota US White No 6.833333
524 Landry, Marcus 27 Lakers F 14 $788,872 79 225 17 1996 11/1/1985 Wisconsin Milwaukee, WI Wisconsin US Black No 6.583333
526 West, David 33 Pacers F 21 $12,000,000 81 250 10 2003 8/29/1980 Xavier Teaneck, NJ New Jersey US Black No 6.750000

317 rows × 18 columns
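Notice that the row numbers on the left skip around (1, 3, 4, ...). Filtering keeps the original index labels of the rows that matched. A small sketch on invented data, including the `.loc` form that picks rows and a column in one go:

```python
import pandas as pd

# Hypothetical players, not rows from the CSV
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "POS": ["G", "C", "F", "G"],
    "feet": [6.0, 7.0, 6.75, 6.25],
})

# Rows where the True/False statement is True
tall = players[players["feet"] > 6.5]

# Same filter with .loc, keeping only the Name column
tall_names = players.loc[players["feet"] > 6.5, "Name"]

print(tall)        # index labels 1 and 2 survive the filter
print(tall_names)
```

If the gaps in the index bother you, `tall.reset_index(drop=True)` renumbers the rows from 0.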


In [45]:
df['Race'] == 'Asian'


Out[45]:
0      False
1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10     False
11     False
12     False
13     False
14     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
29     False
       ...  
498    False
499    False
500    False
501    False
502    False
503    False
504    False
505    False
506    False
507    False
508    False
509    False
510    False
511    False
512    False
513    False
514    False
515    False
516    False
517    False
518    False
519    False
520    False
521    False
522    False
523    False
524    False
525    False
526    False
527    False
Name: Race, dtype: bool

In [46]:
# Put the True/False statement inside the brackets to keep only the matching rows
df[df['Race'] == 'Asian']

In [47]:
# Or only the guards
# The comparison needs its own parentheses, otherwise .head() is called on the string 'G'
(df['POS'] == 'G').head()
# People below 6.5 feet
df['feet'] < 6.5

In [48]:
# Every condition you want to combine needs parentheses around it
# Guards that are shorter than 6.5 feet
# This is a combination of both conditions, joined with &
df[(df['POS'] == 'G') & (df['feet'] < 6.5)].head()


Out[48]:
Name Age Team POS # 2013 $ Ht (In.) WT EXP 1st Year DOB School City State (Province, Territory, Etc..) Country Race HS Only feet
2 Williams, Mo 30 Trail Blazers G 25 $2,652,000 73 195 10 2003 12/19/1982 Alabama Jackson, MS Mississippi US Black No 6.083333
10 Bayless, Jerryd 25 Grizzlies G 7 $3,135,000 75 200 5 2008 8/20/1988 Arizona Phoenix, AZ Arizona US Black No 6.250000
11 Terry, Jason 36 Nets G 31 $5,625,313 74 180 14 1999 9/15/1977 Arizona Seattle, WA Washington US Black No 6.166667
12 Fogg, Kyle 23 Nuggets G 6 n/a 75 183 0 2013 1/27/1990 Arizona Brea, CA California US Black No 6.250000
17 Harden, James 24 Rockets G 13 $13,701,250 77 220 4 2009 8/26/1989 Arizona State Los Angeles, CA California US Black No 6.416667
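Why the parentheses? Python's `&` and `|` bind more tightly than `==` and `<`, so without them Python would try to work out `'G' & df['feet']` first and fall over. A minimal sketch on made-up rows showing "and" next to "or":

```python
import pandas as pd

# Hypothetical players, not rows from the CSV
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "POS": ["G", "C", "F", "G"],
    "feet": [6.0, 7.0, 6.75, 6.6],
})

# AND: both conditions must be True
short_guards = players[(players["POS"] == "G") & (players["feet"] < 6.5)]

# OR: either condition may be True
guards_or_giants = players[(players["POS"] == "G") | (players["feet"] > 6.9)]

print(short_guards)
print(guards_or_giants)
```

Plain Python `and` / `or` do not work here, because they try to turn the whole Series into a single True or False.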

In [49]:
# We can save the filtered rows in variables
centers = df[df['POS'] == 'C']
guards = df[df['POS'] == 'G']

In [50]:
centers['feet'].describe()


Out[50]:
count    67.000000
mean      6.962687
std       0.087381
min       6.750000
25%       6.916667
50%       7.000000
75%       7.000000
max       7.250000
Name: feet, dtype: float64

In [51]:
guards['feet'].describe()


Out[51]:
count    175.000000
mean       6.263810
std        0.165729
min        5.750000
25%        6.166667
50%        6.250000
75%        6.416667
max        6.583333
Name: feet, dtype: float64
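Running `describe()` once per saved variable works, but you can also get the same summary for every position at once. This is a sketch using `groupby`, which the notebook has not introduced yet, on invented numbers:

```python
import pandas as pd

# Hypothetical positions and heights
players = pd.DataFrame({
    "POS": ["C", "C", "G", "G", "G"],
    "feet": [7.0, 6.9, 6.0, 6.25, 6.5],
})

# One row of summary statistics per position
summary = players.groupby("POS")["feet"].describe()

# Difference between the average center and the average guard
mean_gap = summary.loc["C", "mean"] - summary.loc["G", "mean"]

print(summary)
print(mean_gap)
```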

In [52]:
# It might be easier to break down the booleans into separate variables
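One way this could look: give each True/False Series its own name, then mix and match them. A minimal sketch on made-up rows (the variable names are just suggestions):

```python
import pandas as pd

# Hypothetical players, not rows from the CSV
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "POS": ["G", "C", "F", "G"],
    "feet": [6.0, 7.0, 6.75, 6.6],
})

# Each condition gets a readable name
is_guard = players["POS"] == "G"
is_short = players["feet"] < 6.5

# Named masks combine without extra parentheses; ~ means "not"
short_guards = players[is_guard & is_short]
tall_guards = players[is_guard & ~is_short]

print(short_guards)
print(tall_guards)
```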

In [53]:
# We can save this stuff

In [ ]:


In [54]:
# Maybe we can compare them to taller players?

Drawing pictures

Okay okay enough code and enough stupid numbers. I'm visual. I want graphics. Okay????? Okay.


In [55]:
!pip install matplotlib


Collecting matplotlib
  Using cached matplotlib-1.5.1-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.6 in /Users/barneyjs/.virtualenvs/07_Introduction_to_Pandas.ipynb/lib/python3.5/site-packages (from matplotlib)
Collecting cycler (from matplotlib)
  Using cached cycler-0.10.0-py2.py3-none-any.whl
Collecting pyparsing!=2.0.0,!=2.0.4,>=1.5.6 (from matplotlib)
  Using cached pyparsing-2.1.4-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): pytz in /Users/barneyjs/.virtualenvs/07_Introduction_to_Pandas.ipynb/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Users/barneyjs/.virtualenvs/07_Introduction_to_Pandas.ipynb/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): six in /Users/barneyjs/.virtualenvs/07_Introduction_to_Pandas.ipynb/lib/python3.5/site-packages (from cycler->matplotlib)
Installing collected packages: cycler, pyparsing, matplotlib
Successfully installed cycler-0.10.0 matplotlib-1.5.1 pyparsing-2.1.4

In [ ]:
# This will scream we don't have matplotlib.


df['feet'].hist()


/Users/barneyjs/.virtualenvs/07_Introduction_to_Pandas.ipynb/lib/python3.5/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')

matplotlib is a graphing library. It's the Python way to make graphs!


In [151]:
%matplotlib inline


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-e27d371d6baa> in <module>()
----> 1 get_ipython().magic('matplotlib inline')

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2161         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2162         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2163         return self.run_line_magic(magic_name, magic_arg_s)
   2164 
   2165     #-------------------------------------------------------------------------

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2082                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2083             with self.builtin_trap:
-> 2084                 result = fn(*args,**kwargs)
   2085             return result
   2086 

<decorator-gen-106> in matplotlib(self, line)

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/IPython/core/magics/pylab.py in matplotlib(self, line)
     98             print("Available matplotlib backends: %s" % backends_list)
     99         else:
--> 100             gui, backend = self.shell.enable_matplotlib(args.gui)
    101             self._show_matplotlib_backend(args.gui, backend)
    102 

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/IPython/core/interactiveshell.py in enable_matplotlib(self, gui)
   2937         """
   2938         from IPython.core import pylabtools as pt
-> 2939         gui, backend = pt.find_gui_and_backend(gui, self.pylab_gui_select)
   2940 
   2941         if gui != 'inline':

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/IPython/core/pylabtools.py in find_gui_and_backend(gui, gui_select)
    258     """
    259 
--> 260     import matplotlib
    261 
    262     if gui and gui != 'auto':

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in <module>()
   1129 
   1130 # this is the instance used by the matplotlib classes
-> 1131 rcParams = rc_params()
   1132 
   1133 if rcParams['examples.directory']:

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in rc_params(fail_on_error)
    973         return ret
    974 
--> 975     return rc_params_from_file(fname, fail_on_error)
    976 
    977 

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in rc_params_from_file(fname, fail_on_error, use_default_template)
   1098         parameters specified in the file. (Useful for updating dicts.)
   1099     """
-> 1100     config_from_file = _rc_params_in_file(fname, fail_on_error)
   1101 
   1102     if not use_default_template:

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in _rc_params_in_file(fname, fail_on_error)
   1016     cnt = 0
   1017     rc_temp = {}
-> 1018     with _open_file_or_url(fname) as fd:
   1019         try:
   1020             for line in fd:

/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/contextlib.py in __enter__(self)
     57     def __enter__(self):
     58         try:
---> 59             return next(self.gen)
     60         except StopIteration:
     61             raise RuntimeError("generator didn't yield") from None

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in _open_file_or_url(fname)
    998     else:
    999         fname = os.path.expanduser(fname)
-> 1000         encoding = locale.getdefaultlocale()[1]
   1001         if encoding is None:
   1002             encoding = "utf-8"

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/locale.py in getdefaultlocale(envvars)
    557     else:
    558         localename = 'C'
--> 559     return _parse_localename(localename)
    560 
    561 

/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/locale.py in _parse_localename(localename)
    485     elif code == 'C':
    486         return None, None
--> 487     raise ValueError('unknown locale: %s' % localename)
    488 
    489 def _build_localename(localetuple):

ValueError: unknown locale: UTF-8

In [ ]:
df['feet'].hist()

In [ ]:
# this will open up a weird window that won't do anything
import matplotlib.pyplot as plt

In [ ]:
# So instead you run this code

In [ ]:
plt.style.use('fivethirtyeight')
df['feet'].hist()

But that's ugly. There's a thing called ggplot for R that looks nice. We want to look nice. We want to look like ggplot.


In [ ]:
# Import matplotlib
# What's available?

In [ ]:
# Use ggplot

In [ ]:
# Make a histogram

In [ ]:
# Try some other styles
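Here is a rough sketch of what these cells might contain. The heights are made up, and the `Agg` backend line is only an assumption so the code runs without a display; in the notebook you would use `%matplotlib inline` instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, swap for %matplotlib inline in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up heights standing in for df['feet']
heights = pd.Series([5.75, 6.0, 6.25, 6.25, 6.5, 6.75, 6.9, 7.0], name="feet")

# What's available?
available_styles = plt.style.available
print(available_styles)

# Use ggplot for everything drawn from here on
plt.style.use("ggplot")
fig, ax = plt.subplots()
heights.hist(ax=ax)

# Try another style for one figure only
with plt.style.context("fivethirtyeight"):
    fig2, ax2 = plt.subplots()
    heights.hist(ax=ax2)
```

`plt.style.use` changes the look for the rest of the session, while `plt.style.context` only applies inside the `with` block.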

In [ ]:

That might look better with a little more customization. So let's customize it.


In [ ]:
# Pass in all sorts of stuff!
# Most from http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.hist.html
# range= is a matplotlib hist() argument that pandas passes along
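A sketch of the kind of customization this cell is hinting at, on invented heights. Some arguments (`bins`, `figsize`, `grid`) belong to pandas, while others (`range`, `color`, `edgecolor`) are handed through to matplotlib. The specific values and labels are only examples.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, swap for %matplotlib inline in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up heights standing in for df['feet']
heights = pd.Series([5.75, 6.0, 6.25, 6.25, 6.5, 6.75, 6.9, 7.0], name="feet")

ax = heights.hist(
    bins=8,              # pandas: how many bars
    figsize=(8, 4),      # pandas: figure size in inches
    grid=False,          # pandas: turn off the grid lines
    range=(5.5, 7.5),    # matplotlib: lowest and highest edge of the bars
    color="steelblue",   # matplotlib: bar color
    edgecolor="white",   # matplotlib: outline color
)

# hist() returns the axes, so labels can be added afterwards
ax.set_title("Player heights")
ax.set_xlabel("Height (feet)")
ax.set_ylabel("Number of players")
```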

I want more graphics! Do tall people make more money?!?!


In [ ]:


In [ ]:


In [ ]:
# How does experience relate with the amount of money they're making?
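Before money can go on an axis, the `2013 $` column has to become numbers, since values like `$3,250,000` and `n/a` are text. A minimal sketch on a few made-up rows; the new `salary` column name is an assumption, and the `Agg` line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, swap for %matplotlib inline in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows shaped like the real columns
players = pd.DataFrame({
    "2013 $": ["$3,250,000", "$10,105,855", "n/a", "$884,293"],
    "EXP": [4, 12, 0, 17],
    "feet": [6.5, 6.583333, 6.25, 6.083333],
})

# Strip "$" and ",", then convert; anything unparseable (like "n/a") becomes NaN
players["salary"] = pd.to_numeric(
    players["2013 $"].str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)

# Experience on the x axis, money on the y axis
ax = players.plot(kind="scatter", x="EXP", y="salary")

print(players[["EXP", "salary"]])
```

Swapping `x="EXP"` for `x="feet"` gives the height-versus-money version of the same picture.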

In [ ]:
# At least we can assume height and weight are related

In [ ]:
# At least we can assume height and weight are related
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html

In [ ]:


In [ ]:


In [ ]:
# We can also use plt separately
# It's SIMILAR but TOTALLY DIFFERENT
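A sketch of the difference, on made-up heights: matplotlib's own `ax.hist` takes the raw values and gives back the counts and bin edges, while the pandas `.hist` method is called on the Series and can be pointed at an existing axes with `ax=`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, swap for %matplotlib inline in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up heights standing in for df['feet']
heights = pd.Series([5.75, 6.0, 6.25, 6.25, 6.5, 6.75, 6.9, 7.0], name="feet")

# Two empty plots side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# matplotlib way: hand the values to the axes
counts, edges, bars = ax1.hist(heights, bins=5)
ax1.set_title("plt / ax.hist")

# pandas way: call .hist on the Series and tell it where to draw
heights.hist(ax=ax2, bins=5)
ax2.set_title("Series.hist")

print(counts, edges)
```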

In [ ]: