pandas
Pandas! They are adorable animals. You might think they are the worst animal ever, but that is not true. You might sometimes think pandas is the worst library ever, and that is only kind of true.
The important thing is to use the right tool for the job. pandas is good for some stuff, SQL is good for some stuff, writing raw Python is good for some stuff. You'll figure it out as you go along.
Now let's start coding. Hopefully you did pip install pandas before you started up this notebook.
In [1]:
# import pandas, but call it pd. Why? Because that's What People Do.
import pandas as pd
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-a9337049a1ca> in <module>()
1 # import pandas, but call it pd. Why? Because that's What People Do.
----> 2 import pandas as pd
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/__init__.py in <module>()
37 import pandas.core.config_init
38
---> 39 from pandas.core.api import *
40 from pandas.sparse.api import *
41 from pandas.stats.api import *
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/core/api.py in <module>()
8 from pandas.core.common import isnull, notnull
9 from pandas.core.categorical import Categorical
---> 10 from pandas.core.groupby import Grouper
11 from pandas.formats.format import set_eng_float_format
12 from pandas.core.index import (Index, CategoricalIndex, Int64Index,
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/core/groupby.py in <module>()
16 DataError, SpecificationError)
17 from pandas.core.categorical import Categorical
---> 18 from pandas.core.frame import DataFrame
19 from pandas.core.generic import NDFrame
20 from pandas.core.index import (Index, MultiIndex, CategoricalIndex,
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/core/frame.py in <module>()
37 create_block_manager_from_arrays,
38 create_block_manager_from_blocks)
---> 39 from pandas.core.series import Series
40 from pandas.core.categorical import Categorical
41 import pandas.computation.expressions as expressions
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/core/series.py in <module>()
2942 # Add plotting methods to Series
2943
-> 2944 import pandas.tools.plotting as _gfx # noqa
2945
2946 Series.plot = base.AccessorProperty(_gfx.SeriesPlotMethods,
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/tools/plotting.py in <module>()
25 from pandas.util.decorators import Appender
26 try: # mpl optional
---> 27 import pandas.tseries.converter as conv
28 conv.register() # needs to override so set_xlim works with str/number
29 except ImportError:
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/tseries/converter.py in <module>()
5 from dateutil.relativedelta import relativedelta
6
----> 7 import matplotlib.units as units
8 import matplotlib.dates as dates
9
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in <module>()
1129
1130 # this is the instance used by the matplotlib classes
-> 1131 rcParams = rc_params()
1132
1133 if rcParams['examples.directory']:
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in rc_params(fail_on_error)
973 return ret
974
--> 975 return rc_params_from_file(fname, fail_on_error)
976
977
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in rc_params_from_file(fname, fail_on_error, use_default_template)
1098 parameters specified in the file. (Useful for updating dicts.)
1099 """
-> 1100 config_from_file = _rc_params_in_file(fname, fail_on_error)
1101
1102 if not use_default_template:
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in _rc_params_in_file(fname, fail_on_error)
1016 cnt = 0
1017 rc_temp = {}
-> 1018 with _open_file_or_url(fname) as fd:
1019 try:
1020 for line in fd:
/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/contextlib.py in __enter__(self)
57 def __enter__(self):
58 try:
---> 59 return next(self.gen)
60 except StopIteration:
61 raise RuntimeError("generator didn't yield") from None
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/__init__.py in _open_file_or_url(fname)
998 else:
999 fname = os.path.expanduser(fname)
-> 1000 encoding = locale.getdefaultlocale()[1]
1001 if encoding is None:
1002 encoding = "utf-8"
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/locale.py in getdefaultlocale(envvars)
557 else:
558 localename = 'C'
--> 559 return _parse_localename(localename)
560
561
/Users/barneyjs/.virtualenvs/data_analysis/lib/python3.5/locale.py in _parse_localename(localename)
485 elif code == 'C':
486 return None, None
--> 487 raise ValueError('unknown locale: %s' % localename)
488
489 def _build_localename(localetuple):
ValueError: unknown locale: UTF-8
When you import pandas, you use import pandas as pd. That means instead of typing pandas in your code you'll type pd.
You don't have to, but every other person on the planet will be doing it, so you might as well.
Now we're going to read in a file. Our file is called NBA-Census-10.14.2013.csv because we're sports moguls. pandas can read_ different types of files, so try to figure it out by typing pd.read_ and hitting tab for autocomplete.
In [9]:
# We're going to call this df, which means "data frame"
# It isn't in UTF-8 (I saved it from my mac!) so we need to set the encoding
df = pd.read_csv("NBA-Census-10.14.2013.csv", encoding='mac_roman')
A dataframe is basically a spreadsheet, except it lives in the world of Python or the statistical programming language R. They can't call it a spreadsheet because then people would think those programmers used Excel, which would make them boring and normal and they'd have to wear a tie every day.
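If you don't have the CSV handy, here is a minimal sketch of the same read_csv call working on a tiny in-memory file. The three rows are borrowed from this notebook's data and trimmed to four columns; csv_text, raw_bytes and sample are made-up names for illustration, and the "saved on a Mac" step is simulated by encoding the text ourselves.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: three rows borrowed from this
# notebook's data, trimmed to four columns.
csv_text = (
    "Name,Age,Team,POS\n"
    '"Gee, Alonzo",26,Cavaliers,F\n'
    '"Wallace, Gerald",31,Celtics,F\n'
    '"Stoudemire, Amar\'e†",30,Knicks,F/C\n'
)

# Pretend it was saved on a Mac: encode the text as Mac Roman bytes.
raw_bytes = csv_text.encode("mac_roman")

# read_csv accepts a path or a file-like object; encoding tells it how to
# turn those bytes back into text.
sample = pd.read_csv(io.BytesIO(raw_bytes), encoding="mac_roman")

print(sample)
print(sample.shape)  # (rows, columns)
```

The † in the last name is the kind of character that tends to trip up a UTF-8 read, which is probably why the real file needs the encoding argument too.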
Now let's look at our data, since that's what data is for.
In [15]:
# Let's look at all of it
df
Out[15]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only |
| 0 | Gee, Alonzo | 26 | Cavaliers | F | 33 | $3,250,000 | 78 | 219 | 4 | 2009 | 5/29/1987 | Alabama | Riviera Beach, FL | Florida | US | Black | No |
| 1 | Wallace, Gerald | 31 | Celtics | F | 45 | $10,105,855 | 79 | 220 | 12 | 2001 | 7/23/1982 | Alabama | Sylacauga, AL | Alabama | US | Black | No |
| 2 | Williams, Mo | 30 | Trail Blazers | G | 25 | $2,652,000 | 73 | 195 | 10 | 2003 | 12/19/1982 | Alabama | Jackson, MS | Mississippi | US | Black | No |
| 3 | Gladness, Mickell | 27 | Magic | C | 40 | $762,195 | 83 | 220 | 2 | 2011 | 7/26/1986 | Alabama A&M | Birmingham, AL | Alabama | US | Black | No |
| 4 | Jefferson, Richard | 33 | Jazz | F | 44 | $11,046,000 | 79 | 230 | 12 | 2001 | 6/21/1980 | Arizona | Los Angeles, CA | California | US | Black | No |
| 5 | Hill, Solomon | 22 | Pacers | F | 9 | $1,246,680 | 79 | 220 | 0 | 2013 | 3/18/1991 | Arizona | Los Angeles, CA | California | US | Black | No |
| 6 | Budinger, Chase | 25 | Timberwolves | F | 10 | $5,000,000 | 79 | 218 | 4 | 2009 | 5/22/1988 | Arizona | Encinitas, CA | California | US | White | No |
| 7 | Williams, Derrick | 22 | Timberwolves | F | 7 | $5,016,960 | 80 | 241 | 2 | 2011 | 5/25/1991 | Arizona | La Mirada, CA | California | US | Black | No |
| 8 | Hill, Jordan | 26 | Lakers | F/C | 27 | $3,563,600 | 82 | 235 | 1 | 2012 | 7/27/1987 | Arizona | Newberry, SC | South Carolina | US | Black | No |
| 9 | Frye, Channing | 30 | Suns | F/C | 8 | $6,500,000 | 83 | 245 | 8 | 2005 | 5/17/1983 | Arizona | White Plains, NY | New York | US | Black | No |
| 10 | Bayless, Jerryd | 25 | Grizzlies | G | 7 | $3,135,000 | 75 | 200 | 5 | 2008 | 8/20/1988 | Arizona | Phoenix, AZ | Arizona | US | Black | No |
| 11 | Terry, Jason | 36 | Nets | G | 31 | $5,625,313 | 74 | 180 | 14 | 1999 | 9/15/1977 | Arizona | Seattle, WA | Washington | US | Black | No |
| 12 | Fogg, Kyle | 23 | Nuggets | G | 6 | n/a | 75 | 183 | 0 | 2013 | 1/27/1990 | Arizona | Brea, CA | California | US | Black | No |
| 13 | Iguodala, Andre | 29 | Warriors | G/F | 9 | $12,868,632 | 78 | 207 | 9 | 2004 | 1/28/1984 | Arizona | Springfield, IL | Illinois | US | Black | No |
| 14 | Boateng, Eric | 27 | Lakers | C | 12 | n/a | 82 | 257 | 17 | 1996 | 11/20/1985 | Arizona State | London, ENG | n/a | England | Black | No |
| 15 | Diogu, Ike | 29 | Knicks | F/C | 50 | $792,377 | 80 | 255 | 8 | 2005 | 11/9/1983 | Arizona State | Buffalo, NY | New York | US | Black | No |
| 16 | Ayres, Jeff | 26 | Spurs | F/C | 11 | $1,750,000 | 81 | 250 | 4 | 2009 | 4/29/1987 | Arizona State | Ontario, CA | California | US | Black | No |
| 17 | Harden, James | 24 | Rockets | G | 13 | $13,701,250 | 77 | 220 | 4 | 2009 | 8/26/1989 | Arizona State | Los Angeles, CA | California | US | Black | No |
| 18 | Felix, Carrick | 23 | Cavaliers | G/F | 30 | $510,000 | 78 | 210 | 0 | 2013 | 8/17/1990 | Arizona State | Goodyear, AZ | Arizona | US | Black | No |
| 19 | Pargo, Jannero | 33 | Bobcats | G | 5 | $884,293 | 73 | 185 | 11 | 2002 | 10/22/1979 | Arkansas | Chicago, IL | Illinois | US | Black | No |
| 20 | Beverley, Patrick | 25 | Rockets | G | 2 | $788,872 | 73 | 185 | 5 | 2008 | 7/12/1988 | Arkansas | Chicago, IL | Illinois | US | Black | No |
| 21 | Johnson, Joe | 32 | Nets | G/F | 7 | $21,466,718 | 79 | 240 | 12 | 2001 | 6/29/1981 | Arkansas | Little Rock, AR | Arkansas | US | Black | No |
| 22 | Brewer, Ronnie | 28 | Rockets | G/F | 10 | $1,186,459 | 79 | 235 | 7 | 2006 | 3/20/1985 | Arkansas | Portland, OR | Oregon | US | Black | No |
| 23 | Fisher, Derek | 39 | Thunder | G | 6 | $884,293 | 73 | 210 | 17 | 1996 | 8/9/1974 | Arkansas-Little Rock | Little Rock, AR | Arkansas | US | Black | No |
| 24 | Miller, Quincy | 20 | Nuggets | F | 30 | $788,872 | 81 | 210 | 1 | 2012 | 11/18/1992 | Baylor | North Carolina, IL | Illinois | US | Black | No |
| 25 | Acy, Quincy | 23 | Raptors | F | 4 | $788,872 | 79 | 225 | 1 | 2012 | 10/6/1990 | Baylor | Tyler, TX | Texas | US | Black | No |
| 26 | Jones, Perry | 22 | Thunder | F | 3 | $1,082,520 | 83 | 235 | 1 | 2012 | 9/24/1991 | Baylor | Winnsboro, LA | Louisiana | US | Black | No |
| 27 | Udoh, Ekpe | 26 | Bucks | F/C | 5 | $4,469,548 | 82 | 245 | 3 | 2010 | 5/20/1987 | Baylor | Edmond, OK | Oklahoma | US | Black | No |
| 28 | Clark, Ian | 22 | Jazz | G | 21 | $490,180 | 75 | 175 | 0 | 2013 | 3/7/1991 | Belmont | Memphis, TN | Tennessee | US | Black | No |
| 29 | Andersen, Chris | 35 | Heat | F/C | 11 | $1,399,507 | 82 | 228 | 12 | 2001 | 7/7/1978 | Blinn College | Long Beach, CA | California | US | White | No |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 498 | Paul, Chris | 28 | Clippers | G | 3 | $18,668,431 | 72 | 175 | 8 | 2005 | 5/6/1985 | Wake Forest | Forsyth County, NC | North Carolina | US | Black | No |
| 499 | Teague, Jeff | 25 | Hawks | G | 0 | $8,000,000 | 74 | 181 | 4 | 2009 | 6/10/1988 | Wake Forest | Indianapolis, IN | Indiana | US | Black | No |
| 500 | Smith, Ish | 25 | Suns | G | 30 | $951,463 | 72 | 175 | 3 | 2010 | 7/5/1988 | Wake Forest | Charlotte, NC | North Carolina | US | Black | No |
| 501 | Duncan, Tim | 37 | Spurs | F/C | 21 | $10,361,446 | 83 | 255 | 16 | 1997 | 4/25/1976 | Wake Forest | Christiansted, VI | Virgin Islands | Virgin Islands | Black | No |
| 502 | Hawes, Spencer | 25 | 76ers | C | 0 | $6,500,000 | 85 | 245 | 6 | 2007 | 4/28/1988 | Washington | Seattle, WA | Washington | US | White | No |
| 503 | Wroten, Tony | 20 | 76ers | G | 8 | $1,160,040 | 78 | 205 | 1 | 2012 | 4/13/1993 | Washington | Renton, WA | Washington | US | Black | No |
| 504 | Gaddy, Abdul | 21 | Bobcats | G | 10 | n/a | 75 | 185 | 0 | 2013 | 1/26/1992 | Washington | Tacoma, WA | Washington | US | Black | No |
| 505 | Thomas, Isaiah | 24 | Kings | G | 22 | $884,293 | 69 | 185 | 2 | 2011 | 2/7/1989 | Washington | Tacoma, WA | Washington | US | Black | No |
| 506 | Robinson, Nate | 29 | Nuggets | G | 10 | $2,016,000 | 69 | 180 | 8 | 2005 | 5/31/1984 | Washington | Seattle, WA | Washington | US | Black | No |
| 507 | Ross, Terrence | 22 | Raptors | G | 31 | $2,678,640 | 78 | 195 | 1 | 2012 | 2/5/1991 | Washington | Portland, OR | Oregon | US | Black | No |
| 508 | Pondexter, Quincy | 25 | Grizzlies | G/F | 20 | $225,479 | 78 | 225 | 3 | 2010 | 3/10/1988 | Washington | Fresno, CA | California | US | Black | No |
| 509 | Holiday, Justin | 24 | Jazz | G/F | 22 | $788,872 | 78 | 185 | 0 | 2013 | 4/5/1989 | Washington | Mission Hills, CA | California | US | Black | No |
| 510 | Baynes, Aron | 26 | Spurs | F/C | 16 | $788,872 | 82 | 260 | 0 | 2013 | 12/9/1986 | Washington State | Gisborne, NZ | n/a | New Zealand | White | No |
| 511 | Thompson, Klay | 23 | Warriors | G/F | 11 | $2,317,920 | 79 | 205 | 2 | 2011 | 2/8/1990 | Washington State | Los Angeles, CA | California | US | Mixed | No |
| 512 | Lillard, Damian | 23 | Trail Blazers | G | 0 | $3,202,920 | 75 | 195 | 1 | 2012 | 7/15/1990 | Weber State | Oakland, CA | California | US | Black | No |
| 513 | Alexander, Joe | 26 | Warriors | F | 25 | $854,389 | 80 | 230 | 5 | 2008 | 12/26/1986 | West Virginia | Kaohsiung, TA | n/a | Taiwan | White | No |
| 514 | Fischer, D'or | 32 | Wizards | C | 21 | n/a | 83 | 255 | 0 | 2013 | 10/12/1981 | West Virginia | Philadelphia, PA | Pennsylvania | US | Black | No |
| 515 | Ebanks, Devin | 23 | Mavericks | F | 37 | $884,293 | 81 | 215 | 3 | 2010 | 10/28/1989 | West Virginia | New York City, NY | New York | US | Black | No |
| 516 | Johnson, Amir | 26 | Raptors | F/C | 15 | $6,500,000 | 81 | 210 | 8 | 2005 | 5/1/1987 | Westchester HS (CA) | Los Angeles, CA | California | US | Black | Yes |
| 517 | Martin, Kevin | 30 | Timberwolves | G | 23 | $6,500,000 | 79 | 185 | 9 | 2004 | 2/1/1983 | Western Carolina | Zanesville, OH | Ohio | US | Mixed | No |
| 518 | Evans, Jeremy | 25 | Jazz | F | 40 | $1,660,257 | 81 | 194 | 3 | 2010 | 10/24/1987 | Western Kentucky | Crossett, AR | Arkansas | US | Black | No |
| 519 | Lee, Courtney | 28 | Celtics | G/F | 11 | $5,225,000 | 77 | 200 | 5 | 2008 | 10/3/1985 | Western Kentucky | Indianapolis, IN | Indiana | US | Black | No |
| 520 | Mekel, Gal | 25 | Mavericks | G | 33 | $490,180 | 75 | 191 | 5 | 2008 | 3/4/1988 | Wichita State | Petah Tikva | n/a | Israel | White | No |
| 521 | Murry, Toure' | 23 | Knicks | G/F | 23 | $490,180 | 77 | 195 | 0 | 2013 | 11/8/1989 | Wichita State | Houston, TX | Texas | US | Black | No |
| 522 | Stiemsma, Greg | 28 | Pelicans | C | 34 | $2,676,000 | 83 | 260 | 2 | 2011 | 9/26/1985 | Wisconsin | Randolph, WI | Wisconsin | US | White | No |
| 523 | Leuer, Jon | 24 | Grizzlies | F | 30 | $900,000 | 82 | 228 | 2 | 2011 | 5/14/1989 | Wisconsin | Long Lake, MN | Minnesota | US | White | No |
| 524 | Landry, Marcus | 27 | Lakers | F | 14 | $788,872 | 79 | 225 | 17 | 1996 | 11/1/1985 | Wisconsin | Milwaukee, WI | Wisconsin | US | Black | No |
| 525 | Harris, Devin | 30 | Mavericks | G | 20 | $854,389 | 75 | 192 | 9 | 2004 | 2/27/1983 | Wisconsin | Milwaukee, WI | Wisconsin | US | Black | No |
| 526 | West, David | 33 | Pacers | F | 21 | $12,000,000 | 81 | 250 | 10 | 2003 | 8/29/1980 | Xavier | Teaneck, NJ | New Jersey | US | Black | No |
| 527 | Crawford, Jordan | 24 | Celtics | G | 27 | $2,162,419 | 76 | 195 | 3 | 2010 | 10/23/1988 | Xavier | Detroit, MI | Michigan | US | Black | No |
528 rows × 17 columns
If we scroll we can see all of it. But maybe we don't want to see all of it. Maybe we hate scrolling?
In [14]:
# Look at the first few rows
df.head()
Out[14]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only |
| 0 | Gee, Alonzo | 26 | Cavaliers | F | 33 | $3,250,000 | 78 | 219 | 4 | 2009 | 5/29/1987 | Alabama | Riviera Beach, FL | Florida | US | Black | No |
| 1 | Wallace, Gerald | 31 | Celtics | F | 45 | $10,105,855 | 79 | 220 | 12 | 2001 | 7/23/1982 | Alabama | Sylacauga, AL | Alabama | US | Black | No |
| 2 | Williams, Mo | 30 | Trail Blazers | G | 25 | $2,652,000 | 73 | 195 | 10 | 2003 | 12/19/1982 | Alabama | Jackson, MS | Mississippi | US | Black | No |
| 3 | Gladness, Mickell | 27 | Magic | C | 40 | $762,195 | 83 | 220 | 2 | 2011 | 7/26/1986 | Alabama A&M | Birmingham, AL | Alabama | US | Black | No |
| 4 | Jefferson, Richard | 33 | Jazz | F | 44 | $11,046,000 | 79 | 230 | 12 | 2001 | 6/21/1980 | Arizona | Los Angeles, CA | California | US | Black | No |
...but maybe we want to see more than a measly five results?
In [16]:
# Let's look at MORE of the first few rows
df.head(10)
Out[16]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only |
| 0 | Gee, Alonzo | 26 | Cavaliers | F | 33 | $3,250,000 | 78 | 219 | 4 | 2009 | 5/29/1987 | Alabama | Riviera Beach, FL | Florida | US | Black | No |
| 1 | Wallace, Gerald | 31 | Celtics | F | 45 | $10,105,855 | 79 | 220 | 12 | 2001 | 7/23/1982 | Alabama | Sylacauga, AL | Alabama | US | Black | No |
| 2 | Williams, Mo | 30 | Trail Blazers | G | 25 | $2,652,000 | 73 | 195 | 10 | 2003 | 12/19/1982 | Alabama | Jackson, MS | Mississippi | US | Black | No |
| 3 | Gladness, Mickell | 27 | Magic | C | 40 | $762,195 | 83 | 220 | 2 | 2011 | 7/26/1986 | Alabama A&M | Birmingham, AL | Alabama | US | Black | No |
| 4 | Jefferson, Richard | 33 | Jazz | F | 44 | $11,046,000 | 79 | 230 | 12 | 2001 | 6/21/1980 | Arizona | Los Angeles, CA | California | US | Black | No |
| 5 | Hill, Solomon | 22 | Pacers | F | 9 | $1,246,680 | 79 | 220 | 0 | 2013 | 3/18/1991 | Arizona | Los Angeles, CA | California | US | Black | No |
| 6 | Budinger, Chase | 25 | Timberwolves | F | 10 | $5,000,000 | 79 | 218 | 4 | 2009 | 5/22/1988 | Arizona | Encinitas, CA | California | US | White | No |
| 7 | Williams, Derrick | 22 | Timberwolves | F | 7 | $5,016,960 | 80 | 241 | 2 | 2011 | 5/25/1991 | Arizona | La Mirada, CA | California | US | Black | No |
| 8 | Hill, Jordan | 26 | Lakers | F/C | 27 | $3,563,600 | 82 | 235 | 1 | 2012 | 7/27/1987 | Arizona | Newberry, SC | South Carolina | US | Black | No |
| 9 | Frye, Channing | 30 | Suns | F/C | 8 | $6,500,000 | 83 | 245 | 8 | 2005 | 5/17/1983 | Arizona | White Plains, NY | New York | US | Black | No |
But maybe we want to make a basketball joke and see the final four?
In [17]:
# Let's look at the final few rows
df.tail(4)
Out[17]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only |
| 524 | Landry, Marcus | 27 | Lakers | F | 14 | $788,872 | 79 | 225 | 17 | 1996 | 11/1/1985 | Wisconsin | Milwaukee, WI | Wisconsin | US | Black | No |
| 525 | Harris, Devin | 30 | Mavericks | G | 20 | $854,389 | 75 | 192 | 9 | 2004 | 2/27/1983 | Wisconsin | Milwaukee, WI | Wisconsin | US | Black | No |
| 526 | West, David | 33 | Pacers | F | 21 | $12,000,000 | 81 | 250 | 10 | 2003 | 8/29/1980 | Xavier | Teaneck, NJ | New Jersey | US | Black | No |
| 527 | Crawford, Jordan | 24 | Celtics | G | 27 | $2,162,419 | 76 | 195 | 3 | 2010 | 10/23/1988 | Xavier | Detroit, MI | Michigan | US | Black | No |
So yes, head and tail work kind of like the terminal commands. That's nice, I guess.
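Here's a small sketch of head and tail on a toy table, so you can see exactly which rows come back. The toy frame and its column names are invented for illustration; they're not part of the NBA data.

```python
import pandas as pd

# Hypothetical toy frame: ten numbered rows, so it's obvious which ones we get.
toy = pd.DataFrame({"row": range(10), "squared": [n * n for n in range(10)]})

first_five = toy.head()    # no argument means 5 rows
first_two = toy.head(2)    # or ask for a specific number
last_four = toy.tail(4)    # same idea, counted from the bottom

print(first_five.index.tolist())
print(first_two.index.tolist())
print(last_four.index.tolist())
```

Neither call changes toy itself; each one hands back a smaller copy to look at.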
But maybe we're incredibly demanding (which we are) and we want, say, the 6th through the 8th row (which we do). Don't worry (which I know you were), we can do that, too.
In [24]:
# Show the 6th through the 8th rows
df[5:8]
Out[24]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only |
| 5 | Hill, Solomon | 22 | Pacers | F | 9 | $1,246,680 | 79 | 220 | 0 | 2013 | 3/18/1991 | Arizona | Los Angeles, CA | California | US | Black | No |
| 6 | Budinger, Chase | 25 | Timberwolves | F | 10 | $5,000,000 | 79 | 218 | 4 | 2009 | 5/22/1988 | Arizona | Encinitas, CA | California | US | White | No |
| 7 | Williams, Derrick | 22 | Timberwolves | F | 7 | $5,016,960 | 80 | 241 | 2 | 2011 | 5/25/1991 | Arizona | La Mirada, CA | California | US | Black | No |
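Why does df[5:8] give the 6th through the 8th rows? Slices count from zero and leave off the end, just like Python lists. A minimal sketch on a made-up toy frame (the names here are for illustration only), with .iloc shown as the more explicit spelling of the same thing:

```python
import pandas as pd

# Hypothetical toy frame with ten rows labelled 0 through 9.
toy = pd.DataFrame({"row": range(10)})

# Start at position 5 (the 6th row), stop before position 8 (the 9th row).
by_brackets = toy[5:8]

# .iloc says out loud that we mean positions, not labels.
by_position = toy.iloc[5:8]

print(by_brackets.index.tolist())
print(by_brackets.equals(by_position))
```

With the default 0, 1, 2... index the two are interchangeable; .iloc is the safer habit once the index stops being a tidy count.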
In [43]:
# Get the names of the columns, just because
df.columns
Out[43]:
Index(['Name', 'Age', 'Team', 'POS', '#', '2013 $', 'Ht (In.)', 'WT', 'EXP',
'1st Year', 'DOB', 'School', 'City',
'State (Province, Territory, Etc..)', 'Country', 'Race', 'HS Only'],
dtype='object')
In [44]:
# If we want to be "correct" we add .values on the end of it
df.columns.values
Out[44]:
array(['Name', 'Age', 'Team', 'POS', '#', '2013 $', 'Ht (In.)', 'WT',
'EXP', '1st Year', 'DOB', 'School', 'City',
'State (Province, Territory, Etc..)', 'Country', 'Race', 'HS Only'], dtype=object)
In [28]:
# Select only name and age
columns_to_show = ['Name', 'Age']
df[columns_to_show]
Out[28]:
| | Name | Age |
| 0 | Gee, Alonzo | 26 |
| 1 | Wallace, Gerald | 31 |
| 2 | Williams, Mo | 30 |
| 3 | Gladness, Mickell | 27 |
| 4 | Jefferson, Richard | 33 |
| 5 | Hill, Solomon | 22 |
| 6 | Budinger, Chase | 25 |
| 7 | Williams, Derrick | 22 |
| 8 | Hill, Jordan | 26 |
| 9 | Frye, Channing | 30 |
| 10 | Bayless, Jerryd | 25 |
| 11 | Terry, Jason | 36 |
| 12 | Fogg, Kyle | 23 |
| 13 | Iguodala, Andre | 29 |
| 14 | Boateng, Eric | 27 |
| 15 | Diogu, Ike | 29 |
| 16 | Ayres, Jeff | 26 |
| 17 | Harden, James | 24 |
| 18 | Felix, Carrick | 23 |
| 19 | Pargo, Jannero | 33 |
| 20 | Beverley, Patrick | 25 |
| 21 | Johnson, Joe | 32 |
| 22 | Brewer, Ronnie | 28 |
| 23 | Fisher, Derek | 39 |
| 24 | Miller, Quincy | 20 |
| 25 | Acy, Quincy | 23 |
| 26 | Jones, Perry | 22 |
| 27 | Udoh, Ekpe | 26 |
| 28 | Clark, Ian | 22 |
| 29 | Andersen, Chris | 35 |
| ... | ... | ... |
| 498 | Paul, Chris | 28 |
| 499 | Teague, Jeff | 25 |
| 500 | Smith, Ish | 25 |
| 501 | Duncan, Tim | 37 |
| 502 | Hawes, Spencer | 25 |
| 503 | Wroten, Tony | 20 |
| 504 | Gaddy, Abdul | 21 |
| 505 | Thomas, Isaiah | 24 |
| 506 | Robinson, Nate | 29 |
| 507 | Ross, Terrence | 22 |
| 508 | Pondexter, Quincy | 25 |
| 509 | Holiday, Justin | 24 |
| 510 | Baynes, Aron | 26 |
| 511 | Thompson, Klay | 23 |
| 512 | Lillard, Damian | 23 |
| 513 | Alexander, Joe | 26 |
| 514 | Fischer, D'or | 32 |
| 515 | Ebanks, Devin | 23 |
| 516 | Johnson, Amir | 26 |
| 517 | Martin, Kevin | 30 |
| 518 | Evans, Jeremy | 25 |
| 519 | Lee, Courtney | 28 |
| 520 | Mekel, Gal | 25 |
| 521 | Murry, Toure' | 23 |
| 522 | Stiemsma, Greg | 28 |
| 523 | Leuer, Jon | 24 |
| 524 | Landry, Marcus | 27 |
| 525 | Harris, Devin | 30 |
| 526 | West, David | 33 |
| 527 | Crawford, Jordan | 24 |
528 rows × 2 columns
In [29]:
# Combining that with .head() to see not-so-many rows
columns_to_show = ['Name', 'Age']
df[columns_to_show].head()
Out[29]:
| | Name | Age |
| 0 | Gee, Alonzo | 26 |
| 1 | Wallace, Gerald | 31 |
| 2 | Williams, Mo | 30 |
| 3 | Gladness, Mickell | 27 |
| 4 | Jefferson, Richard | 33 |
In [30]:
# We can also do this all in one line, even though it starts looking ugly
# (unlike the cute bears, pandas looks ugly pretty often)
df[['Name', 'Age']].head()
Out[30]:
| | Name | Age |
| 0 | Gee, Alonzo | 26 |
| 1 | Wallace, Gerald | 31 |
| 2 | Williams, Mo | 30 |
| 3 | Gladness, Mickell | 27 |
| 4 | Jefferson, Richard | 33 |
NOTE: That was not df['Name', 'Age'], it was df[['Name', 'Age']]. You'll definitely type it wrong all of the time. When things break with pandas it's probably because you forgot to put in a million brackets.
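To see the bracket problem in action, here's a minimal sketch on a two-row frame (people is a made-up name, and the rows are borrowed from the data above). It shows what one pair of brackets gives you, what two pairs give you, and what happens when you forget.

```python
import pandas as pd

# Hypothetical two-row frame using values from the table above.
people = pd.DataFrame({
    "Name": ["Gee, Alonzo", "Wallace, Gerald"],
    "Age": [26, 31],
    "Team": ["Cavaliers", "Celtics"],
})

# One pair of brackets with one column name: you get a single column (a Series).
ages = people["Age"]

# Two pairs of brackets (a list inside the brackets): you get a smaller DataFrame.
name_and_age = people[["Name", "Age"]]

# One pair of brackets with two names is read as ONE key, the tuple
# ('Name', 'Age'), and there's no column called that.
try:
    people["Name", "Age"]
except KeyError as err:
    print("KeyError:", err)

print(type(ages).__name__, type(name_and_age).__name__)
```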
In [31]:
df.head()
Out[31]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only |
| 0 | Gee, Alonzo | 26 | Cavaliers | F | 33 | $3,250,000 | 78 | 219 | 4 | 2009 | 5/29/1987 | Alabama | Riviera Beach, FL | Florida | US | Black | No |
| 1 | Wallace, Gerald | 31 | Celtics | F | 45 | $10,105,855 | 79 | 220 | 12 | 2001 | 7/23/1982 | Alabama | Sylacauga, AL | Alabama | US | Black | No |
| 2 | Williams, Mo | 30 | Trail Blazers | G | 25 | $2,652,000 | 73 | 195 | 10 | 2003 | 12/19/1982 | Alabama | Jackson, MS | Mississippi | US | Black | No |
| 3 | Gladness, Mickell | 27 | Magic | C | 40 | $762,195 | 83 | 220 | 2 | 2011 | 7/26/1986 | Alabama A&M | Birmingham, AL | Alabama | US | Black | No |
| 4 | Jefferson, Richard | 33 | Jazz | F | 44 | $11,046,000 | 79 | 230 | 12 | 2001 | 6/21/1980 | Arizona | Los Angeles, CA | California | US | Black | No |
I want to know how many people are in each position. Luckily, pandas can tell me!
In [36]:
# Grab the POS column, and count the different values in it.
df['POS'].value_counts()
Out[36]:
G 175
F 142
F/C 74
G/F 70
C 67
Name: POS, dtype: int64
Now that was a little weird, yes - we used df['POS'] instead of df[['POS']] when viewing the data's details.
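The single-versus-double bracket difference can be sketched on a tiny made-up column of positions (positions, as_series and as_frame are names invented for this example). Single brackets hand you one column, and that one column is what value_counts() is being called on.

```python
import pandas as pd

# Hypothetical mini version of the POS column.
positions = pd.DataFrame({"POS": ["G", "F", "G", "C", "F/C", "G"]})

as_series = positions["POS"]     # one column, as a Series
as_frame = positions[["POS"]]    # a DataFrame that happens to have one column

# Tally how many times each value shows up, most common first.
counts = as_series.value_counts()

print(counts)
print(type(as_series).__name__, type(as_frame).__name__)
```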
But now I'm curious about numbers: how old is everyone? Maybe we could, I don't know, get some statistics about age? Some statistics to describe age?
In [37]:
# Summary statistics for Age
df['Age'].describe()
Out[37]:
count 528.000000
mean 26.242424
std 4.178868
min 18.000000
25% 23.000000
50% 25.000000
75% 29.000000
max 39.000000
Name: Age, dtype: float64
In [38]:
# That's pretty good. Does it work for everything? How about the money?
df['2013 $'].describe()
Out[38]:
count 528
unique 308
top n/a
freq 43
Name: 2013 $, dtype: object
Unfortunately because that has dollar signs and commas it's thought of as a string. We'll fix it in a second, but let's try describing one more thing.
In [40]:
# Doing more describing
df['Ht (In.)'].describe()
Out[40]:
count 528.000000
mean 79.119318
std 3.431488
min 69.000000
25% 77.000000
50% 80.000000
75% 82.000000
max 87.000000
Name: Ht (In.), dtype: float64
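Why did Age and height get means and quartiles while the salaries only got count, unique, top and freq? A rough sketch with two made-up Series (the names and the handful of values are for illustration): describe() picks its summary based on whether the column holds numbers or text.

```python
import pandas as pd

# Hypothetical mini columns: one numeric, one text.
ages = pd.Series([26, 31, 30])
salaries = pd.Series(["$3,250,000", "n/a", "n/a"])

numeric_summary = ages.describe()    # count, mean, std, min, quartiles, max
text_summary = salaries.describe()   # count, unique, top, freq

print(numeric_summary.index.tolist())
print(text_summary.index.tolist())
```

So when describe() gives you "top" and "freq" for something that should be a number, that's a hint the column was read in as strings.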
In [47]:
# Take another look at our inches, but only the first few
df['Ht (In.)'].head()
Out[47]:
0 78
1 79
2 73
3 83
4 79
Name: Ht (In.), dtype: int64
In [48]:
# Divide those inches by 12
df['Ht (In.)'].head() / 12
Out[48]:
0 6.500000
1 6.583333
2 6.083333
3 6.916667
4 6.583333
Name: Ht (In.), dtype: float64
In [49]:
# Let's divide ALL of them by 12
feet = df['Ht (In.)'] / 12
feet
Out[49]:
0 6.500000
1 6.583333
2 6.083333
3 6.916667
4 6.583333
5 6.583333
6 6.583333
7 6.666667
8 6.833333
9 6.916667
10 6.250000
11 6.166667
12 6.250000
13 6.500000
14 6.833333
15 6.666667
16 6.750000
17 6.416667
18 6.500000
19 6.083333
20 6.083333
21 6.583333
22 6.583333
23 6.083333
24 6.750000
25 6.583333
26 6.916667
27 6.833333
28 6.250000
29 6.833333
...
498 6.000000
499 6.166667
500 6.000000
501 6.916667
502 7.083333
503 6.500000
504 6.250000
505 5.750000
506 5.750000
507 6.500000
508 6.500000
509 6.500000
510 6.833333
511 6.583333
512 6.250000
513 6.666667
514 6.916667
515 6.750000
516 6.750000
517 6.583333
518 6.750000
519 6.416667
520 6.250000
521 6.416667
522 6.916667
523 6.833333
524 6.583333
525 6.250000
526 6.750000
527 6.333333
Name: Ht (In.), dtype: float64
In [50]:
# Can we get statistics on those?
feet.describe()
Out[50]:
count 528.000000
mean 6.593277
std 0.285957
min 5.750000
25% 6.416667
50% 6.666667
75% 6.833333
max 7.250000
Name: Ht (In.), dtype: float64
In [51]:
# Let's look at our original data again
df.head(2)
Out[51]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only |
| 0 | Gee, Alonzo | 26 | Cavaliers | F | 33 | $3,250,000 | 78 | 219 | 4 | 2009 | 5/29/1987 | Alabama | Riviera Beach, FL | Florida | US | Black | No |
| 1 | Wallace, Gerald | 31 | Celtics | F | 45 | $10,105,855 | 79 | 220 | 12 | 2001 | 7/23/1982 | Alabama | Sylacauga, AL | Alabama | US | Black | No |
Okay that was nice but unfortunately we can't do anything with it. It's just sitting there, separate from our data. If this were normal code we could do blahblah['feet'] = blahblah['Ht (In.)'] / 12, but since this is pandas, we can't. Right? Right?
In [52]:
# Store a new column
df['feet'] = df['Ht (In.)'] / 12
df.head()
Out[52]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only | feet |
| 0 | Gee, Alonzo | 26 | Cavaliers | F | 33 | $3,250,000 | 78 | 219 | 4 | 2009 | 5/29/1987 | Alabama | Riviera Beach, FL | Florida | US | Black | No | 6.500000 |
| 1 | Wallace, Gerald | 31 | Celtics | F | 45 | $10,105,855 | 79 | 220 | 12 | 2001 | 7/23/1982 | Alabama | Sylacauga, AL | Alabama | US | Black | No | 6.583333 |
| 2 | Williams, Mo | 30 | Trail Blazers | G | 25 | $2,652,000 | 73 | 195 | 10 | 2003 | 12/19/1982 | Alabama | Jackson, MS | Mississippi | US | Black | No | 6.083333 |
| 3 | Gladness, Mickell | 27 | Magic | C | 40 | $762,195 | 83 | 220 | 2 | 2011 | 7/26/1986 | Alabama A&M | Birmingham, AL | Alabama | US | Black | No | 6.916667 |
| 4 | Jefferson, Richard | 33 | Jazz | F | 44 | $11,046,000 | 79 | 230 | 12 | 2001 | 6/21/1980 | Arizona | Los Angeles, CA | California | US | Black | No | 6.583333 |
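Storing a calculated column works for any arithmetic, not just dividing by 12. As a small sketch (the heights frame and the whole_feet / extra_inches column names are made up for this example), here is the same trick used to split inches into feet-and-inches the way people actually say heights:

```python
import pandas as pd

# Hypothetical three-row frame with heights taken from the table above.
heights = pd.DataFrame({
    "Name": ["Gee, Alonzo", "Wallace, Gerald", "Williams, Mo"],
    "Ht (In.)": [78, 79, 73],
})

# The math runs on the whole column at once, and assigning to a new
# column name adds that column to the frame.
heights["feet"] = heights["Ht (In.)"] / 12

# Integer division and remainder give "6 foot 7" style pieces.
heights["whole_feet"] = heights["Ht (In.)"] // 12
heights["extra_inches"] = heights["Ht (In.)"] % 12

print(heights)
```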
That's cool, maybe we could do the same thing with their salary? Take out the $ and the , and convert it to an integer?
In [57]:
# Can't just use .replace
df['2013 $'].head().replace("$","")
Out[57]:
0 $3,250,000
1 $10,105,855
2 $2,652,000
3 $762,195
4 $11,046,000
Name: 2013 $, dtype: object
In [58]:
# Need to use this weird .str thing
df['2013 $'].head().str.replace("$","")
Out[58]:
0 3,250,000
1 10,105,855
2 2,652,000
3 762,195
4 11,046,000
Name: 2013 $, dtype: object
In [59]:
# Can't just immediately replace the , either
df['2013 $'].head().str.replace("$","").replace(",","")
Out[59]:
0 3,250,000
1 10,105,855
2 2,652,000
3 762,195
4 11,046,000
Name: 2013 $, dtype: object
In [64]:
# Need to use the .str thing before EVERY string method
df['2013 $'].head().str.replace("$","").str.replace(",","")
Out[64]:
0 3250000
1 10105855
2 2652000
3 762195
4 11046000
Name: 2013 $, dtype: object
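The difference between .replace and .str.replace can be sketched on three made-up salary strings. One assumption to flag: the regex=False argument used here isn't in the notebook's own cells, and it needs a reasonably recent pandas; it just makes explicit that "$" means a literal dollar sign and not the regular-expression "end of string" marker.

```python
import pandas as pd

# Hypothetical mini salary column.
money = pd.Series(["$3,250,000", "$762,195", "n/a"])

# Series.replace compares WHOLE values, and no value is exactly "$",
# so nothing changes...
unchanged = money.replace("$", "")

# ...but a value that matches exactly does get swapped.
whole_value = money.replace("n/a", "0")

# .str.replace looks inside each string instead.
digits = (
    money.str.replace("$", "", regex=False)
         .str.replace(",", "", regex=False)
)

print(unchanged.tolist())
print(whole_value.tolist())
print(digits.tolist())
```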
In [66]:
# Describe still doesn't work.
df['2013 $'].head().str.replace("$","").str.replace(",","").describe()
Out[66]:
count 5
unique 5
top 2652000
freq 1
Name: 2013 $, dtype: object
In [67]:
# Let's convert it to an integer using .astype(int) before we describe it
df['2013 $'].head().str.replace("$","").str.replace(",","").astype(int).describe()
Out[67]:
count 5.000000e+00
mean 5.563210e+06
std 4.679007e+06
min 7.621950e+05
25% 2.652000e+06
50% 3.250000e+06
75% 1.010586e+07
max 1.104600e+07
Name: 2013 $, dtype: float64
In [68]:
df['2013 $'].head().str.replace("$","").str.replace(",","").astype(int)
Out[68]:
0 3250000
1 10105855
2 2652000
3 762195
4 11046000
Name: 2013 $, dtype: int64
In [73]:
# Maybe we can just make them millions?
df['2013 $'].head().str.replace("$","").str.replace(",","").astype(int) / 1000000
Out[73]:
0 3.250000
1 10.105855
2 2.652000
3 0.762195
4 11.046000
Name: 2013 $, dtype: float64
In [75]:
# Unfortunately some are "n/a" (43 of them, according to describe() earlier), which is going to break our code, so we can make n/a be 0
df['2013 $'].str.replace("$","").str.replace(",","").str.replace("n/a", "0").astype(int) / 1000000
Out[75]:
0 3.250000
1 10.105855
2 2.652000
3 0.762195
4 11.046000
5 1.246680
6 5.000000
7 5.016960
8 3.563600
9 6.500000
10 3.135000
11 5.625313
12 0.000000
13 12.868632
14 0.000000
15 0.792377
16 1.750000
17 13.701250
18 0.510000
19 0.884293
20 0.788872
21 21.466718
22 1.186459
23 0.884293
24 0.788872
25 0.788872
26 1.082520
27 4.469548
28 0.490180
29 1.399507
...
498 18.668431
499 8.000000
500 0.951463
501 10.361446
502 6.500000
503 1.160040
504 0.000000
505 0.884293
506 2.016000
507 2.678640
508 0.225479
509 0.788872
510 0.788872
511 2.317920
512 3.202920
513 0.854389
514 0.000000
515 0.884293
516 6.500000
517 6.500000
518 1.660257
519 5.225000
520 0.490180
521 0.490180
522 2.676000
523 0.900000
524 0.788872
525 0.854389
526 12.000000
527 2.162419
Name: 2013 $, dtype: float64
In [76]:
# Remove the .head() piece and save it back into the dataframe
df['millions'] = df['2013 $'].str.replace("$","").str.replace(",","").str.replace("n/a","0").astype(int) / 1000000
df.head()
Out[76]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only | feet | millions |
| 0 | Gee, Alonzo | 26 | Cavaliers | F | 33 | $3,250,000 | 78 | 219 | 4 | 2009 | 5/29/1987 | Alabama | Riviera Beach, FL | Florida | US | Black | No | 6.500000 | 3.250000 |
| 1 | Wallace, Gerald | 31 | Celtics | F | 45 | $10,105,855 | 79 | 220 | 12 | 2001 | 7/23/1982 | Alabama | Sylacauga, AL | Alabama | US | Black | No | 6.583333 | 10.105855 |
| 2 | Williams, Mo | 30 | Trail Blazers | G | 25 | $2,652,000 | 73 | 195 | 10 | 2003 | 12/19/1982 | Alabama | Jackson, MS | Mississippi | US | Black | No | 6.083333 | 2.652000 |
| 3 | Gladness, Mickell | 27 | Magic | C | 40 | $762,195 | 83 | 220 | 2 | 2011 | 7/26/1986 | Alabama A&M | Birmingham, AL | Alabama | US | Black | No | 6.916667 | 0.762195 |
| 4 | Jefferson, Richard | 33 | Jazz | F | 44 | $11,046,000 | 79 | 230 | 12 | 2001 | 6/21/1980 | Arizona | Los Angeles, CA | California | US | Black | No | 6.583333 | 11.046000 |
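One thing to keep in mind about turning "n/a" into 0: those zeros get counted in every average from here on. A possible alternative, sketched below on four made-up values, is pd.to_numeric with errors="coerce", which turns anything it can't parse into a missing value that mean() then skips. This is a different technique from the one the notebook uses, and the variable names (and regex=False) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical mini salary column with two unknown salaries.
raw = pd.Series(["$3,250,000", "n/a", "$762,195", "n/a"])

cleaned = (
    raw.str.replace("$", "", regex=False)
       .str.replace(",", "", regex=False)
)

# The notebook's approach: unknown salaries become 0.
as_zero = cleaned.str.replace("n/a", "0", regex=False).astype(int) / 1000000

# The alternative: unknown salaries become NaN (missing).
as_missing = pd.to_numeric(cleaned, errors="coerce") / 1000000

print(as_zero.mean())      # the zeros drag the average down
print(as_missing.mean())   # missing values are left out of the average
print(as_missing.isna().sum())
```

Which one is right depends on what "n/a" means in the data: truly unpaid, or just unknown.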
In [77]:
df.describe()
Out[77]:
| | Age | Ht (In.) | WT | EXP | 1st Year | feet | millions |
| count | 528.000000 | 528.000000 | 528.000000 | 528.000000 | 528.000000 | 528.000000 | 528.000000 |
| mean | 26.242424 | 79.119318 | 221.206439 | 4.772727 | 2008.227273 | 6.593277 | 3.818379 |
| std | 4.178868 | 3.431488 | 27.943169 | 4.325628 | 4.325628 | 0.285957 | 4.728437 |
| min | 18.000000 | 69.000000 | 20.000000 | 0.000000 | 1995.000000 | 5.750000 | 0.000000 |
| 25% | 23.000000 | 77.000000 | 200.000000 | 1.000000 | 2005.000000 | 6.416667 | 0.816844 |
| 50% | 25.000000 | 80.000000 | 220.000000 | 4.000000 | 2009.000000 | 6.666667 | 1.711620 |
| 75% | 29.000000 | 82.000000 | 240.000000 | 8.000000 | 2012.000000 | 6.833333 | 5.000000 |
| max | 39.000000 | 87.000000 | 290.000000 | 18.000000 | 2013.000000 | 7.250000 | 30.453805 |
In [81]:
# This is just the first few guys in the dataset. Can we order it?
df.head(3)
Out[81]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only | feet | millions |
| 0 | Gee, Alonzo | 26 | Cavaliers | F | 33 | $3,250,000 | 78 | 219 | 4 | 2009 | 5/29/1987 | Alabama | Riviera Beach, FL | Florida | US | Black | No | 6.500000 | 3.250000 |
| 1 | Wallace, Gerald | 31 | Celtics | F | 45 | $10,105,855 | 79 | 220 | 12 | 2001 | 7/23/1982 | Alabama | Sylacauga, AL | Alabama | US | Black | No | 6.583333 | 10.105855 |
| 2 | Williams, Mo | 30 | Trail Blazers | G | 25 | $2,652,000 | 73 | 195 | 10 | 2003 | 12/19/1982 | Alabama | Jackson, MS | Mississippi | US | Black | No | 6.083333 | 2.652000 |
In [82]:
# Let's try to sort them
df.sort_values(by='millions').head(3)
Out[82]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only | feet | millions |
| 496 | Johnson, James | 26 | Hawks | F | 13 | n/a | 81 | 248 | 4 | 2009 | 2/20/1987 | Wake Forest | Cheyene, WY | Wyoming | US | Black | No | 6.750000 | 0.0 |
| 33 | Davies, Brandon | 22 | Clippers | F | 23 | n/a | 81 | 235 | 0 | 2013 | 7/25/1991 | Brigham Young | Provo, UT | Utah | US | Black | No | 6.750000 | 0.0 |
| 465 | Drew, Larry | 23 | Heat | G | 0 | n/a | 74 | 180 | 0 | 2013 | 3/5/1990 | UCLA | Encino, CA | California | US | Black | No | 6.166667 | 0.0 |
Those guys are making nothing! If only there were a way to sort from high to low, a.k.a. descending instead of ascending.
In [84]:
# It isn't descending = True, unfortunately
df.sort_values(by='millions', ascending=False).head(3)
Out[84]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only | feet | millions |
| 203 | Bryant, Kobe | 35 | Lakers | G | 24 | $30,453,805 | 78 | 205 | 7 | 2006 | 8/23/1978 | Lower Merion HS (PA) | Philadelphia, PA | Pennsylvania | US | Black | Yes | 6.500000 | 30.453805 |
| 282 | Nowitzki, Dirk | 35 | Mavericks | F | 41 | $22,721,381 | 84 | 245 | 15 | 1998 | 6/19/1978 | n/a | Wurzburg, BA | Bavaria | Germany | White | No | 7.000000 | 22.721381 |
| 68 | Stoudemire, Amar'e† | 30 | Knicks | F/C | 1 | $21,679,893 | 83 | 245 | 11 | 2002 | 11/16/1982 | Cypress Creek HS (FL) | Lake Wales, FL | Florida | US | Black | Yes | 6.916667 | 21.679893 |
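Here's a small sketch of sorting in both directions on a four-row made-up frame (pay and the variable names are for illustration; the numbers come from the rows above). It also shows nlargest, a pandas shortcut for "sort descending and take the top few" that the notebook doesn't use but that you may find handy.

```python
import pandas as pd

# Hypothetical four-row frame with salaries in millions.
pay = pd.DataFrame({
    "Name": ["Gee, Alonzo", "Wallace, Gerald", "Williams, Mo", "Gladness, Mickell"],
    "millions": [3.25, 10.105855, 2.652, 0.762195],
})

# Default order is ascending: smallest first.
lowest_first = pay.sort_values(by="millions")

# ascending=False flips it: biggest first.
highest_first = pay.sort_values(by="millions", ascending=False)

# Shortcut for the top N rows by one column.
top_two = pay.nlargest(2, "millions")

print(highest_first["Name"].tolist())
print(top_two["Name"].tolist())
```

Note that sort_values gives back a new, sorted copy; pay itself stays in its original order unless you save the result.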
In [86]:
# We can use this to find the oldest guys in the league
df.sort_values(by='Age', ascending=False).head(3)
Out[86]:
| | Name | Age | Team | POS | # | 2013 $ | Ht (In.) | WT | EXP | 1st Year | DOB | School | City | State (Province, Territory, Etc..) | Country | Race | HS Only | feet | millions |
| 392 | Nash, Steve | 39 | Lakers | G | 10 | $9,300,500 | 75 | 178 | 7 | 2006 | 2/7/1974 | Santa Clara | Johannesburg, SA | n/a | South Africa | White | No | 6.250000 | 9.300500 |
| 225 | Camby, Marcus | 39 | Rockets | F/C | 21 | $884,293 | 83 | 240 | 17 | 1996 | 3/22/1974 | Massachusetts | Hartford, CT | Connecticut | US | Black | No | 6.916667 | 0.884293 |
| 23 | Fisher, Derek | 39 | Thunder | G | 6 | $884,293 | 73 | 210 | 17 | 1996 | 8/9/1974 | Arkansas-Little Rock | Little Rock, AR | Arkansas | US | Black | No | 6.083333 | 0.884293 |
In [88]:
# Or the youngest, by taking out 'ascending=False'
df.sort_values(by='Age').head(3)
Out[88]:
Name
Age
Team
POS
#
2013 $
Ht (In.)
WT
EXP
1st Year
DOB
School
City
State (Province, Territory, Etc..)
Country
Race
HS Only
feet
millions
285
Antetokounmpo, Giannis
18
Bucks
G/F
34
$1,792,560
81
205
1
2012
12/16/1994
n/a
Athens
n/a
Greece
Black
No
6.750000
1.79256
174
Noel, Nerlens
19
76ers
C
4
$3,171,320
83
228
0
2013
4/10/1994
Kentucky
Malden, MA
Massachussetts
US
Black
No
6.916667
3.17132
191
Goodwin, Archie
19
Suns
G
20
$1,064,400
77
198
0
2013
8/17/1994
Kentucky
Little Rock, AR
Arkansas
US
Black
No
6.416667
1.06440
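If all you want is the top or bottom few rows, `nlargest` and `nsmallest` are a possible shortcut for "sort, then head". A small sketch on made-up data (the column names match the notebook, the values are invented):

```python
import pandas as pd

# Hypothetical toy data, not the real NBA dataset
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [39, 19, 39, 25, 18],
})

oldest_two = toy.nlargest(2, "Age")     # like sort_values(ascending=False).head(2)
youngest_two = toy.nsmallest(2, "Age")  # like sort_values().head(2)

print(oldest_two["Name"].tolist())
print(youngest_two["Name"].tolist())
```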
But sometimes, instead of just looking at the rows, I want to do stuff with them. Play some games with them! Dunk on them! Describe them! And we don't want to dunk on everyone, only the players above 7 feet tall.
First, we need to check out boolean things.
In [89]:
# Get a big long list of True and False for every single row.
df['feet'] > 7
Out[89]:
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
...
498 False
499 False
500 False
501 False
502 True
503 False
504 False
505 False
506 False
507 False
508 False
509 False
510 False
511 False
512 False
513 False
514 False
515 False
516 False
517 False
518 False
519 False
520 False
521 False
522 False
523 False
524 False
525 False
526 False
527 False
Name: feet, dtype: bool
In [92]:
# We could use value counts if we wanted
above_seven_feet = df['feet'] > 7
above_seven_feet.value_counts()
Out[92]:
False 518
True 10
Name: feet, dtype: int64
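A quick sketch of why counting works on a True/False column: pandas treats True as 1 and False as 0, so `sum()` gives the number of matches and `mean()` gives the share of rows that match. The heights below are made up for the example.

```python
import pandas as pd

# Hypothetical heights in feet, invented for this sketch
feet = pd.Series([6.5, 7.25, 6.0, 7.083333, 7.0], name="feet")

above_seven = feet > 7      # a boolean Series, one value per row

print(above_seven.tolist())
print(above_seven.sum())    # True counts as 1, so this is the number of matches
print(above_seven.mean())   # fraction of rows that match
```

Notice that the player who is exactly 7.0 feet does not count, because `>` means strictly greater than. Use `>=` if you want to include him.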
In [94]:
# But we can also apply this to every single row to say whether YES we want it or NO we don't
df['feet'].head() > 7
Out[94]:
0 False
1 False
2 False
3 False
4 False
Name: feet, dtype: bool
In [96]:
# Instead of putting a column name inside the brackets, we put the
# True/False statements. It will only return the rows where the
# statement is True: the players above seven feet tall
df[df['feet'] > 7]
Out[96]:
Name
Age
Team
POS
#
2013 $
Ht (In.)
WT
EXP
1st Year
DOB
School
City
State (Province, Territory, Etc..)
Country
Race
HS Only
feet
millions
54
Thabeet, Hasheem
26
Thunder
C
34
$1,200,000
87
263
4
2009
2/16/1987
Connecticut
Dar es Salaam
n/a
Tanzania
Black
No
7.250000
1.200000
76
Chandler, Tyson
31
Knicks
C
6
$14,100,538
85
240
12
2001
10/2/1982
Dominguez HS (CA)
Hanford, CA
California
US
Black
Yes
7.083333
14.100538
120
Hibbert, Roy
26
Pacers
C
55
$14,283,844
86
280
5
2008
12/11/1986
Georgetown
New York City, NY
New York
US
Black
No
7.166667
14.283844
145
Leonard, Meyers
21
Trail Blazers
C
11
$2,222,160
85
245
1
2012
2/27/1992
Illinois
Robinson, IIL
Illinois
US
White
No
7.083333
2.222160
221
Len, Alex
20
Suns
C
21
$3,492,720
85
255
0
2013
6/16/1993
Maryland
Antratsy
n/a
Ukraine
White
No
7.083333
3.492720
274
Gobert, Rudy
21
Jazz
C
27
$1,078,800
85
235
0
2013
6/26/1992
n/a
Saint-Quentin
Aisne
France
Mixed
No
7.083333
1.078800
297
Mozgov, Timofey
27
Nuggets
C
25
$4,400,000
85
250
3
2010
7/16/1986
n/a
St. Petersburg
n/a
Russia
White
No
7.083333
4.400000
303
Gasol, Marc
28
Grizzlies
C
33
$14,860,524
85
265
5
2008
1/29/1985
n/a
Barcelona
n/a
Spain
Hispanic
No
7.083333
14.860524
316
Kuzmi?, Ognjen
23
Warriors
C
1
$490,180
85
231
0
2013
5/16/1990
n/a
Doboj
n/a
Yugoslavia
White
No
7.083333
0.490180
502
Hawes, Spencer
25
76ers
C
0
$6,500,000
85
245
6
2007
4/28/1988
Washington
Seattle, WA
Washington
US
White
No
7.083333
6.500000
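The same trick on a tiny made-up DataFrame, as a sketch of what is going on: the True/False Series acts as a mask, and only the rows marked True survive. The `.loc` version is one way to filter rows and pick a column in the same step.

```python
import pandas as pd

# Hypothetical toy data, not the real NBA dataset
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "feet": [6.5, 7.25, 6.0, 7.083333],
})

mask = toy["feet"] > 7
tall = toy[mask]                     # rows where mask is True
tall_names = toy.loc[mask, "Name"]   # same rows, but only the Name column

print(tall.index.tolist())           # the original row labels are kept
print(tall_names.tolist())
```

That is why the row numbers in the output above jump around (54, 76, 120, ...): they are the labels from the full DataFrame.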
In [98]:
# Or only the guards
df[df['POS'] == 'G']
Out[98]:
Name
Age
Team
POS
#
2013 $
Ht (In.)
WT
EXP
1st Year
DOB
School
City
State (Province, Territory, Etc..)
Country
Race
HS Only
feet
millions
2
Williams, Mo
30
Trail Blazers
G
25
$2,652,000
73
195
10
2003
12/19/1982
Alabama
Jackson, MS
Mississippi
US
Black
No
6.083333
2.652000
10
Bayless, Jerryd
25
Grizzlies
G
7
$3,135,000
75
200
5
2008
8/20/1988
Arizona
Phoenix, AZ
Arizona
US
Black
No
6.250000
3.135000
11
Terry, Jason
36
Nets
G
31
$5,625,313
74
180
14
1999
9/15/1977
Arizona
Seattle, WA
Washington
US
Black
No
6.166667
5.625313
12
Fogg, Kyle
23
Nuggets
G
6
n/a
75
183
0
2013
1/27/1990
Arizona
Brea, CA
California
US
Black
No
6.250000
0.000000
17
Harden, James
24
Rockets
G
13
$13,701,250
77
220
4
2009
8/26/1989
Arizona State
Los Angeles, CA
California
US
Black
No
6.416667
13.701250
19
Pargo, Jannero
33
Bobcats
G
5
$884,293
73
185
11
2002
10/22/1979
Arkansas
Chicago, IL
Illinois
US
Black
No
6.083333
0.884293
20
Beverley, Patrick
25
Rockets
G
2
$788,872
73
185
5
2008
7/12/1988
Arkansas
Chicago, IL
Illinois
US
Black
No
6.083333
0.788872
23
Fisher, Derek
39
Thunder
G
6
$884,293
73
210
17
1996
8/9/1974
Arkansas-Little Rock
Little Rock, AR
Arkansas
US
Black
No
6.083333
0.884293
28
Clark, Ian
22
Jazz
G
21
$490,180
75
175
0
2013
3/7/1991
Belmont
Memphis, TN
Tennessee
US
Black
No
6.250000
0.490180
30
Jackson, Reggie
23
Thunder
G
15
$1,260,360
75
208
2
2011
4/16/1990
Boston College
Pordenone
n/a
Italy
Black
No
6.250000
1.260360
34
Fredette, Jimmer
24
Kings
G
7
$2,439,840
74
195
2
2011
2/25/1989
Brigham Young
Glens Falls, NY
New York
US
White
No
6.166667
2.439840
35
Mack, Shelvin
23
Hawks
G
8
$884,293
75
215
2
2011
4/22/1990
Butler
Lexington, KY
Kentucky
US
Black
No
6.250000
0.884293
38
Crabbe, Allen
21
Trail Blazers
G
23
$825,000
78
210
0
2013
4/4/1992
California
Los Angeles, CA
California
US
Black
No
6.500000
0.825000
40
Taylor, Jermaine
26
Cavaliers
G
8
$780,871
77
20
4
2009
12/8/1986
Central Florida
Tavares, FL
Florida
US
Black
No
6.416667
0.780871
44
Stephenson, Lance
23
Pacers
G
1
$1,005,000
77
228
3
2010
9/5/1990
Cincinnati
New York City, NY
New York
US
Black
No
6.416667
1.005000
46
Cole, Norris
25
Heat
G
30
$1,129,200
74
175
2
2011
10/13/1988
Cleveland State
Dayton, OH
Ohio
US
Black
No
6.166667
1.129200
49
Burks, Alec
22
Jazz
G
10
$2,202,000
78
205
2
2011
7/20/1991
Colorado
Grandview, MO
Missouri
US
Black
No
6.500000
2.202000
50
Billups, Chauncey
37
Pistons
G
1
$2,500,000
75
210
16
1997
9/25/1976
Colorado
Denver, CO
Colorado
US
Black
No
6.250000
2.500000
53
Gordon, Ben
30
Bobcats
G
8
$13,200,000
75
200
9
2004
4/4/1983
Connecticut
London, ENG
n/a
England
Black
No
6.250000
13.200000
62
Walker, Kemba
23
Bobcats
G
15
$2,568,360
73
184
2
2011
5/8/1990
Connecticut
New York City, NY
New York
US
Black
No
6.083333
2.568360
63
Allen, Ray
38
Heat
G
34
$3,229,050
77
205
17
1996
7/20/1975
Connecticut
Merced, CA
California
US
Black
No
6.416667
3.229050
64
Price, A.J.
27
Timberwolves
G
22
n/a
74
185
4
2009
10/7/1986
Connecticut
Orange, NJ
New Jersey
Us
Black
No
6.166667
0.000000
69
Curry, Stephen
25
Warriors
G
30
$9,887,640
75
185
4
2009
3/14/1988
Davidson
Akron, OH
Ohio
US
Mixed
No
6.250000
9.887640
71
Roberts, Brian
27
Pelicans
G
22
$788,872
73
173
1
2012
12/3/1985
Dayton
Toledo, OH
Ohio
US
Black
No
6.083333
0.788872
74
Green, Willie
32
Clippers
G
34
$1,399,507
75
201
10
2003
7/28/1981
Detroit
Detroit, MI
Michigan
US
Black
No
6.250000
1.399507
75
McCallum, Ray
22
Kings
G
3
$524,616
75
190
0
2013
6/12/1991
Detroit
Detroit, MI
Michigan
US
Black
No
6.250000
0.524616
77
Irving, Kyrie
21
Cavaliers
G
2
$5,607,240
75
191
2
2011
3/23/1992
Duke
Melbourne
Victoria
Australia
Black
No
6.250000
5.607240
88
Redick, J. J.
29
Clippers
G
4
$6,500,000
76
190
7
2006
6/24/1984
Duke
Cookeville, TN
Tennessee
US
White
No
6.333333
6.500000
89
Rivers, Austin
21
Pelicans
G
25
$2,339,040
76
200
1
2012
8/1/1992
Duke
Santa Monica, CA
California
US
Mixed
No
6.333333
2.339040
90
Curry, Seth
23
Warriors
G
3
$490,180
74
185
0
2013
8/23/1990
Duke
Charlotte, NC
North Carolina
US
Black
No
6.166667
0.490180
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
457
Johnson, Orlando
24
Pacers
G
11
$788,872
77
220
1
2012
3/11/1989
UC Santa Barbara
Monterey, CA
California
US
Black
No
6.416667
0.788872
464
Collison, Darren
26
Clippers
G
2
$1,900,000
72
175
4
2009
8/23/1987
UCLA
Rancho Cucamonga, CA
California
US
Black
No
6.000000
1.900000
465
Drew, Larry
23
Heat
G
0
n/a
74
180
0
2013
3/5/1990
UCLA
Encino, CA
California
US
Black
No
6.166667
0.000000
466
Farmar, Jordan
26
Lakers
G
1
$884,293
74
180
12
2001
11/30/1986
UCLA
Los Angeles, CA
California
US
Mixed
No
6.166667
0.884293
467
Holiday, Jrue
23
Pelicans
G
11
$9,713,484
76
205
4
2009
6/12/1990
UCLA
Chatsworth, CA
California
US
Black
No
6.333333
9.713484
468
Lee, Malcolm
23
Suns
G
30
$884,293
77
200
2
2011
5/22/1990
UCLA
Riverside, CA
California
US
Black
No
6.416667
0.884293
469
Westbrook, Russell
24
Thunder
G
0
$14,693,906
75
187
5
2008
11/12/1988
UCLA
Long Beach, CA
California
US
Black
No
6.250000
14.693906
470
Watson, Earl
34
Trail Blazers
G
17
$884,293
73
199
12
2001
6/12/1979
UCLA
Kansas City, KA
Kansas
US
Black
No
6.083333
0.884293
480
Miller, Andre
37
Nuggets
G
24
$5,000,000
74
200
14
1999
3/19/1976
Utah
Los Angeles, CA
California
US
Black
No
6.166667
5.000000
481
Price, Ronnie
30
Magic
G
10
$1,146,337
74
190
8
2005
6/21/1983
Utah Valley
Friendswood, Texas
Texas
US
Black
No
6.166667
1.146337
485
Jenkins, John
22
Hawks
G
12
$1,258,800
76
215
1
2012
3/6/1991
Vanderbilt
Hendersonville, TN
Tennessee
US
Black
No
6.333333
1.258800
487
Wayns, Maalik
22
Clippers
G
5
$788,872
73
195
1
2012
5/2/1991
Villanova
Philadelphia, PA
Pennsylvania
US
Black
No
6.083333
0.788872
488
Foye, Randy
30
Nuggets
G
4
$3,000,000
76
213
7
2006
9/24/1983
Villanova
Newark, NJ
New Jersey
US
Black
No
6.333333
3.000000
489
Lowry, Kyle
27
Raptors
G
7
$6,210,000
72
205
7
2006
3/25/1986
Villanova
Philadelphia, PA
Pennsylvania
US
Black
No
6.000000
6.210000
491
Mason, Jr., Roger
33
Heat
G
21
$854,389
77
205
11
2002
9/10/1980
Virginia
Washington, DC
DC
US
Black
No
6.416667
0.854389
493
Daniels, Troy
22
Bobcats
G
30
n/a
76
200
0
2013
7/15/1991
Virginia Commonwealth
Roanoke, VA
Virginia
US
Black
No
6.333333
0.000000
494
Maynor, Eric
26
Wizards
G
6
$13,000,000
75
175
4
2009
6/11/1987
Virginia Commonwealth
Raeford, NC
North Carolina
US
Black
No
6.250000
13.000000
498
Paul, Chris
28
Clippers
G
3
$18,668,431
72
175
8
2005
5/6/1985
Wake Forest
Forsyth County, NC
North Carolina
US
Black
No
6.000000
18.668431
499
Teague, Jeff
25
Hawks
G
0
$8,000,000
74
181
4
2009
6/10/1988
Wake Forest
Indianapolis, IN
Indiana
US
Black
No
6.166667
8.000000
500
Smith, Ish
25
Suns
G
30
$951,463
72
175
3
2010
7/5/1988
Wake Forest
Charlotte, NC
North Carolina
US
Black
No
6.000000
0.951463
503
Wroten, Tony
20
76ers
G
8
$1,160,040
78
205
1
2012
4/13/1993
Washington
Renton, WA
Washington
US
Black
No
6.500000
1.160040
504
Gaddy, Abdul
21
Bobcats
G
10
n/a
75
185
0
2013
1/26/1992
Washington
Tacoma, WA
Washington
US
Black
No
6.250000
0.000000
505
Thomas, Isaiah
24
Kings
G
22
$884,293
69
185
2
2011
2/7/1989
Washington
Tacoma, WA
Washington
US
Black
No
5.750000
0.884293
506
Robinson, Nate
29
Nuggets
G
10
$2,016,000
69
180
8
2005
5/31/1984
Washington
Seattle, WA
Washington
US
Black
No
5.750000
2.016000
507
Ross, Terrence
22
Raptors
G
31
$2,678,640
78
195
1
2012
2/5/1991
Washington
Portland, OR
Oregon
US
Black
No
6.500000
2.678640
512
Lillard, Damian
23
Trail Blazers
G
0
$3,202,920
75
195
1
2012
7/15/1990
Weber State
Oakland, CA
California
US
Black
No
6.250000
3.202920
517
Martin, Kevin
30
Timberwolves
G
23
$6,500,000
79
185
9
2004
2/1/1983
Western Carolina
Zanesville, OH
Ohio
US
Mixed
No
6.583333
6.500000
520
Mekel, Gal
25
Mavericks
G
33
$490,180
75
191
5
2008
3/4/1988
Wichita State
Petah Tikva
n/a
Israel
White
No
6.250000
0.490180
525
Harris, Devin
30
Mavericks
G
20
$854,389
75
192
9
2004
2/27/1983
Wisconsin
Milwaukee, WI
Wisconsin
US
Black
No
6.250000
0.854389
527
Crawford, Jordan
24
Celtics
G
27
$2,162,419
76
195
3
2010
10/23/1988
Xavier
Detroit, MI
Michigan
US
Black
No
6.333333
2.162419
175 rows × 19 columns
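One thing to watch: `== 'G'` only matches rows where POS is exactly 'G', so a player listed as 'G/F' (Antetokounmpo in row 285, for example) is left out. If you want those too, here is a sketch of two possible approaches on made-up data. Whether a 'G/F' should count as a guard is a judgment call, not something the dataset decides for you.

```python
import pandas as pd

# Hypothetical toy data using the same POS codes as the notebook
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "POS": ["G", "G/F", "F/C", "C"],
})

exact = toy[toy["POS"] == "G"]                # only the exact string 'G'
listed = toy[toy["POS"].isin(["G", "G/F"])]   # any value in the list

# Split 'G/F' into ['G', 'F'] and check whether 'G' is one of the parts
has_g = toy["POS"].str.split("/").apply(lambda parts: "G" in parts)
any_guard = toy[has_g]

print(exact["Name"].tolist())
print(listed["Name"].tolist())
print(any_guard["Name"].tolist())
```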
In [108]:
# Or only the guards who make more than 15 million
df[(df['POS'] == 'G') & (df['millions'] > 15)]
Out[108]:
Name
Age
Team
POS
#
2013 $
Ht (In.)
WT
EXP
1st Year
DOB
School
City
State (Province, Territory, Etc..)
Country
Race
HS Only
feet
millions
147
Williams, Deron
29
Nets
G
8
$18,466,130
75
209
8
2005
6/26/1984
Illinois
Parkersburg, WV
West Virginia
US
Black
No
6.250000
18.466130
203
Bryant, Kobe
35
Lakers
G
24
$30,453,805
78
205
7
2006
8/23/1978
Lower Merion HS (PA)
Philadelphia, PA
Pennsylvania
US
Black
Yes
6.500000
30.453805
214
Wade, Dwyane
31
Heat
G
3
$18,673,000
76
220
10
2003
1/17/1982
Marquette
Chicago, IL
Illinois
US
Black
No
6.333333
18.673000
227
Rose, Derrick
25
Bulls
G
1
$17,632,688
75
190
5
2008
10/4/1988
Memphis
Chicago, IL
Illinois
US
Black
No
6.250000
17.632688
498
Paul, Chris
28
Clippers
G
3
$18,668,431
72
175
8
2005
5/6/1985
Wake Forest
Forsyth County, NC
North Carolina
US
Black
No
6.000000
18.668431
In [110]:
# It might be easier to break down the booleans into separate variables
is_guard = df['POS'] == 'G'
more_than_fifteen_million = df['millions'] > 15
df[is_guard & more_than_fifteen_million]
Out[110]:
Name
Age
Team
POS
#
2013 $
Ht (In.)
WT
EXP
1st Year
DOB
School
City
State (Province, Territory, Etc..)
Country
Race
HS Only
feet
millions
147
Williams, Deron
29
Nets
G
8
$18,466,130
75
209
8
2005
6/26/1984
Illinois
Parkersburg, WV
West Virginia
US
Black
No
6.250000
18.466130
203
Bryant, Kobe
35
Lakers
G
24
$30,453,805
78
205
7
2006
8/23/1978
Lower Merion HS (PA)
Philadelphia, PA
Pennsylvania
US
Black
Yes
6.500000
30.453805
214
Wade, Dwyane
31
Heat
G
3
$18,673,000
76
220
10
2003
1/17/1982
Marquette
Chicago, IL
Illinois
US
Black
No
6.333333
18.673000
227
Rose, Derrick
25
Bulls
G
1
$17,632,688
75
190
5
2008
10/4/1988
Memphis
Chicago, IL
Illinois
US
Black
No
6.250000
17.632688
498
Paul, Chris
28
Clippers
G
3
$18,668,431
72
175
8
2005
5/6/1985
Wake Forest
Forsyth County, NC
North Carolina
US
Black
No
6.000000
18.668431
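A small sketch of the three ways to combine True/False Series: `&` for and, `|` for or, and `~` for not. The data is made up. Python's plain `and` / `or` do not work on a whole Series, and when you write the conditions inline each one needs its own parentheses, because `&` binds more tightly than `==` and `>`.

```python
import pandas as pd

# Hypothetical toy data, not the real NBA dataset
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "POS": ["G", "G", "C", "F"],
    "millions": [18.5, 0.9, 14.1, 22.7],
})

is_guard = toy["POS"] == "G"
rich = toy["millions"] > 15

rich_guards = toy[is_guard & rich]        # AND: both must be True
guard_or_rich = toy[is_guard | rich]      # OR: at least one is True
rich_non_guards = toy[~is_guard & rich]   # NOT flips True and False first

print(rich_guards["Name"].tolist())
print(guard_or_rich["Name"].tolist())
print(rich_non_guards["Name"].tolist())
```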
In [118]:
# We can save this stuff
short_players = df[df['feet'] < 6.5]
short_players
Out[118]:
Name
Age
Team
POS
#
2013 $
Ht (In.)
WT
EXP
1st Year
DOB
School
City
State (Province, Territory, Etc..)
Country
Race
HS Only
feet
millions
2
Williams, Mo
30
Trail Blazers
G
25
$2,652,000
73
195
10
2003
12/19/1982
Alabama
Jackson, MS
Mississippi
US
Black
No
6.083333
2.652000
10
Bayless, Jerryd
25
Grizzlies
G
7
$3,135,000
75
200
5
2008
8/20/1988
Arizona
Phoenix, AZ
Arizona
US
Black
No
6.250000
3.135000
11
Terry, Jason
36
Nets
G
31
$5,625,313
74
180
14
1999
9/15/1977
Arizona
Seattle, WA
Washington
US
Black
No
6.166667
5.625313
12
Fogg, Kyle
23
Nuggets
G
6
n/a
75
183
0
2013
1/27/1990
Arizona
Brea, CA
California
US
Black
No
6.250000
0.000000
17
Harden, James
24
Rockets
G
13
$13,701,250
77
220
4
2009
8/26/1989
Arizona State
Los Angeles, CA
California
US
Black
No
6.416667
13.701250
19
Pargo, Jannero
33
Bobcats
G
5
$884,293
73
185
11
2002
10/22/1979
Arkansas
Chicago, IL
Illinois
US
Black
No
6.083333
0.884293
20
Beverley, Patrick
25
Rockets
G
2
$788,872
73
185
5
2008
7/12/1988
Arkansas
Chicago, IL
Illinois
US
Black
No
6.083333
0.788872
23
Fisher, Derek
39
Thunder
G
6
$884,293
73
210
17
1996
8/9/1974
Arkansas-Little Rock
Little Rock, AR
Arkansas
US
Black
No
6.083333
0.884293
28
Clark, Ian
22
Jazz
G
21
$490,180
75
175
0
2013
3/7/1991
Belmont
Memphis, TN
Tennessee
US
Black
No
6.250000
0.490180
30
Jackson, Reggie
23
Thunder
G
15
$1,260,360
75
208
2
2011
4/16/1990
Boston College
Pordenone
n/a
Italy
Black
No
6.250000
1.260360
34
Fredette, Jimmer
24
Kings
G
7
$2,439,840
74
195
2
2011
2/25/1989
Brigham Young
Glens Falls, NY
New York
US
White
No
6.166667
2.439840
35
Mack, Shelvin
23
Hawks
G
8
$884,293
75
215
2
2011
4/22/1990
Butler
Lexington, KY
Kentucky
US
Black
No
6.250000
0.884293
40
Taylor, Jermaine
26
Cavaliers
G
8
$780,871
77
20
4
2009
12/8/1986
Central Florida
Tavares, FL
Florida
US
Black
No
6.416667
0.780871
44
Stephenson, Lance
23
Pacers
G
1
$1,005,000
77
228
3
2010
9/5/1990
Cincinnati
New York City, NY
New York
US
Black
No
6.416667
1.005000
46
Cole, Norris
25
Heat
G
30
$1,129,200
74
175
2
2011
10/13/1988
Cleveland State
Dayton, OH
Ohio
US
Black
No
6.166667
1.129200
50
Billups, Chauncey
37
Pistons
G
1
$2,500,000
75
210
16
1997
9/25/1976
Colorado
Denver, CO
Colorado
US
Black
No
6.250000
2.500000
53
Gordon, Ben
30
Bobcats
G
8
$13,200,000
75
200
9
2004
4/4/1983
Connecticut
London, ENG
n/a
England
Black
No
6.250000
13.200000
62
Walker, Kemba
23
Bobcats
G
15
$2,568,360
73
184
2
2011
5/8/1990
Connecticut
New York City, NY
New York
US
Black
No
6.083333
2.568360
63
Allen, Ray
38
Heat
G
34
$3,229,050
77
205
17
1996
7/20/1975
Connecticut
Merced, CA
California
US
Black
No
6.416667
3.229050
64
Price, A.J.
27
Timberwolves
G
22
n/a
74
185
4
2009
10/7/1986
Connecticut
Orange, NJ
New Jersey
Us
Black
No
6.166667
0.000000
65
Lamb, Jeremy
21
Thunder
G/F
11
$2,111,160
77
180
1
2012
5/30/1992
Connecticut
Norcross, GA
Georgia
US
Black
No
6.416667
2.111160
69
Curry, Stephen
25
Warriors
G
30
$9,887,640
75
185
4
2009
3/14/1988
Davidson
Akron, OH
Ohio
US
Mixed
No
6.250000
9.887640
71
Roberts, Brian
27
Pelicans
G
22
$788,872
73
173
1
2012
12/3/1985
Dayton
Toledo, OH
Ohio
US
Black
No
6.083333
0.788872
74
Green, Willie
32
Clippers
G
34
$1,399,507
75
201
10
2003
7/28/1981
Detroit
Detroit, MI
Michigan
US
Black
No
6.250000
1.399507
75
McCallum, Ray
22
Kings
G
3
$524,616
75
190
0
2013
6/12/1991
Detroit
Detroit, MI
Michigan
US
Black
No
6.250000
0.524616
77
Irving, Kyrie
21
Cavaliers
G
2
$5,607,240
75
191
2
2011
3/23/1992
Duke
Melbourne
Victoria
Australia
Black
No
6.250000
5.607240
88
Redick, J. J.
29
Clippers
G
4
$6,500,000
76
190
7
2006
6/24/1984
Duke
Cookeville, TN
Tennessee
US
White
No
6.333333
6.500000
89
Rivers, Austin
21
Pelicans
G
25
$2,339,040
76
200
1
2012
8/1/1992
Duke
Santa Monica, CA
California
US
Mixed
No
6.333333
2.339040
90
Curry, Seth
23
Warriors
G
3
$490,180
74
185
0
2013
8/23/1990
Duke
Charlotte, NC
North Carolina
US
Black
No
6.166667
0.490180
91
Henderson, Gerald
25
Bobcats
G/F
9
$6,000,000
77
215
4
2009
12/9/1987
Duke
Caldwell, NJ
New Jersey
US
Black
No
6.416667
6.000000
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
464
Collison, Darren
26
Clippers
G
2
$1,900,000
72
175
4
2009
8/23/1987
UCLA
Rancho Cucamonga, CA
California
US
Black
No
6.000000
1.900000
465
Drew, Larry
23
Heat
G
0
n/a
74
180
0
2013
3/5/1990
UCLA
Encino, CA
California
US
Black
No
6.166667
0.000000
466
Farmar, Jordan
26
Lakers
G
1
$884,293
74
180
12
2001
11/30/1986
UCLA
Los Angeles, CA
California
US
Mixed
No
6.166667
0.884293
467
Holiday, Jrue
23
Pelicans
G
11
$9,713,484
76
205
4
2009
6/12/1990
UCLA
Chatsworth, CA
California
US
Black
No
6.333333
9.713484
468
Lee, Malcolm
23
Suns
G
30
$884,293
77
200
2
2011
5/22/1990
UCLA
Riverside, CA
California
US
Black
No
6.416667
0.884293
469
Westbrook, Russell
24
Thunder
G
0
$14,693,906
75
187
5
2008
11/12/1988
UCLA
Long Beach, CA
California
US
Black
No
6.250000
14.693906
470
Watson, Earl
34
Trail Blazers
G
17
$884,293
73
199
12
2001
6/12/1979
UCLA
Kansas City, KA
Kansas
US
Black
No
6.083333
0.884293
471
Afflalo, Arron
27
Magic
G/F
4
$7,750,000
77
215
6
2007
10/15/1985
UCLA
Los Angeles, CA
California
US
Black
No
6.416667
7.750000
480
Miller, Andre
37
Nuggets
G
24
$5,000,000
74
200
14
1999
3/19/1976
Utah
Los Angeles, CA
California
US
Black
No
6.166667
5.000000
481
Price, Ronnie
30
Magic
G
10
$1,146,337
74
190
8
2005
6/21/1983
Utah Valley
Friendswood, Texas
Texas
US
Black
No
6.166667
1.146337
482
Howard, Ron
31
Pacers
G/F
19
n/a
77
200
0
2013
1/14/1982
Valparaiso
Chicago, IL
Illinois
US
Black
No
6.416667
0.000000
485
Jenkins, John
22
Hawks
G
12
$1,258,800
76
215
1
2012
3/6/1991
Vanderbilt
Hendersonville, TN
Tennessee
US
Black
No
6.333333
1.258800
487
Wayns, Maalik
22
Clippers
G
5
$788,872
73
195
1
2012
5/2/1991
Villanova
Philadelphia, PA
Pennsylvania
US
Black
No
6.083333
0.788872
488
Foye, Randy
30
Nuggets
G
4
$3,000,000
76
213
7
2006
9/24/1983
Villanova
Newark, NJ
New Jersey
US
Black
No
6.333333
3.000000
489
Lowry, Kyle
27
Raptors
G
7
$6,210,000
72
205
7
2006
3/25/1986
Villanova
Philadelphia, PA
Pennsylvania
US
Black
No
6.000000
6.210000
491
Mason, Jr., Roger
33
Heat
G
21
$854,389
77
205
11
2002
9/10/1980
Virginia
Washington, DC
DC
US
Black
No
6.416667
0.854389
493
Daniels, Troy
22
Bobcats
G
30
n/a
76
200
0
2013
7/15/1991
Virginia Commonwealth
Roanoke, VA
Virginia
US
Black
No
6.333333
0.000000
494
Maynor, Eric
26
Wizards
G
6
$13,000,000
75
175
4
2009
6/11/1987
Virginia Commonwealth
Raeford, NC
North Carolina
US
Black
No
6.250000
13.000000
498
Paul, Chris
28
Clippers
G
3
$18,668,431
72
175
8
2005
5/6/1985
Wake Forest
Forsyth County, NC
North Carolina
US
Black
No
6.000000
18.668431
499
Teague, Jeff
25
Hawks
G
0
$8,000,000
74
181
4
2009
6/10/1988
Wake Forest
Indianapolis, IN
Indiana
US
Black
No
6.166667
8.000000
500
Smith, Ish
25
Suns
G
30
$951,463
72
175
3
2010
7/5/1988
Wake Forest
Charlotte, NC
North Carolina
US
Black
No
6.000000
0.951463
504
Gaddy, Abdul
21
Bobcats
G
10
n/a
75
185
0
2013
1/26/1992
Washington
Tacoma, WA
Washington
US
Black
No
6.250000
0.000000
505
Thomas, Isaiah
24
Kings
G
22
$884,293
69
185
2
2011
2/7/1989
Washington
Tacoma, WA
Washington
US
Black
No
5.750000
0.884293
506
Robinson, Nate
29
Nuggets
G
10
$2,016,000
69
180
8
2005
5/31/1984
Washington
Seattle, WA
Washington
US
Black
No
5.750000
2.016000
512
Lillard, Damian
23
Trail Blazers
G
0
$3,202,920
75
195
1
2012
7/15/1990
Weber State
Oakland, CA
California
US
Black
No
6.250000
3.202920
519
Lee, Courtney
28
Celtics
G/F
11
$5,225,000
77
200
5
2008
10/3/1985
Western Kentucky
Indianapolis, IN
Indiana
US
Black
No
6.416667
5.225000
520
Mekel, Gal
25
Mavericks
G
33
$490,180
75
191
5
2008
3/4/1988
Wichita State
Petah Tikva
n/a
Israel
White
No
6.250000
0.490180
521
Murry, Toure'
23
Knicks
G/F
23
$490,180
77
195
0
2013
11/8/1989
Wichita State
Houston, TX
Texas
US
Black
No
6.416667
0.490180
525
Harris, Devin
30
Mavericks
G
20
$854,389
75
192
9
2004
2/27/1983
Wisconsin
Milwaukee, WI
Wisconsin
US
Black
No
6.250000
0.854389
527
Crawford, Jordan
24
Celtics
G
27
$2,162,419
76
195
3
2010
10/23/1988
Xavier
Detroit, MI
Michigan
US
Black
No
6.333333
2.162419
166 rows × 19 columns
In [119]:
short_players.describe()
Out[119]:
               Age     Ht (In.)           WT          EXP     1st Year         feet     millions
count   166.000000   166.000000   166.000000   166.000000   166.000000   166.000000   166.000000
mean     25.933735    74.909639   193.530120     4.168675  2008.831325     6.242470     3.423839
std       4.286887     1.778056    19.085668     4.059614     4.059614     0.148171     4.122675
min      19.000000    69.000000    20.000000     0.000000  1996.000000     5.750000     0.000000
25%      23.000000    74.000000   185.000000     1.000000  2006.000000     6.166667     0.788872
50%      25.000000    75.000000   195.000000     3.000000  2010.000000     6.250000     1.595675
75%      28.000000    76.000000   205.000000     7.000000  2012.000000     6.333333     4.940940
max      39.000000    77.000000   228.000000    17.000000  2013.000000     6.416667    18.673000
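The minimum of 0 in the millions column comes from players whose '2013 $' value is n/a (Drew, Larry, for instance, shows n/a and 0.000000). Those zeros drag the mean down. How millions was built is not shown in this part of the notebook, so the sketch below is only one possible alternative, on made-up salary strings: turn n/a into a missing value (NaN) instead of 0, and let pandas skip it.

```python
import pandas as pd

# Hypothetical salary strings in the same style as the '2013 $' column
raw = pd.Series(["$3,250,000", "n/a", "$884,293"])

# Strip '$' and ',' then convert; anything unparseable (like 'n/a') becomes NaN
dollars = pd.to_numeric(raw.str.replace("[$,]", "", regex=True), errors="coerce")
millions = dollars / 1_000_000

print(millions.mean())             # NaN is skipped: average of the two known salaries
print(millions.fillna(0).mean())   # treating n/a as 0 pulls the average down
```

Which one is right depends on what n/a means in the source data, which this notebook does not say.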
In [121]:
# Maybe we can compare them to taller players?
df[df['feet'] >= 6.5].describe()
Out[121]:
               Age     Ht (In.)           WT          EXP     1st Year         feet     millions
count   362.000000   362.000000   362.000000   362.000000   362.000000   362.000000   362.000000
mean     26.383978    81.049724   233.897790     5.049724  2007.950276     6.754144     3.999301
std       4.126674     1.964438    21.439163     4.420146     4.420146     0.163703     4.976573
min      18.000000    78.000000   155.000000     0.000000  1995.000000     6.500000     0.000000
25%      23.000000    79.000000   220.000000     1.000000  2005.000000     6.583333     0.854389
50%      26.000000    81.000000   235.000000     4.000000  2009.000000     6.750000     1.750000
75%      29.000000    83.000000   250.000000     8.000000  2012.000000     6.916667     5.012720
max      39.000000    87.000000   290.000000    18.000000  2013.000000     7.250000    30.453805
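Running `describe()` twice and eyeballing the two tables works, but a possible alternative is to label each row and let `groupby` line the groups up side by side. This is a sketch on made-up numbers; the `height_group` column and its labels are names invented for the example.

```python
import pandas as pd

# Hypothetical toy data, not the real NBA dataset
toy = pd.DataFrame({
    "feet": [6.0, 6.25, 6.5, 7.0],
    "millions": [1.0, 3.0, 2.0, 6.0],
})

# Label each row, then summarise both groups in one call
is_tall = toy["feet"] >= 6.5
toy["height_group"] = is_tall.map({True: "6.5 ft and up", False: "under 6.5 ft"})

summary = toy.groupby("height_group")["millions"].agg(["count", "mean", "median"])
print(summary)
```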
In [123]:
df['Age'].head()
Out[123]:
0 26
1 31
2 30
3 27
4 33
Name: Age, dtype: int64
In [124]:
# This will scream that we don't have matplotlib.
df['Age'].hist()
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-124-694adadca099> in <module>()
----> 1 df['Age'].hist()
/Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages/pandas/tools/plotting.py in hist_series(self, by, ax, grid, xlabelsize, xrot, ylabelsize, yrot, figsize, bins, **kwds)
2941
2942 """
-> 2943 import matplotlib.pyplot as plt
2944
2945 if by is None:
ImportError: No module named 'matplotlib'
matplotlib is a graphing library. It's the Python way to make graphs!
In [126]:
!pip install matplotlib
Collecting matplotlib
Using cached matplotlib-1.5.1-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting cycler (from matplotlib)
Using cached cycler-0.10.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.6 in /Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pytz in /Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages (from matplotlib)
Collecting pyparsing!=2.0.0,!=2.0.4,>=1.5.6 (from matplotlib)
Using cached pyparsing-2.1.4-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): six in /Users/soma/.virtualenvs/pandas-intro/lib/python3.4/site-packages (from cycler->matplotlib)
Installing collected packages: cycler, pyparsing, matplotlib
Successfully installed cycler-0.10.0 matplotlib-1.5.1 pyparsing-2.1.4
In [127]:
# this will open up a weird window that won't do anything
df['Age'].hist()
Out[127]:
<matplotlib.axes._subplots.AxesSubplot at 0x108d38780>
In [128]:
# So instead you run this code
%matplotlib inline
In [129]:
df['Age'].hist()
Out[129]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d724dd8>
But that's ugly. There's a thing called ggplot for R that looks nice. We want to look nice. We want to look like ggplot.
In [130]:
import matplotlib.pyplot as plt
plt.style.available
Out[130]:
['grayscale',
'seaborn-muted',
'seaborn-paper',
'classic',
'seaborn-notebook',
'seaborn-white',
'seaborn-pastel',
'fivethirtyeight',
'seaborn-dark-palette',
'seaborn-ticks',
'seaborn-poster',
'seaborn-talk',
'seaborn-whitegrid',
'seaborn-deep',
'ggplot',
'dark_background',
'seaborn-bright',
'bmh',
'seaborn-darkgrid',
'seaborn-dark',
'seaborn-colorblind']
In [131]:
plt.style.use('ggplot')
In [132]:
df['Age'].hist()
Out[132]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d73f5c0>
In [133]:
# Note: matplotlib 3.6 and later call this style 'seaborn-v0_8-deep'
plt.style.use('seaborn-deep')
df['Age'].hist()
Out[133]:
<matplotlib.axes._subplots.AxesSubplot at 0x108e5fef0>
In [134]:
plt.style.use('fivethirtyeight')
df['Age'].hist()
Out[134]:
<matplotlib.axes._subplots.AxesSubplot at 0x108f48978>
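`plt.style.use` changes the look of every plot you make afterwards. If you only want to try a style on one plot, `plt.style.context` is a possible alternative. The sketch below uses made-up ages, draws off-screen with the Agg backend so it also runs outside a notebook, and falls back to the default style if 'ggplot' is not in `plt.style.available` on your machine.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; in a notebook you would skip this line
import matplotlib.pyplot as plt

# Pick the style only if this matplotlib version actually ships it
wanted = "ggplot"
style = wanted if wanted in plt.style.available else "default"

# The style applies only to plots made inside the with-block
with plt.style.context(style):
    fig, ax = plt.subplots()
    ax.hist([19, 21, 23, 23, 25, 26, 30, 35, 39], bins=5)  # hypothetical ages
    ax.set_xlabel("Age")
    ax.set_ylabel("Players")

n_bars = len(ax.patches)  # one rectangle per bin
plt.close(fig)
```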
That might look better with a little more customization. So let's customize it.
In [143]:
# Pass in all sorts of stuff!
# Most from http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.hist.html
# range= is a matplotlib thing; pandas just passes it along to matplotlib's hist()
df['Age'].hist(bins=20, xlabelsize=10, ylabelsize=10, range=(0,40))
Out[143]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e73e358>
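To see what `bins=20` and `range=(0,40)` actually do without looking at a picture, here is a sketch using `numpy.histogram`, which takes the same two arguments. The ages are made up. With 20 bins stretched over 0 to 40, each bar covers 2 years.

```python
import numpy as np

# Hypothetical ages, invented for this sketch
ages = [19, 21, 23, 23, 25, 26, 30, 35, 39]

# Same bins= and range= as the hist() call: 20 equal bins covering 0 to 40
counts, edges = np.histogram(ages, bins=20, range=(0, 40))

print(edges[:4])      # where the first few bins start
print(counts.sum())   # every age falls inside the range, so nothing is dropped
print(counts[11])     # the bin from 22 up to (not including) 24
```

Values outside the range are ignored, so a range that is too narrow quietly drops players from the chart.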
I want more graphics! Do tall people make more money?!?!
In [149]:
df.plot(kind='scatter', x='feet', y='millions')
Out[149]:
<matplotlib.axes._subplots.AxesSubplot at 0x110193320>
In [150]:
df.head()
Out[150]:
Name
Age
Team
POS
#
2013 $
Ht (In.)
WT
EXP
1st Year
DOB
School
City
State (Province, Territory, Etc..)
Country
Race
HS Only
feet
millions
0
Gee, Alonzo
26
Cavaliers
F
33
$3,250,000
78
219
4
2009
5/29/1987
Alabama
Riviera Beach, FL
Florida
US
Black
No
6.500000
3.250000
1
Wallace, Gerald
31
Celtics
F
45
$10,105,855
79
220
12
2001
7/23/1982
Alabama
Sylacauga, AL
Alabama
US
Black
No
6.583333
10.105855
2
Williams, Mo
30
Trail Blazers
G
25
$2,652,000
73
195
10
2003
12/19/1982
Alabama
Jackson, MS
Mississippi
US
Black
No
6.083333
2.652000
3
Gladness, Mickell
27
Magic
C
40
$762,195
83
220
2
2011
7/26/1986
Alabama A&M
Birmingham, AL
Alabama
US
Black
No
6.916667
0.762195
4
Jefferson, Richard
33
Jazz
F
44
$11,046,000
79
230
12
2001
6/21/1980
Arizona
Los Angeles, CA
California
US
Black
No
6.583333
11.046000
In [152]:
# How does experience relate with the amount of money they're making?
df.plot(kind='scatter', x='EXP', y='millions')
Out[152]:
<matplotlib.axes._subplots.AxesSubplot at 0x11111e048>
In [153]:
# At least we can assume height and weight are related
df.plot(kind='scatter', x='WT', y='feet')
Out[153]:
<matplotlib.axes._subplots.AxesSubplot at 0x110f31278>
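A scatter plot lets you eyeball whether two columns move together. If you also want a number, one option is the Pearson correlation from `Series.corr`. This is a sketch on made-up heights and weights, so the result says nothing about the real NBA data.

```python
import pandas as pd

# Hypothetical toy data, not the real NBA dataset
toy = pd.DataFrame({
    "WT": [175, 195, 220, 245, 263],
    "feet": [6.0, 6.25, 6.5, 7.0, 7.25],
})

# Pearson correlation: close to 1 means they rise together, close to 0 means no linear link
r = toy["WT"].corr(toy["feet"])
print(round(r, 3))
```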
In [157]:
# Same plot, but with explicit axis limits (xlim and ylim)
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
df.plot(kind='scatter', x='WT', y='feet', xlim=(100,300), ylim=(5.5, 8))
Out[157]:
<matplotlib.axes._subplots.AxesSubplot at 0x1121c4208>
In [160]:
plt.style.use('ggplot')
In [161]:
df.plot(kind='scatter', x='WT', y='feet', xlim=(100,300), ylim=(5.5, 8))
Out[161]:
<matplotlib.axes._subplots.AxesSubplot at 0x112755518>
In [177]:
# We can also call matplotlib (plt) directly instead of going through df.plot
# It's SIMILAR but TOTALLY DIFFERENT: you hand it the x and y values yourself
centers = df[df['POS'] == 'C']
guards = df[df['POS'] == 'G']
forwards = df[df['POS'] == 'F']
plt.scatter(y=centers["feet"], x=centers["WT"], c='c', alpha=0.75, marker='x')
plt.scatter(y=guards["feet"], x=guards["WT"], c='y', alpha=0.75, marker='o')
plt.scatter(y=forwards["feet"], x=forwards["WT"], c='m', alpha=0.75, marker='v')
plt.xlim(100,300)
plt.ylim(5.5,8)
Out[177]:
(5.5, 8)
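The three near-identical `plt.scatter` lines can be folded into a loop, which also makes it easy to add a legend so you can tell the markers apart. This is a sketch on made-up data with the same colours and markers as the cell above; the `styles` dictionary is a name invented for the example, and the Agg line is only there so it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; in a notebook you would skip this line
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not the real NBA dataset
toy = pd.DataFrame({
    "POS": ["C", "G", "F", "G", "C"],
    "WT": [263, 175, 220, 195, 245],
    "feet": [7.25, 6.0, 6.5, 6.25, 7.0],
})

# One colour and marker per position
styles = {"C": ("c", "x"), "G": ("y", "o"), "F": ("m", "v")}

fig, ax = plt.subplots()
for pos, (color, marker) in styles.items():
    group = toy[toy["POS"] == pos]
    ax.scatter(group["WT"], group["feet"], c=color, marker=marker,
               alpha=0.75, label=pos)

ax.set_xlim(100, 300)
ax.set_ylim(5.5, 8)
ax.set_xlabel("WT")
ax.set_ylabel("feet")
ax.legend(title="POS")

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
plt.close(fig)
```

As with the guards filter earlier, `== pos` skips mixed positions like 'G/F' and 'F/C', so those players would not show up on this chart.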
Content source: barjacks/foundations-homework