First, a little setup: import the pandas library as pd.
In [1]:
import pandas as pd
Set some helpful display options.
In [2]:
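%matplotlib inline
pd.set_option("display.max_columns", 150)  # show up to 150 columns before truncating
pd.set_option("display.max_colwidth", 40)  # cut off cell text longer than 40 characters
pd.options.display.float_format = '{:,.2f}'.format  # thousands separators, 2 decimals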
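The float_format option is why the outputs below show years as 1,981.00: columns such as birthYear hold float64 values (pandas stores a numeric column with missing values as floats), and the '{:,.2f}' format adds thousands separators and two decimals. A minimal sketch on a tiny made-up frame, using pd.option_context so the option applies only inside the with-block:

```python
import pandas as pd

# Tiny sample; None becomes NaN, which forces the column to float64
df = pd.DataFrame({"birthYear": [1981, 1934, None]})

# The same formatter the cell above installs globally
fmt = "{:,.2f}".format
print(fmt(1981.0))

# option_context sets the display option only inside the with-block
with pd.option_context("display.float_format", fmt):
    shown = df.to_string()
print(shown)
```

If the thousands separators in year columns get in the way, converting those columns to integers (after dealing with missing values) is one way around it.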
Open and read in the Master.csv and Salaries.csv tables in the data/2017/ directory.
In [3]:
master = pd.read_csv('../project3/data/2017/Master.csv') # File with player details
salary = pd.read_csv('../project3/data/2017/Salaries.csv') #File with baseball players' salaries
Check what type each object is with type(table_name). You can also use the .info() method to explore the data's structure.
In [4]:
master.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18846 entries, 0 to 18845
Data columns (total 24 columns):
playerID 18846 non-null object
birthYear 18703 non-null float64
birthMonth 18531 non-null float64
birthDay 18382 non-null float64
birthCountry 18773 non-null object
birthState 18220 non-null object
birthCity 18647 non-null object
deathYear 9336 non-null float64
deathMonth 9335 non-null float64
deathDay 9334 non-null float64
deathCountry 9329 non-null object
deathState 9277 non-null object
deathCity 9325 non-null object
nameFirst 18807 non-null object
nameLast 18846 non-null object
nameGiven 18807 non-null object
weight 17975 non-null float64
height 18041 non-null float64
bats 17655 non-null object
throws 17868 non-null object
debut 18653 non-null object
finalGame 18653 non-null object
retroID 18792 non-null object
bbrefID 18845 non-null object
dtypes: float64(8), object(16)
memory usage: 3.5+ MB
In [5]:
salary.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25575 entries, 0 to 25574
Data columns (total 5 columns):
yearID 25575 non-null int64
teamID 25575 non-null object
lgID 25575 non-null object
playerID 25575 non-null object
salary 25575 non-null int64
dtypes: int64(2), object(3)
memory usage: 999.1+ KB
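To check types directly, type() reports the object's class, .dtypes lists each column's type, and isna().sum() counts the missing values that .info() shows indirectly as non-null counts. A minimal sketch on a few rows copied from the master.head() output in this notebook (the sample frame itself is hypothetical):

```python
import pandas as pd

# A few rows copied from the master.head() output; deathYear is empty
# for players with no recorded death
master_sample = pd.DataFrame({
    "playerID": ["aardsda01", "aaronha01", "aaronto01"],
    "birthYear": [1981, 1934, 1939],
    "deathYear": [None, None, 1984],
})

print(type(master_sample))            # the object's class
print(master_sample.dtypes)            # one dtype per column
missing = master_sample.isna().sum()   # number of NaN values per column
print(missing)
```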
Print out sample data for each table with table.head(). You can see additional options by pressing Tab after you type the head() method.
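head() takes an optional row count n (5 by default), and tail() does the same from the bottom of the table. A small sketch using values copied from the salary.head() output in this notebook:

```python
import pandas as pd

# Rows copied from the salary.head() output
salary_sample = pd.DataFrame({
    "yearID": [1985] * 5,
    "teamID": ["ATL"] * 5,
    "playerID": ["barkele01", "bedrost01", "benedbr01", "campri01", "ceronri01"],
    "salary": [870000, 550000, 545000, 633333, 625000],
})

first_two = salary_sample.head(2)   # first n rows
last_row = salary_sample.tail(1)    # last n rows
print(first_two)
print(last_row)
```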
In [6]:
master.head()
Out[6]:
| | playerID | birthYear | birthMonth | birthDay | birthCountry | birthState | birthCity | deathYear | deathMonth | deathDay | deathCountry | deathState | deathCity | nameFirst | nameLast | nameGiven | weight | height | bats | throws | debut | finalGame | retroID | bbrefID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | aardsda01 | 1,981.00 | 12.00 | 27.00 | USA | CO | Denver | nan | nan | nan | NaN | NaN | NaN | David | Aardsma | David Allan | 220.00 | 75.00 | R | R | 2004-04-06 | 2015-08-23 | aardd001 | aardsda01 |
| 1 | aaronha01 | 1,934.00 | 2.00 | 5.00 | USA | AL | Mobile | nan | nan | nan | NaN | NaN | NaN | Hank | Aaron | Henry Louis | 180.00 | 72.00 | R | R | 1954-04-13 | 1976-10-03 | aaroh101 | aaronha01 |
| 2 | aaronto01 | 1,939.00 | 8.00 | 5.00 | USA | AL | Mobile | 1,984.00 | 8.00 | 16.00 | USA | GA | Atlanta | Tommie | Aaron | Tommie Lee | 190.00 | 75.00 | R | R | 1962-04-10 | 1971-09-26 | aarot101 | aaronto01 |
| 3 | aasedo01 | 1,954.00 | 9.00 | 8.00 | USA | CA | Orange | nan | nan | nan | NaN | NaN | NaN | Don | Aase | Donald William | 190.00 | 75.00 | R | R | 1977-07-26 | 1990-10-03 | aased001 | aasedo01 |
| 4 | abadan01 | 1,972.00 | 8.00 | 25.00 | USA | FL | Palm Beach | nan | nan | nan | NaN | NaN | NaN | Andy | Abad | Fausto Andres | 184.00 | 73.00 | L | L | 2001-09-10 | 2006-04-13 | abada001 | abadan01 |
In [7]:
salary.head()
Out[7]:
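| | yearID | teamID | lgID | playerID | salary |
|---|---|---|---|---|---|
| 0 | 1985 | ATL | NL | barkele01 | 870000 |
| 1 | 1985 | ATL | NL | bedrost01 | 550000 |
| 2 | 1985 | ATL | NL | benedbr01 | 545000 |
| 3 | 1985 | ATL | NL | campri01 | 633333 |
| 4 | 1985 | ATL | NL | ceronri01 | 625000 |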
Now we join the two CSVs using pd.merge(). We want to keep every player from the master data set, even if their salary is missing from the salary data set, so we use a left join. We can always filter out the NaN values later.
In [8]:
joined = pd.merge(left=master, right=salary, on="playerID", how="left")  # keep every player in master
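To see what how="left" does, here is a small sketch with rows taken from this notebook's outputs (the sample frames are hypothetical): a player with no salary record stays in the result with NaN in the salary columns, and a player with several seasons appears once per season. The indicator=True option adds a _merge column that says which side each row came from.

```python
import pandas as pd

master_sample = pd.DataFrame({
    "playerID": ["aaronha01", "moyerja01"],
    "nameLast": ["Aaron", "Moyer"],
})
salary_sample = pd.DataFrame({
    "yearID": [1986, 1987],
    "playerID": ["moyerja01", "moyerja01"],
    "salary": [60000, 70000],
})

# Left join on the shared key; indicator=True adds a "_merge" column
joined_sample = pd.merge(
    left=master_sample, right=salary_sample,
    on="playerID", how="left", indicator=True,
)
print(joined_sample)
```

Two players produce three rows here, which is the same kind of row growth the length check later in the notebook runs into.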
See which columns the joined table contains.
In [9]:
joined.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39455 entries, 0 to 39454
Data columns (total 28 columns):
playerID 39455 non-null object
birthYear 39312 non-null float64
birthMonth 39140 non-null float64
birthDay 38991 non-null float64
birthCountry 39382 non-null object
birthState 37738 non-null object
birthCity 39228 non-null object
deathYear 9669 non-null float64
deathMonth 9668 non-null float64
deathDay 9667 non-null float64
deathCountry 9662 non-null object
deathState 9603 non-null object
deathCity 9658 non-null object
nameFirst 39416 non-null object
nameLast 39455 non-null object
nameGiven 39416 non-null object
weight 38584 non-null float64
height 38650 non-null float64
bats 38264 non-null object
throws 38477 non-null object
debut 39262 non-null object
finalGame 39262 non-null object
retroID 39401 non-null object
bbrefID 39454 non-null object
yearID 25567 non-null float64
teamID 25567 non-null object
lgID 25567 non-null object
salary 25567 non-null float64
dtypes: float64(10), object(18)
memory usage: 8.7+ MB
Check whether the join kept exactly one row per player. The easiest way is to subtract the length of the joined table from the length of the master table: if nothing was duplicated, the result is 0.
In [10]:
len(master) - len(joined)
Out[10]:
-20609
Something went wrong. The joined data set now has more rows than the master data set, so some players must have been duplicated.
Let's check for duplicate playerIDs using .value_counts().
In [11]:
joined["playerID"].value_counts()
Out[11]:
moyerja01 25
vizquom01 24
glavito02 23
thomeji01 22
bondsba01 22
griffke02 22
rodrial01 21
sheffga01 21
francjo01 21
gordoto01 21
johnsra05 21
maddugr01 21
clemero02 21
smoltjo01 21
rhodear01 20
rogerke01 20
jonesch06 20
hawkila01 20
rodriiv01 20
oliveda02 20
wellsda01 20
schilcu01 19
finlest01 19
riverma01 19
ramirma02 19
giambja01 19
biggicr01 19
surhobj01 19
santibe01 19
larkiba01 19
..
becanbu01 1
ryalma01 1
warneja02 1
hardgpa01 1
peoplji01 1
skinnca01 1
sundrst01 1
lizra01 1
thackmo01 1
lyonsal01 1
currela01 1
leesmch01 1
ripplji01 1
hogueca01 1
sammocl01 1
heiseri01 1
arrigge01 1
sayji01 1
thomawa01 1
hogansh01 1
kibbiho01 1
bottoji01 1
kelihmi01 1
eckhaox01 1
hestela01 1
matteal01 1
smythha01 1
dashiwa01 1
johnsji01 1
herbefr01 1
Name: playerID, Length: 18846, dtype: int64
Yep, we do.
Let's pick out a single player, the one with the most rows, to see where the duplication comes from.
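value_counts() lists every ID; when we only need to know how many rows are repeats, duplicated() gives a boolean mask we can sum. The same boolean-mask idea then pulls out all rows for one player. A sketch on a few hypothetical rows using IDs from this notebook:

```python
import pandas as pd

joined_sample = pd.DataFrame({
    "playerID": ["aaronha01", "moyerja01", "moyerja01", "moyerja01"],
    "yearID": [None, 1986, 1987, 1988],
})

# True for every row whose playerID already appeared earlier in the frame
extra_rows = joined_sample["playerID"].duplicated().sum()
print(extra_rows)

# Boolean mask: keep only the rows for a single player
moyer_rows = joined_sample[joined_sample["playerID"] == "moyerja01"]
print(len(moyer_rows))
```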
In [12]:
joined[joined["playerID"] == "moyerja01"]
Out[12]:
| | playerID | birthYear | birthMonth | birthDay | birthCountry | birthState | birthCity | deathYear | deathMonth | deathDay | deathCountry | deathState | deathCity | nameFirst | nameLast | nameGiven | weight | height | bats | throws | debut | finalGame | retroID | bbrefID | yearID | teamID | lgID | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24836 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,986.00 | CHN | NL | 60,000.00 |
| 24837 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,987.00 | CHN | NL | 70,000.00 |
| 24838 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,988.00 | CHN | NL | 142,500.00 |
| 24839 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,989.00 | TEX | AL | 205,000.00 |
| 24840 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,990.00 | TEX | AL | 340,000.00 |
| 24841 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,991.00 | SLN | NL | 200,000.00 |
| 24842 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,993.00 | BAL | AL | 200,000.00 |
| 24843 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,994.00 | BAL | AL | 725,000.00 |
| 24844 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,995.00 | BAL | AL | 1,100,000.00 |
| 24845 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,996.00 | BOS | AL | 825,000.00 |
| 24846 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,997.00 | SEA | AL | 1,700,000.00 |
| 24847 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,998.00 | SEA | AL | 2,000,000.00 |
| 24848 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 1,999.00 | SEA | AL | 2,300,000.00 |
| 24849 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,000.00 | SEA | AL | 6,000,000.00 |
| 24850 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,001.00 | SEA | AL | 6,500,000.00 |
| 24851 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,002.00 | SEA | AL | 6,500,000.00 |
| 24852 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,003.00 | SEA | AL | 6,500,000.00 |
| 24853 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,004.00 | SEA | AL | 7,000,000.00 |
| 24854 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,005.00 | SEA | AL | 8,000,000.00 |
| 24855 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,006.00 | SEA | AL | 5,500,000.00 |
| 24856 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,007.00 | PHI | NL | 6,500,000.00 |
| 24857 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,008.00 | PHI | NL | 6,000,000.00 |
| 24858 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,009.00 | PHI | NL | 6,500,000.00 |
| 24859 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,010.00 | PHI | NL | 8,000,000.00 |
| 24860 | moyerja01 | 1,962.00 | 11.00 | 18.00 | USA | PA | Sellersville | nan | nan | nan | NaN | NaN | NaN | Jamie | Moyer | Jamie | 170.00 | 72.00 | L | L | 1986-06-16 | 2012-05-27 | moyej001 | moyerja01 | 2,012.00 | COL | NL | 1,100,000.00 |
As we can see, the joined data set now has one row for every season in which a player has a salary record.
We only want the most recent salary, though, so we need to deduplicate the data set.
But first, let's make sure the newest year is the one we keep. We do this by sorting the data by playerID and then yearID, so each player's most recent season comes last.
In [15]:
joined = joined.sort_values(["playerID","yearID"])
Now we deduplicate, keeping only the last (most recent) row for each player.
In [16]:
deduplicated = joined.drop_duplicates("playerID", keep="last")
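The sort-then-drop_duplicates pattern is easy to check on a few hypothetical rows. As an alternative (not what this notebook uses), groupby("playerID").tail(1) keeps the last row per player in the sorted frame and gives the same result. sort_values puts NaN yearID values last by default, so a player without any salary record simply keeps their single NaN row.

```python
import pandas as pd

joined_sample = pd.DataFrame({
    "playerID": ["moyerja01", "aaronha01", "moyerja01"],
    "yearID": [1987, None, 1986],
    "salary": [70000, None, 60000],
})

# Sort so each player's most recent season is the last of their rows
ordered = joined_sample.sort_values(["playerID", "yearID"])

# Approach used in the notebook: keep the last row per playerID
latest = ordered.drop_duplicates("playerID", keep="last")

# Alternative with the same result: the last row of each group
latest_alt = ordered.groupby("playerID").tail(1)

print(latest)
```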
And let's run the check again.
In [17]:
len(master) - len(deduplicated)
Out[17]:
0
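The duplication could also have been caught at merge time. pd.merge accepts a validate argument; a sketch, assuming we expect at most one salary row per player: with validate="one_to_one" the merge raises pandas.errors.MergeError as soon as the right-hand table repeats a key. (With "one_to_many", pandas only checks that the left keys are unique, so that setting would accept this join.) The frames below are small hypothetical samples.

```python
import pandas as pd

master_sample = pd.DataFrame({"playerID": ["aaronha01", "moyerja01"]})
salary_sample = pd.DataFrame({
    "playerID": ["moyerja01", "moyerja01"],
    "yearID": [1986, 1987],
})

try:
    pd.merge(master_sample, salary_sample, on="playerID",
             how="left", validate="one_to_one")
    caught = False
except pd.errors.MergeError as err:
    # Raised because moyerja01 appears twice in salary_sample
    caught = True
    print("merge refused:", err)
```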
Now we can get into the interesting part: analysis!
In [18]:
deduplicated["salary"].describe()
Out[18]:
count 4,958.00
mean 1,692,477.94
std 3,243,005.10
min 0.00
25% 300,000.00
50% 507,500.00
75% 1,300,000.00
max 32,571,000.00
Name: salary, dtype: float64
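describe() skips missing values, which is why count is 4,958 rather than 18,846: players with no salary record kept NaN in the salary column after the left join. A small sketch of that behaviour with hypothetical values:

```python
import pandas as pd

salaries = pd.Series([60000.0, None, 1100000.0], name="salary")

stats = salaries.describe()
print(stats["count"])          # only non-missing values are counted
print(salaries.isna().sum())   # the remaining values are NaN
```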
In [19]:
max_salary = deduplicated["salary"].max()
In [20]:
deduplicated[deduplicated["salary"] == max_salary]
Out[20]:
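| | playerID | birthYear | birthMonth | birthDay | birthCountry | birthState | birthCity | deathYear | deathMonth | deathDay | deathCountry | deathState | deathCity | nameFirst | nameLast | nameGiven | weight | height | bats | throws | debut | finalGame | retroID | bbrefID | yearID | teamID | lgID | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18735 | kershcl01 | 1,988.00 | 3.00 | 19.00 | USA | TX | Dallas | nan | nan | nan | NaN | NaN | NaN | Clayton | Kershaw | Clayton Edward | 225.00 | 76.00 | L | L | 2008-05-25 | 2015-10-04 | kersc001 | kershcl01 | 2,015.00 | LAN | NL | 32,571,000.00 |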
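Filtering with == max_salary returns every player tied for the top salary. If one row is enough, idxmax() returns the index label of the first maximum, which .loc can look up directly. A sketch on a hypothetical frame with a deliberate tie (the three 25,000,000.00 salaries appear in this notebook's top-10 output):

```python
import pandas as pd

tied = pd.DataFrame({
    "playerID": ["greinza01", "howarry01", "leecl02"],
    "salary": [25000000.0, 25000000.0, 25000000.0],
})

# Boolean filter: every row that equals the maximum
all_max = tied[tied["salary"] == tied["salary"].max()]

# idxmax: the label of the first maximum only, returned as a single row (Series)
first_max = tied.loc[tied["salary"].idxmax()]

print(len(all_max), first_max["playerID"])
```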
Draw a histogram.
In [21]:
deduplicated.hist("salary")
Out[21]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bc15defd0>]],
dtype=object)
We can do the same with the yearID column to see how recent our data is.
Our data covers about 30 seasons, and the default of 10 bins would lump several seasons into each bar, so we set bins=30 to get roughly one bar per season.
In [22]:
deduplicated.hist("yearID", bins=30)
Out[22]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2b938c25c0>]],
dtype=object)
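Since yearID now holds each player's most recent salary season, the histogram counts players per latest season. The same numbers can be read as text with value_counts(), which drops NaN (players with no salary) by default; sort_index() then orders them by year. A sketch on hypothetical rows whose years come from this notebook's outputs:

```python
import pandas as pd

latest = pd.DataFrame({
    "playerID": ["kershcl01", "verlaju01", "moyerja01", "aaronha01"],
    "yearID": [2015, 2015, 2012, None],
})

# Count players per latest salary season; the NaN row is left out
per_year = latest["yearID"].value_counts().sort_index()
print(per_year)
```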
Calculate the 90th-percentile cutoff.
In [26]:
top_10_p = deduplicated["salary"].quantile(q=0.9)
top_10_p
Out[26]:
4500000.0
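quantile() ignores NaN and, by default, interpolates linearly between the two nearest values, so the cutoff does not have to be a salary that actually occurs in the data. For n sorted values the 0.9 quantile sits at position 0.9 × (n − 1). A worked sketch with hypothetical numbers, plus the interpolation="lower" option for comparison:

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, None])

# Five non-missing values: position 0.9 * (5 - 1) = 3.6,
# i.e. 60% of the way from 4.0 (position 3) to 5.0 (position 4)
cutoff = values.quantile(q=0.9)

# "lower" takes the value at the lower position instead of interpolating
lower = values.quantile(q=0.9, interpolation="lower")

print(cutoff, lower)
```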
Keep only the players whose salary is at or above the cutoff.
In [27]:
best_paid = deduplicated[deduplicated["salary"] >= top_10_p]
best_paid
Out[27]:
| | playerID | birthYear | birthMonth | birthDay | birthCountry | birthState | birthCity | deathYear | deathMonth | deathDay | deathCountry | deathState | deathCity | nameFirst | nameLast | nameGiven | weight | height | bats | throws | debut | finalGame | retroID | bbrefID | yearID | teamID | lgID | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 91 | abreubo01 | 1,974.00 | 3.00 | 11.00 | Venezuela | Aragua | Maracay | nan | nan | nan | NaN | NaN | NaN | Bobby | Abreu | Bob Kelly | 220.00 | 72.00 | L | R | 1996-09-01 | 2014-09-28 | abreb001 | abreubo01 | 2,012.00 | LAA | AL | 9,000,000.00 |
| 94 | abreujo02 | 1,987.00 | 1.00 | 29.00 | Cuba | Cienfuegos | Cienfuegos | nan | nan | nan | NaN | NaN | NaN | Jose | Abreu | Jose Dariel | 255.00 | 75.00 | R | R | 2014-03-31 | 2015-10-03 | abrej003 | abreujo02 | 2,015.00 | CHA | AL | 8,666,000.00 |
| 181 | adamsmi03 | 1,978.00 | 7.00 | 29.00 | USA | TX | Corpus Christi | nan | nan | nan | NaN | NaN | NaN | Mike | Adams | Jon Michael | 210.00 | 77.00 | R | R | 2004-05-18 | 2014-09-18 | adamm001 | adamsmi03 | 2,014.00 | PHI | NL | 7,000,000.00 |
| 237 | affelje01 | 1,979.00 | 6.00 | 6.00 | USA | AZ | Phoenix | nan | nan | nan | NaN | NaN | NaN | Jeremy | Affeldt | Jeremy David | 225.00 | 76.00 | L | L | 2002-04-06 | 2015-10-04 | affej001 | affelje01 | 2,015.00 | SFN | NL | 6,000,000.00 |
| 395 | alfoned01 | 1,973.00 | 11.00 | 8.00 | Venezuela | Miranda | Santa Teresa del Tuy | nan | nan | nan | NaN | NaN | NaN | Edgardo | Alfonzo | Edgardo Antonio | 210.00 | 71.00 | R | R | 1995-04-26 | 2006-06-11 | alfoe001 | alfoned01 | 2,006.00 | LAA | AL | 8,000,000.00 |
| 554 | aloumo01 | 1,966.00 | 7.00 | 3.00 | USA | GA | Atlanta | nan | nan | nan | NaN | NaN | NaN | Moises | Alou | Moises Rojas | 185.00 | 75.00 | R | R | 1990-07-26 | 2008-06-10 | aloum001 | aloumo01 | 2,008.00 | NYN | NL | 7,500,000.00 |
| 592 | alvarpe01 | 1,987.00 | 2.00 | 6.00 | D.R. | Distrito Nacional | Santo Domingo | nan | nan | nan | NaN | NaN | NaN | Pedro | Alvarez | Pedro Manuel | 250.00 | 75.00 | L | R | 2010-06-16 | 2015-10-04 | alvap001 | alvarpe01 | 2,015.00 | PIT | NL | 5,750,000.00 |
| 695 | anderbr04 | 1,988.00 | 2.00 | 1.00 | USA | TX | Midland | nan | nan | nan | NaN | NaN | NaN | Brett | Anderson | Brett Franklin | 240.00 | 75.00 | L | L | 2009-04-10 | 2015-10-01 | andeb004 | anderbr04 | 2,015.00 | LAN | NL | 10,000,000.00 |
| 820 | andruel01 | 1,988.00 | 8.00 | 26.00 | Venezuela | Aragua | Maracay | nan | nan | nan | NaN | NaN | NaN | Elvis | Andrus | Elvis Augusto | 200.00 | 72.00 | R | R | 2009-04-06 | 2015-10-04 | andre001 | andruel01 | 2,015.00 | TEX | AL | 15,000,000.00 |
| 1015 | arroybr01 | 1,977.00 | 2.00 | 24.00 | USA | FL | Key West | nan | nan | nan | NaN | NaN | NaN | Bronson | Arroyo | Bronson Anthony | 185.00 | 75.00 | R | R | 2000-06-12 | 2014-06-15 | arrob001 | arroybr01 | 2,014.00 | ARI | NL | 9,500,000.00 |
| 1049 | ashbyan01 | 1,967.00 | 7.00 | 11.00 | USA | MO | Kansas City | nan | nan | nan | NaN | NaN | NaN | Andy | Ashby | Andrew Jason | 180.00 | 73.00 | R | R | 1991-06-10 | 2004-09-14 | ashba002 | ashbyan01 | 2,003.00 | LAN | NL | 8,500,000.00 |
| 1198 | avilaal01 | 1,987.00 | 1.00 | 29.00 | USA | FL | Hialeah | nan | nan | nan | NaN | NaN | NaN | Alex | Avila | Alexander Thomas | 210.00 | 71.00 | L | R | 2009-08-06 | 2015-10-03 | avila001 | avilaal01 | 2,015.00 | DET | AL | 5,400,000.00 |
| 1244 | aybarer01 | 1,984.00 | 1.00 | 14.00 | D.R. | Peravia | Bani | nan | nan | nan | NaN | NaN | NaN | Erick | Aybar | Erick Johan | 180.00 | 70.00 | B | R | 2006-05-16 | 2015-10-04 | aybae001 | aybarer01 | 2,015.00 | LAA | AL | 8,500,000.00 |
| 1342 | bagweje01 | 1,968.00 | 5.00 | 27.00 | USA | MA | Boston | nan | nan | nan | NaN | NaN | NaN | Jeff | Bagwell | Jeffrey Robert | 195.00 | 72.00 | R | R | 1991-04-08 | 2005-10-02 | bagwj001 | bagweje01 | 2,006.00 | HOU | NL | 19,369,019.00 |
| 1368 | baileho02 | 1,986.00 | 5.00 | 3.00 | USA | TX | La Grange | nan | nan | nan | NaN | NaN | NaN | Homer | Bailey | David Dewitt | 225.00 | 76.00 | R | R | 2007-06-08 | 2015-04-23 | bailh001 | baileho02 | 2,015.00 | CIN | NL | 10,000,000.00 |
| 1465 | bakersc02 | 1,981.00 | 9.00 | 19.00 | USA | LA | Shreveport | nan | nan | nan | NaN | NaN | NaN | Scott | Baker | Timothy Scott | 215.00 | 76.00 | R | R | 2005-05-07 | 2015-05-02 | bakes002 | bakersc02 | 2,013.00 | CHN | NL | 5,500,000.00 |
| 1998 | bautijo02 | 1,980.00 | 10.00 | 19.00 | D.R. | Distrito Nacional | Santo Domingo | nan | nan | nan | NaN | NaN | NaN | Jose | Bautista | Jose Antonio | 205.00 | 72.00 | R | R | 2004-04-04 | 2015-10-04 | bautj002 | bautijo02 | 2,015.00 | TOR | AL | 14,000,000.00 |
| 2099 | beckejo02 | 1,980.00 | 5.00 | 15.00 | USA | TX | Spring | nan | nan | nan | NaN | NaN | NaN | Josh | Beckett | Joshua Patrick | 230.00 | 77.00 | R | R | 2001-09-04 | 2014-08-03 | beckj002 | beckejo02 | 2,014.00 | LAN | NL | 15,750,000.00 |
| 2134 | beckro01 | 1,968.00 | 8.00 | 3.00 | USA | CA | Burbank | 2,007.00 | 6.00 | 23.00 | USA | AZ | Phoenix | Rod | Beck | Rodney Roy | 215.00 | 73.00 | R | R | 1991-05-06 | 2004-08-14 | beckr001 | beckro01 | 2,001.00 | BOS | AL | 4,500,000.00 |
| 2147 | bedarer01 | 1,979.00 | 3.00 | 5.00 | CAN | ON | Navan | nan | nan | nan | NaN | NaN | NaN | Erik | Bedard | Erik Joseph | 195.00 | 73.00 | L | L | 2002-04-17 | 2014-07-12 | bedae001 | bedarer01 | 2,012.00 | PIT | NL | 4,500,000.00 |
| 2210 | belchti01 | 1,961.00 | 10.00 | 19.00 | USA | OH | Mount Gilead | nan | nan | nan | NaN | NaN | NaN | Tim | Belcher | Timothy Wayne | 210.00 | 75.00 | R | R | 1987-09-06 | 2000-09-30 | belct001 | belchti01 | 2,000.00 | ANA | AL | 4,600,000.00 |
| 2264 | bellda01 | 1,972.00 | 9.00 | 14.00 | USA | OH | Cincinnati | nan | nan | nan | NaN | NaN | NaN | David | Bell | David Michael | 170.00 | 70.00 | R | R | 1995-05-03 | 2006-10-01 | belld002 | bellda01 | 2,006.00 | PHI | NL | 4,700,000.00 |
| 2274 | bellde01 | 1,968.00 | 12.00 | 11.00 | USA | FL | Tampa | nan | nan | nan | NaN | NaN | NaN | Derek | Bell | Derek Nathaniel | 200.00 | 74.00 | R | R | 1991-06-28 | 2001-07-03 | belld001 | bellde01 | 2,001.00 | PIT | NL | 5,000,000.00 |
| 2289 | belleal01 | 1,966.00 | 8.00 | 25.00 | USA | LA | Shreveport | nan | nan | nan | NaN | NaN | NaN | Albert | Belle | Albert Jojuan | 190.00 | 73.00 | R | R | 1989-07-15 | 2000-10-01 | bellj002 | belleal01 | 2,003.00 | BAL | AL | 13,000,000.00 |
| 2314 | bellhe01 | 1,977.00 | 9.00 | 29.00 | USA | CA | Oceanside | nan | nan | nan | NaN | NaN | NaN | Heath | Bell | Heath Justin | 235.00 | 75.00 | R | R | 2004-08-24 | 2014-05-03 | bellh001 | bellhe01 | 2,014.00 | TBA | AL | 9,000,000.00 |
| 2410 | beltrad01 | 1,979.00 | 4.00 | 7.00 | D.R. | Distrito Nacional | Santo Domingo | nan | nan | nan | NaN | NaN | NaN | Adrian | Beltre | Adrian | 220.00 | 71.00 | R | R | 1998-06-24 | 2015-10-04 | belta001 | beltrad01 | 2,015.00 | TEX | AL | 16,000,000.00 |
| 2427 | beltrca01 | 1,977.00 | 4.00 | 24.00 | P.R. | NaN | Manati | nan | nan | nan | NaN | NaN | NaN | Carlos | Beltran | Carlos Ivan | 215.00 | 73.00 | B | R | 1998-09-14 | 2015-10-04 | beltc001 | beltrca01 | 2,015.00 | NYA | AL | 15,000,000.00 |
| 2476 | benesan01 | 1,967.00 | 8.00 | 20.00 | USA | IN | Evansville | nan | nan | nan | NaN | NaN | NaN | Andy | Benes | Andrew Charles | 235.00 | 78.00 | R | R | 1989-08-11 | 2002-09-29 | benea001 | benesan01 | 2,002.00 | SLN | NL | 6,367,542.00 |
| 2497 | benitar01 | 1,972.00 | 11.00 | 3.00 | D.R. | San Pedro de Macoris | Ramon Santana | nan | nan | nan | NaN | NaN | NaN | Armando | Benitez | Armando German | 180.00 | 76.00 | R | R | 1994-07-28 | 2008-06-06 | benia001 | benitar01 | 2,007.00 | SFN | NL | 9,866,219.00 |
| 2552 | benoijo01 | 1,977.00 | 7.00 | 26.00 | D.R. | Santiago | Santiago | nan | nan | nan | NaN | NaN | NaN | Joaquin | Benoit | Joaquin Antonio | 250.00 | 76.00 | R | R | 2001-08-08 | 2015-10-01 | benoj001 | benoijo01 | 2,015.00 | SDN | NL | 8,000,000.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 37182 | warddu01 | 1,964.00 | 5.00 | 28.00 | USA | NM | Park View | nan | nan | nan | NaN | NaN | NaN | Duane | Ward | Roy Duane | 185.00 | 76.00 | R | R | 1986-04-12 | 1995-06-22 | wardd001 | warddu01 | 1,995.00 | TOR | AL | 4,750,000.00 |
| 37258 | washbja01 | 1,974.00 | 8.00 | 13.00 | USA | WI | La Crosse | nan | nan | nan | NaN | NaN | NaN | Jarrod | Washburn | Jarrod Michael | 190.00 | 73.00 | L | L | 1998-06-02 | 2009-09-15 | washj001 | washbja01 | 2,009.00 | SEA | AL | 9,850,000.00 |
| 37382 | weaveje02 | 1,982.00 | 10.00 | 4.00 | USA | CA | Northridge | nan | nan | nan | NaN | NaN | NaN | Jered | Weaver | Jered David | 210.00 | 79.00 | R | R | 2006-05-27 | 2015-10-02 | weavj003 | weaveje02 | 2,015.00 | LAA | AL | 18,000,000.00 |
| 37613 | wellsve01 | 1,978.00 | 12.00 | 8.00 | USA | LA | Shreveport | nan | nan | nan | NaN | NaN | NaN | Vernon | Wells | Vernon M. | 230.00 | 73.00 | R | R | 1999-08-30 | 2013-09-29 | wellv001 | wellsve01 | 2,013.00 | NYA | AL | 24,642,857.00 |
| 37663 | werthja01 | 1,979.00 | 5.00 | 20.00 | USA | IL | Springfield | nan | nan | nan | NaN | NaN | NaN | Jayson | Werth | Jayson Richard Gowan | 240.00 | 77.00 | R | R | 2002-09-01 | 2015-10-04 | wertj001 | werthja01 | 2,015.00 | WAS | NL | 21,000,000.00 |
| 37682 | westbja01 | 1,977.00 | 9.00 | 29.00 | USA | GA | Athens | nan | nan | nan | NaN | NaN | NaN | Jake | Westbrook | Jacob Cauthen | 210.00 | 75.00 | R | R | 2000-06-17 | 2013-09-29 | westj001 | westbja01 | 2,013.00 | SLN | NL | 8,750,000.00 |
| 37718 | wettejo01 | 1,966.00 | 8.00 | 21.00 | USA | CA | San Mateo | nan | nan | nan | NaN | NaN | NaN | John | Wetteland | John Karl | 195.00 | 74.00 | R | R | 1989-05-31 | 2000-09-20 | wettj001 | wettejo01 | 2,000.00 | TEX | AL | 6,500,000.00 |
| 37810 | whitede03 | 1,962.00 | 12.00 | 29.00 | Jamaica | NaN | Kingston | nan | nan | nan | NaN | NaN | NaN | Devon | White | Devon Markes | 170.00 | 73.00 | B | R | 1985-09-02 | 2001-10-05 | whitd001 | whitede03 | 2,001.00 | MIL | NL | 5,000,000.00 |
| 37968 | wickmbo01 | 1,969.00 | 2.00 | 6.00 | USA | WI | Green Bay | nan | nan | nan | NaN | NaN | NaN | Bob | Wickman | Robert Joe | 207.00 | 73.00 | R | R | 1992-08-24 | 2007-09-30 | wickb001 | wickmbo01 | 2,007.00 | ATL | NL | 6,500,000.00 |
| 37993 | wietema01 | 1,986.00 | 5.00 | 21.00 | USA | SC | Goose Creek | nan | nan | nan | NaN | NaN | NaN | Matt | Wieters | Matthew Richard | 230.00 | 77.00 | B | R | 2009-05-29 | 2015-10-04 | wietm001 | wietema01 | 2,015.00 | BAL | AL | 8,300,000.00 |
| 38144 | willido03 | 1,982.00 | 1.00 | 12.00 | USA | CA | Oakland | nan | nan | nan | NaN | NaN | NaN | Dontrelle | Willis | Dontrelle Wayne | 230.00 | 74.00 | L | L | 2003-05-09 | 2011-09-27 | willd003 | willido03 | 2,010.00 | DET | AL | 12,000,000.00 |
| 38202 | willijo03 | 1,979.00 | 2.00 | 17.00 | USA | AL | Florence | nan | nan | nan | NaN | NaN | NaN | Josh | Willingham | Joshua David | 230.00 | 74.00 | R | R | 2004-07-06 | 2014-09-28 | willj004 | willijo03 | 2,014.00 | MIN | AL | 7,000,000.00 |
| 38232 | willima04 | 1,965.00 | 11.00 | 28.00 | USA | CA | Bishop | nan | nan | nan | NaN | NaN | NaN | Matt | Williams | Matthew Derrick | 205.00 | 74.00 | R | R | 1987-04-11 | 2003-05-31 | willm003 | willima04 | 2,003.00 | ARI | NL | 10,000,000.00 |
| 38315 | williwo02 | 1,966.00 | 8.00 | 19.00 | USA | TX | Houston | nan | nan | nan | NaN | NaN | NaN | Woody | Williams | Gregory Scott | 180.00 | 72.00 | R | R | 1993-05-14 | 2007-09-22 | willw001 | williwo02 | 2,007.00 | HOU | NL | 6,000,000.00 |
| 38348 | wilsobr01 | 1,982.00 | 3.00 | 16.00 | USA | NH | Londonderry | nan | nan | nan | NaN | NaN | NaN | Brian | Wilson | Brian Patrick | 205.00 | 73.00 | R | R | 2006-04-23 | 2014-09-27 | wilsb001 | wilsobr01 | 2,014.00 | LAN | NL | 10,000,000.00 |
| 38360 | wilsocj01 | 1,980.00 | 11.00 | 18.00 | USA | CA | Newport Beach | nan | nan | nan | NaN | NaN | NaN | C. J. | Wilson | Christopher John | 210.00 | 73.00 | L | L | 2005-06-10 | 2015-07-28 | wilsc004 | wilsocj01 | 2,015.00 | LAA | AL | 18,000,000.00 |
| 38707 | wolfra02 | 1,976.00 | 8.00 | 22.00 | USA | CA | Canoga Park | nan | nan | nan | NaN | NaN | NaN | Randy | Wolf | Randall Christopher | 205.00 | 72.00 | L | L | 1999-06-11 | 2015-10-04 | wolfr001 | wolfra02 | 2,012.00 | MIL | NL | 9,500,000.00 |
| 38812 | woodtr01 | 1,987.00 | 2.00 | 6.00 | USA | AR | Little Rock | nan | nan | nan | NaN | NaN | NaN | Travis | Wood | Travis A. | 175.00 | 71.00 | R | L | 2010-07-01 | 2015-10-04 | woodt004 | woodtr01 | 2,015.00 | CHN | NL | 5,686,000.00 |
| 38907 | wrighda03 | 1,982.00 | 12.00 | 20.00 | USA | VA | Norfolk | nan | nan | nan | NaN | NaN | NaN | David | Wright | David Allen | 205.00 | 72.00 | R | R | 2004-07-21 | 2015-10-04 | wrigd002 | wrighda03 | 2,015.00 | NYN | NL | 20,000,000.00 |
| 38939 | wrighja02 | 1,975.00 | 12.00 | 29.00 | USA | CA | Anaheim | nan | nan | nan | NaN | NaN | NaN | Jaret | Wright | Jaret Samuel | 220.00 | 74.00 | R | R | 1997-06-24 | 2007-04-29 | wrigj002 | wrighja02 | 2,007.00 | BAL | AL | 7,000,000.00 |
| 39092 | youklke01 | 1,979.00 | 3.00 | 15.00 | USA | OH | Cincinnati | nan | nan | nan | NaN | NaN | NaN | Kevin | Youkilis | Kevin Edmund | 220.00 | 73.00 | R | R | 2004-05-15 | 2013-06-13 | youkk001 | youklke01 | 2,013.00 | NYA | AL | 12,000,000.00 |
| 39164 | youngdm01 | 1,973.00 | 10.00 | 11.00 | USA | MS | Vicksburg | nan | nan | nan | NaN | NaN | NaN | Dmitri | Young | Dmitri Dell | 295.00 | 74.00 | B | R | 1996-08-29 | 2008-07-11 | yound001 | youngdm01 | 2,009.00 | WAS | NL | 5,000,000.00 |
| 39219 | youngke01 | 1,969.00 | 6.00 | 16.00 | USA | MI | Alpena | nan | nan | nan | NaN | NaN | NaN | Kevin | Young | Kevin Stacey | 210.00 | 75.00 | R | R | 1992-07-12 | 2003-06-27 | younk001 | youngke01 | 2,003.00 | PIT | NL | 6,625,000.00 |
| 39245 | youngmi02 | 1,976.00 | 10.00 | 19.00 | USA | CA | Covina | nan | nan | nan | NaN | NaN | NaN | Michael | Young | Michael Brian | 200.00 | 73.00 | R | R | 2000-09-29 | 2013-09-29 | younm003 | youngmi02 | 2,013.00 | PHI | NL | 18,374,975.00 |
| 39294 | zambrca01 | 1,981.00 | 6.00 | 1.00 | Venezuela | Carabobo | Puerto Cabello | nan | nan | nan | NaN | NaN | NaN | Carlos | Zambrano | Carlos Alberto | 275.00 | 76.00 | B | R | 2001-08-20 | 2012-09-21 | zambc001 | zambrca01 | 2,012.00 | MIA | NL | 19,000,000.00 |
| 39364 | zieglbr01 | 1,979.00 | 10.00 | 10.00 | USA | KS | Pratt | nan | nan | nan | NaN | NaN | NaN | Brad | Ziegler | Brad Gregory | 220.00 | 76.00 | R | R | 2008-05-31 | 2015-10-04 | ziegb001 | zieglbr01 | 2,015.00 | ARI | NL | 5,000,000.00 |
| 39387 | zimmejo02 | 1,986.00 | 5.00 | 23.00 | USA | WI | Auburndale | nan | nan | nan | NaN | NaN | NaN | Jordan | Zimmermann | Jordan M. | 225.00 | 74.00 | R | R | 2009-04-20 | 2015-09-30 | zimmj003 | zimmejo02 | 2,015.00 | WAS | NL | 16,500,000.00 |
| 39397 | zimmery01 | 1,984.00 | 9.00 | 28.00 | USA | NC | Washington | nan | nan | nan | NaN | NaN | NaN | Ryan | Zimmerman | Ryan Wallace | 220.00 | 75.00 | R | R | 2005-09-01 | 2015-09-07 | zimmr001 | zimmery01 | 2,015.00 | WAS | NL | 14,000,000.00 |
| 39421 | zitoba01 | 1,978.00 | 5.00 | 13.00 | USA | NV | Las Vegas | nan | nan | nan | NaN | NaN | NaN | Barry | Zito | Barry William | 205.00 | 74.00 | L | L | 2000-07-22 | 2015-09-30 | zitob001 | zitoba01 | 2,013.00 | SFN | NL | 20,000,000.00 |
| 39432 | zobribe01 | 1,981.00 | 5.00 | 26.00 | USA | IL | Eureka | nan | nan | nan | NaN | NaN | NaN | Ben | Zobrist | Benjamin Thomas | 210.00 | 75.00 | B | R | 2006-08-01 | 2015-10-04 | zobrb001 | zobribe01 | 2,015.00 | OAK | AL | 7,500,000.00 |
505 rows × 28 columns
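Two details of the >= filter, sketched on hypothetical values: comparisons with NaN evaluate to False, so players without a salary drop out on their own; and because the cutoff is inclusive, everyone tied at the cutoff stays in, which can push the result above 10% of the paid players (here 505 of 4,958, about 10.2%).

```python
import pandas as pd

salaries = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 5.0, None])

cutoff = salaries.quantile(q=0.9)
best = salaries[salaries >= cutoff]   # NaN >= cutoff is False, so NaN is dropped
share = len(best) / salaries.count()  # count() ignores NaN
print(cutoff, len(best), share)
```

With the tie at 5.0, two of six paid values (a third) pass the 90th-percentile filter, an exaggerated version of the small overshoot seen above.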
Use .nlargest() to see the 10 best-paid players.
In [28]:
best_paid_top_10 = best_paid.nlargest(10, "salary")
best_paid_top_10
Out[28]:
| | playerID | birthYear | birthMonth | birthDay | birthCountry | birthState | birthCity | deathYear | deathMonth | deathDay | deathCountry | deathState | deathCity | nameFirst | nameLast | nameGiven | weight | height | bats | throws | debut | finalGame | retroID | bbrefID | yearID | teamID | lgID | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18735 | kershcl01 | 1,988.00 | 3.00 | 19.00 | USA | TX | Dallas | nan | nan | nan | NaN | NaN | NaN | Clayton | Kershaw | Clayton Edward | 225.00 | 76.00 | L | L | 2008-05-25 | 2015-10-04 | kersc001 | kershcl01 | 2,015.00 | LAN | NL | 32,571,000.00 |
| 36569 | verlaju01 | 1,983.00 | 2.00 | 20.00 | USA | VA | Manakin Sabot | nan | nan | nan | NaN | NaN | NaN | Justin | Verlander | Justin Brooks | 225.00 | 77.00 | R | R | 2005-07-04 | 2015-10-03 | verlj001 | verlaju01 | 2,015.00 | DET | AL | 28,000,000.00 |
| 13538 | greinza01 | 1,983.00 | 10.00 | 21.00 | USA | FL | Orlando | nan | nan | nan | NaN | NaN | NaN | Zack | Greinke | Donald Zachary | 195.00 | 72.00 | R | R | 2004-05-22 | 2015-10-03 | greiz001 | greinza01 | 2,015.00 | LAN | NL | 25,000,000.00 |
| 16479 | howarry01 | 1,979.00 | 11.00 | 19.00 | USA | MO | St. Louis | nan | nan | nan | NaN | NaN | NaN | Ryan | Howard | Ryan James | 250.00 | 76.00 | L | L | 2004-09-01 | 2015-09-14 | howar001 | howarry01 | 2,015.00 | PHI | NL | 25,000,000.00 |
| 20087 | leecl02 | 1,978.00 | 8.00 | 30.00 | USA | AR | Benton | nan | nan | nan | NaN | NaN | NaN | Cliff | Lee | Clifton Phifer | 205.00 | 75.00 | L | L | 2002-09-15 | 2014-07-31 | lee-c003 | leecl02 | 2,014.00 | PHI | NL | 25,000,000.00 |
| 15555 | hernafe02 | 1,986.00 | 4.00 | 8.00 | Venezuela | Carabobo | Valencia | nan | nan | nan | NaN | NaN | NaN | Felix | Hernandez | Felix Abraham | 225.00 | 75.00 | R | R | 2005-08-04 | 2015-09-26 | hernf002 | hernafe02 | 2,015.00 | SEA | AL | 24,857,000.00 |
| 37613 | wellsve01 | 1,978.00 | 12.00 | 8.00 | USA | LA | Shreveport | nan | nan | nan | NaN | NaN | NaN | Vernon | Wells | Vernon M. | 230.00 | 73.00 | R | R | 1999-08-30 | 2013-09-29 | wellv001 | wellsve01 | 2,013.00 | NYA | AL | 24,642,857.00 |
| 5384 | canoro01 | 1,982.00 | 10.00 | 22.00 | D.R. | San Pedro de Macoris | San Pedro de Macoris | nan | nan | nan | NaN | NaN | NaN | Robinson | Cano | Robinson Jose | 210.00 | 72.00 | L | R | 2005-05-03 | 2015-10-04 | canor001 | canoro01 | 2,015.00 | SEA | AL | 24,000,000.00 |
| 10823 | fieldpr01 | 1,984.00 | 5.00 | 9.00 | USA | CA | Ontario | nan | nan | nan | NaN | NaN | NaN | Prince | Fielder | Prince Semien | 275.00 | 71.00 | L | R | 2005-06-13 | 2015-10-04 | fielp001 | fieldpr01 | 2,015.00 | TEX | AL | 24,000,000.00 |
| 28702 | pujolal01 | 1,980.00 | 1.00 | 16.00 | D.R. | Distrito Nacional | Santo Domingo | nan | nan | nan | NaN | NaN | NaN | Albert | Pujols | Jose Alberto | 230.00 | 75.00 | R | R | 2001-04-02 | 2015-10-04 | pujoa001 | pujolal01 | 2,015.00 | LAA | AL | 24,000,000.00 |
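When salaries tie at the cutoff, nlargest(10, "salary") keeps the first rows it meets (keep="first" is the default), so the last spots in the top 10 (three players at 24,000,000.00 here) may hide other players with the same pay. Passing keep="all" returns every tied row, even if that means more than n rows. A sketch on a hypothetical frame built from salaries in the table above:

```python
import pandas as pd

pay = pd.DataFrame({
    "nameLast": ["Kershaw", "Cano", "Fielder", "Pujols"],
    "salary": [32571000.0, 24000000.0, 24000000.0, 24000000.0],
})

top2_first = pay.nlargest(2, "salary")            # default keep="first"
top2_all = pay.nlargest(2, "salary", keep="all")  # keeps every tie at the cutoff
print(len(top2_first), len(top2_all))
```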
Draw a horizontal bar chart of the top 10 salaries.
In [31]:
best_paid_top_10.plot(kind="barh", x="nameLast", y="salary")
Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f2b934218d0>
Save the best-paid players to a CSV file.
In [30]:
best_paid.to_csv('highest-paid.csv', index=False)
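index=False leaves the DataFrame's row labels (such as 18735 for Kershaw) out of the file, so reading it back gives a clean 0-based index instead of an extra unnamed column. A round-trip sketch that uses an in-memory buffer (io.StringIO) in place of highest-paid.csv, with a hypothetical two-row sample:

```python
import io

import pandas as pd

best_paid_sample = pd.DataFrame(
    {"playerID": ["kershcl01", "verlaju01"], "salary": [32571000.0, 28000000.0]},
    index=[18735, 36569],  # row labels carried over from the joined table
)

buffer = io.StringIO()
best_paid_sample.to_csv(buffer, index=False)  # row labels are not written
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded)
```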
Content source: ireapps/pycar