Analyzing data with Pandas

First, a little setup: import the pandas library as pd.


In [1]:
import pandas as pd

Set some helpful display options. Uncomment the boilerplate in this cell.


In [2]:
%matplotlib inline
# Full option names avoid ambiguous-pattern errors in newer pandas versions
pd.set_option("display.max_columns", 150)
pd.set_option("display.max_colwidth", 40)
pd.options.display.float_format = '{:,.2f}'.format

Open and read the Master.csv and Salaries.csv tables in the data/2017/ directory.


In [3]:
master = pd.read_csv('../project3/data/2017/Master.csv')    # File with player details
salary = pd.read_csv('../project3/data/2017/Salaries.csv')  # File with baseball players' salaries

Check what type each object is with type(table_name). You can also use the .info() method to explore the data's structure.


In [4]:
master.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18846 entries, 0 to 18845
Data columns (total 24 columns):
playerID        18846 non-null object
birthYear       18703 non-null float64
birthMonth      18531 non-null float64
birthDay        18382 non-null float64
birthCountry    18773 non-null object
birthState      18220 non-null object
birthCity       18647 non-null object
deathYear       9336 non-null float64
deathMonth      9335 non-null float64
deathDay        9334 non-null float64
deathCountry    9329 non-null object
deathState      9277 non-null object
deathCity       9325 non-null object
nameFirst       18807 non-null object
nameLast        18846 non-null object
nameGiven       18807 non-null object
weight          17975 non-null float64
height          18041 non-null float64
bats            17655 non-null object
throws          17868 non-null object
debut           18653 non-null object
finalGame       18653 non-null object
retroID         18792 non-null object
bbrefID         18845 non-null object
dtypes: float64(8), object(16)
memory usage: 3.5+ MB

In [5]:
salary.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25575 entries, 0 to 25574
Data columns (total 5 columns):
yearID      25575 non-null int64
teamID      25575 non-null object
lgID        25575 non-null object
playerID    25575 non-null object
salary      25575 non-null int64
dtypes: int64(2), object(3)
memory usage: 999.1+ KB

Print out sample data for each table with table.head().
See additional options by pressing Tab after you type the head() method.


In [6]:
master.head()


Out[6]:
playerID birthYear birthMonth birthDay birthCountry birthState birthCity deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst nameLast nameGiven weight height bats throws debut finalGame retroID bbrefID
0 aardsda01 1,981.00 12.00 27.00 USA CO Denver nan nan nan NaN NaN NaN David Aardsma David Allan 220.00 75.00 R R 2004-04-06 2015-08-23 aardd001 aardsda01
1 aaronha01 1,934.00 2.00 5.00 USA AL Mobile nan nan nan NaN NaN NaN Hank Aaron Henry Louis 180.00 72.00 R R 1954-04-13 1976-10-03 aaroh101 aaronha01
2 aaronto01 1,939.00 8.00 5.00 USA AL Mobile 1,984.00 8.00 16.00 USA GA Atlanta Tommie Aaron Tommie Lee 190.00 75.00 R R 1962-04-10 1971-09-26 aarot101 aaronto01
3 aasedo01 1,954.00 9.00 8.00 USA CA Orange nan nan nan NaN NaN NaN Don Aase Donald William 190.00 75.00 R R 1977-07-26 1990-10-03 aased001 aasedo01
4 abadan01 1,972.00 8.00 25.00 USA FL Palm Beach nan nan nan NaN NaN NaN Andy Abad Fausto Andres 184.00 73.00 L L 2001-09-10 2006-04-13 abada001 abadan01

In [7]:
salary.head()


Out[7]:
yearID teamID lgID playerID salary
0 1985 ATL NL barkele01 870000
1 1985 ATL NL bedrost01 550000
2 1985 ATL NL benedbr01 545000
3 1985 ATL NL campri01 633333
4 1985 ATL NL ceronri01 625000

Now we join the two tables using pd.merge. Since no key is given, pandas joins on the column the two tables share, playerID.
We want to keep all the players in the master data set,
even if their salary is missing from the salary data set, so we use a left join.
We can always filter the NaN values out later.


In [8]:
joined = pd.merge(left=master, right=salary, how="left")
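
To see what how="left" does, here is a minimal sketch on two tiny made-up tables. The names players and pay and all of their values are hypothetical and only serve as illustration. The indicator=True argument adds a _merge column that marks rows found only in the left table.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
players = pd.DataFrame({"playerID": ["a01", "b01", "c01"],
                        "nameLast": ["Able", "Baker", "Clark"]})
pay = pd.DataFrame({"playerID": ["a01", "b01"],
                    "salary": [500000, 750000]})

# Left join: every row of `players` is kept; a missing salary becomes NaN
merged = pd.merge(left=players, right=pay, how="left",
                  on="playerID", indicator=True)
print(merged)

# Rows that had no match in `pay`
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["playerID"].tolist())
```

Passing on="playerID" explicitly is optional here, but it documents which column the join uses.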

See what columns the joined table contains.


In [9]:
joined.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 39455 entries, 0 to 39454
Data columns (total 28 columns):
playerID        39455 non-null object
birthYear       39312 non-null float64
birthMonth      39140 non-null float64
birthDay        38991 non-null float64
birthCountry    39382 non-null object
birthState      37738 non-null object
birthCity       39228 non-null object
deathYear       9669 non-null float64
deathMonth      9668 non-null float64
deathDay        9667 non-null float64
deathCountry    9662 non-null object
deathState      9603 non-null object
deathCity       9658 non-null object
nameFirst       39416 non-null object
nameLast        39455 non-null object
nameGiven       39416 non-null object
weight          38584 non-null float64
height          38650 non-null float64
bats            38264 non-null object
throws          38477 non-null object
debut           39262 non-null object
finalGame       39262 non-null object
retroID         39401 non-null object
bbrefID         39454 non-null object
yearID          25567 non-null float64
teamID          25567 non-null object
lgID            25567 non-null object
salary          25567 non-null float64
dtypes: float64(10), object(18)
memory usage: 8.7+ MB

Check whether the join kept exactly one row per player. The easiest way is to subtract the length of the joined table from the length of the master table; the result should be 0.


In [10]:
len(master) - len(joined)


Out[10]:
-20609

Something went wrong. There are now more rows in the joined data set than players in the master data set.
Some entries probably got duplicated.
Let's check if we have duplicate playerIDs by using .value_counts().


In [11]:
joined["playerID"].value_counts()


Out[11]:
moyerja01    25
vizquom01    24
glavito02    23
thomeji01    22
bondsba01    22
griffke02    22
rodrial01    21
sheffga01    21
francjo01    21
gordoto01    21
johnsra05    21
maddugr01    21
clemero02    21
smoltjo01    21
rhodear01    20
rogerke01    20
jonesch06    20
hawkila01    20
rodriiv01    20
oliveda02    20
wellsda01    20
schilcu01    19
finlest01    19
riverma01    19
ramirma02    19
giambja01    19
biggicr01    19
surhobj01    19
santibe01    19
larkiba01    19
             ..
becanbu01     1
ryalma01      1
warneja02     1
hardgpa01     1
peoplji01     1
skinnca01     1
sundrst01     1
lizra01       1
thackmo01     1
lyonsal01     1
currela01     1
leesmch01     1
ripplji01     1
hogueca01     1
sammocl01     1
heiseri01     1
arrigge01     1
sayji01       1
thomawa01     1
hogansh01     1
kibbiho01     1
bottoji01     1
kelihmi01     1
eckhaox01     1
hestela01     1
matteal01     1
smythha01     1
dashiwa01     1
johnsji01     1
herbefr01     1
Name: playerID, Length: 18846, dtype: int64

Yep, we do.
Let's select an arbitrary player to see why there is duplication.


In [12]:
joined[joined["playerID"] == "moyerja01"]


Out[12]:
playerID birthYear birthMonth birthDay birthCountry birthState birthCity deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst nameLast nameGiven weight height bats throws debut finalGame retroID bbrefID yearID teamID lgID salary
24836 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,986.00 CHN NL 60,000.00
24837 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,987.00 CHN NL 70,000.00
24838 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,988.00 CHN NL 142,500.00
24839 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,989.00 TEX AL 205,000.00
24840 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,990.00 TEX AL 340,000.00
24841 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,991.00 SLN NL 200,000.00
24842 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,993.00 BAL AL 200,000.00
24843 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,994.00 BAL AL 725,000.00
24844 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,995.00 BAL AL 1,100,000.00
24845 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,996.00 BOS AL 825,000.00
24846 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,997.00 SEA AL 1,700,000.00
24847 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,998.00 SEA AL 2,000,000.00
24848 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 1,999.00 SEA AL 2,300,000.00
24849 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,000.00 SEA AL 6,000,000.00
24850 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,001.00 SEA AL 6,500,000.00
24851 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,002.00 SEA AL 6,500,000.00
24852 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,003.00 SEA AL 6,500,000.00
24853 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,004.00 SEA AL 7,000,000.00
24854 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,005.00 SEA AL 8,000,000.00
24855 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,006.00 SEA AL 5,500,000.00
24856 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,007.00 PHI NL 6,500,000.00
24857 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,008.00 PHI NL 6,000,000.00
24858 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,009.00 PHI NL 6,500,000.00
24859 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,010.00 PHI NL 8,000,000.00
24860 moyerja01 1,962.00 11.00 18.00 USA PA Sellersville nan nan nan NaN NaN NaN Jamie Moyer Jamie 170.00 72.00 L L 1986-06-16 2012-05-27 moyej001 moyerja01 2,012.00 COL NL 1,100,000.00
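
One way to catch this kind of row multiplication early is the validate argument of pd.merge, which raises an error when the key relationship is not the one you expect. The sketch below uses hypothetical toy tables (players and pay are made-up names and values), not the real data.

```python
import pandas as pd

# Hypothetical toy tables: player a01 has two salary rows (one per season)
players = pd.DataFrame({"playerID": ["a01", "b01"]})
pay = pd.DataFrame({"playerID": ["a01", "a01", "b01"],
                    "yearID": [2014, 2015, 2015],
                    "salary": [400000, 500000, 750000]})

# A plain left join repeats the player row once per matching salary row
merged = pd.merge(players, pay, how="left", on="playerID")
print(len(players), len(merged))

# Asking pandas to verify a one-to-one relationship raises MergeError instead
try:
    pd.merge(players, pay, how="left", on="playerID", validate="one_to_one")
    is_one_to_one = True
except pd.errors.MergeError:
    is_one_to_one = False
print(is_one_to_one)
```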

The moyerja01 rows show that the joined data set now contains one salary row for each year of a player's career.
We only want to have the most recent salary though.
We therefore need to 'deduplicate' the data set.

But first, let's make sure we keep the newest year. We can do this by sorting the data by playerID and yearID, so that each player's most recent entry comes last.


In [15]:
joined = joined.sort_values(["playerID","yearID"])

Now we deduplicate, keeping only the last (most recent) row for each player.


In [16]:
deduplicated = joined.drop_duplicates("playerID", keep="last")
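
The sort-then-deduplicate pattern can be checked on a small example. The table below is hypothetical (made-up IDs, years and salaries). A player without salary data has a single row with NaN values, so keeping the last row does not lose anything for that player.

```python
import numpy as np
import pandas as pd

# Hypothetical joined table: a01 has three seasons, c01 has no salary data
rows = pd.DataFrame({
    "playerID": ["a01", "a01", "a01", "b01", "c01"],
    "yearID":   [2015, 2013, 2014, 2012, np.nan],
    "salary":   [900000, 500000, 700000, 300000, np.nan],
})

# Sort so the newest year is last within each player, then keep that row
latest = (rows.sort_values(["playerID", "yearID"])
              .drop_duplicates("playerID", keep="last"))
print(latest)
```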

And let's do the check again


In [17]:
len(master) - len(deduplicated)


Out[17]:
0

Now we can get into the interesting part: analysis!

What is the average (mean, median, max, min) salary?


In [18]:
deduplicated["salary"].describe()


Out[18]:
count        4,958.00
mean     1,692,477.94
std      3,243,005.10
min              0.00
25%        300,000.00
50%        507,500.00
75%      1,300,000.00
max     32,571,000.00
Name: salary, dtype: float64
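
If only a few of these statistics are needed, they can be requested by name with .agg. This is a minimal sketch on a hypothetical Series of made-up salaries. Like describe(), these aggregations skip NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries, including one missing value
salaries = pd.Series([300000, 500000, 500000, 1200000, np.nan], name="salary")

# Each name refers to a built-in pandas aggregation
summary = salaries.agg(["count", "mean", "median", "min", "max"])
print(summary)
```

In the real output above, the mean (about 1.69 million) is far above the median (507,500), which suggests that a small number of very high salaries pull the average up.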

Who makes the most money?


In [19]:
max_salary = deduplicated["salary"].max()

In [20]:
deduplicated[deduplicated["salary"] == max_salary]


Out[20]:
playerID birthYear birthMonth birthDay birthCountry birthState birthCity deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst nameLast nameGiven weight height bats throws debut finalGame retroID bbrefID yearID teamID lgID salary
18735 kershcl01 1,988.00 3.00 19.00 USA TX Dallas nan nan nan NaN NaN NaN Clayton Kershaw Clayton Edward 225.00 76.00 L L 2008-05-25 2015-10-04 kersc001 kershcl01 2,015.00 LAN NL 32,571,000.00
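
An alternative to comparing against the maximum is idxmax, which returns the index label of the first maximum. The difference shows up when several rows share the top value, as in this hypothetical table (names, salaries and index labels are made up).

```python
import pandas as pd

# Hypothetical table in which two players share the highest salary
df = pd.DataFrame({"nameLast": ["Able", "Baker", "Clark"],
                   "salary": [500000, 900000, 900000]},
                  index=[10, 20, 30])

# idxmax returns the label of the first row holding the maximum
top_label = df["salary"].idxmax()
print(df.loc[top_label, "nameLast"])

# The boolean mask returns every row at the maximum
tied = df[df["salary"] == df["salary"].max()]
print(tied["nameLast"].tolist())
```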

What are the most common baseball players' salaries?

Draw a histogram.


In [21]:
deduplicated.hist("salary")


Out[21]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bc15defd0>]],
      dtype=object)
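
To read the numbers behind a histogram without the plot, the bin counts can be computed directly. The sketch below uses numpy.histogram with 10 bins, the default bin count of DataFrame.hist, on hypothetical made-up salaries. Missing values are dropped first because numpy.histogram does not accept NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with one missing value
salaries = pd.Series([300000, 400000, 500000, 500000, 700000,
                      1200000, 3000000, 10000000, np.nan])

values = salaries.dropna()
counts, edges = np.histogram(values, bins=10)

# One line per bin: lower edge, upper edge, number of players
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>12,.0f} to {right:>12,.0f}: {n}")
```

With skewed data like this, most values land in the first bin, so a larger bins value or a restricted salary range may give a more readable picture.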

We can do the same with the column yearID to see how recent our data is.
We have about 30 years in our data set, so we set bins=30 to get roughly one bar per year.


In [22]:
deduplicated.hist("yearID", bins=30)


Out[22]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2b938c25c0>]],
      dtype=object)

Who are the top 10% highest-paid players?

Calculate the 90th-percentile cutoff.


In [26]:
top_10_p = deduplicated["salary"].quantile(q=0.9)
top_10_p


Out[26]:
4500000.0

Keep the players who make at least the cutoff.


In [27]:
best_paid = deduplicated[deduplicated["salary"] >= top_10_p]
best_paid


Out[27]:
playerID birthYear birthMonth birthDay birthCountry birthState birthCity deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst nameLast nameGiven weight height bats throws debut finalGame retroID bbrefID yearID teamID lgID salary
91 abreubo01 1,974.00 3.00 11.00 Venezuela Aragua Maracay nan nan nan NaN NaN NaN Bobby Abreu Bob Kelly 220.00 72.00 L R 1996-09-01 2014-09-28 abreb001 abreubo01 2,012.00 LAA AL 9,000,000.00
94 abreujo02 1,987.00 1.00 29.00 Cuba Cienfuegos Cienfuegos nan nan nan NaN NaN NaN Jose Abreu Jose Dariel 255.00 75.00 R R 2014-03-31 2015-10-03 abrej003 abreujo02 2,015.00 CHA AL 8,666,000.00
181 adamsmi03 1,978.00 7.00 29.00 USA TX Corpus Christi nan nan nan NaN NaN NaN Mike Adams Jon Michael 210.00 77.00 R R 2004-05-18 2014-09-18 adamm001 adamsmi03 2,014.00 PHI NL 7,000,000.00
237 affelje01 1,979.00 6.00 6.00 USA AZ Phoenix nan nan nan NaN NaN NaN Jeremy Affeldt Jeremy David 225.00 76.00 L L 2002-04-06 2015-10-04 affej001 affelje01 2,015.00 SFN NL 6,000,000.00
395 alfoned01 1,973.00 11.00 8.00 Venezuela Miranda Santa Teresa del Tuy nan nan nan NaN NaN NaN Edgardo Alfonzo Edgardo Antonio 210.00 71.00 R R 1995-04-26 2006-06-11 alfoe001 alfoned01 2,006.00 LAA AL 8,000,000.00
554 aloumo01 1,966.00 7.00 3.00 USA GA Atlanta nan nan nan NaN NaN NaN Moises Alou Moises Rojas 185.00 75.00 R R 1990-07-26 2008-06-10 aloum001 aloumo01 2,008.00 NYN NL 7,500,000.00
592 alvarpe01 1,987.00 2.00 6.00 D.R. Distrito Nacional Santo Domingo nan nan nan NaN NaN NaN Pedro Alvarez Pedro Manuel 250.00 75.00 L R 2010-06-16 2015-10-04 alvap001 alvarpe01 2,015.00 PIT NL 5,750,000.00
695 anderbr04 1,988.00 2.00 1.00 USA TX Midland nan nan nan NaN NaN NaN Brett Anderson Brett Franklin 240.00 75.00 L L 2009-04-10 2015-10-01 andeb004 anderbr04 2,015.00 LAN NL 10,000,000.00
820 andruel01 1,988.00 8.00 26.00 Venezuela Aragua Maracay nan nan nan NaN NaN NaN Elvis Andrus Elvis Augusto 200.00 72.00 R R 2009-04-06 2015-10-04 andre001 andruel01 2,015.00 TEX AL 15,000,000.00
1015 arroybr01 1,977.00 2.00 24.00 USA FL Key West nan nan nan NaN NaN NaN Bronson Arroyo Bronson Anthony 185.00 75.00 R R 2000-06-12 2014-06-15 arrob001 arroybr01 2,014.00 ARI NL 9,500,000.00
1049 ashbyan01 1,967.00 7.00 11.00 USA MO Kansas City nan nan nan NaN NaN NaN Andy Ashby Andrew Jason 180.00 73.00 R R 1991-06-10 2004-09-14 ashba002 ashbyan01 2,003.00 LAN NL 8,500,000.00
1198 avilaal01 1,987.00 1.00 29.00 USA FL Hialeah nan nan nan NaN NaN NaN Alex Avila Alexander Thomas 210.00 71.00 L R 2009-08-06 2015-10-03 avila001 avilaal01 2,015.00 DET AL 5,400,000.00
1244 aybarer01 1,984.00 1.00 14.00 D.R. Peravia Bani nan nan nan NaN NaN NaN Erick Aybar Erick Johan 180.00 70.00 B R 2006-05-16 2015-10-04 aybae001 aybarer01 2,015.00 LAA AL 8,500,000.00
1342 bagweje01 1,968.00 5.00 27.00 USA MA Boston nan nan nan NaN NaN NaN Jeff Bagwell Jeffrey Robert 195.00 72.00 R R 1991-04-08 2005-10-02 bagwj001 bagweje01 2,006.00 HOU NL 19,369,019.00
1368 baileho02 1,986.00 5.00 3.00 USA TX La Grange nan nan nan NaN NaN NaN Homer Bailey David Dewitt 225.00 76.00 R R 2007-06-08 2015-04-23 bailh001 baileho02 2,015.00 CIN NL 10,000,000.00
1465 bakersc02 1,981.00 9.00 19.00 USA LA Shreveport nan nan nan NaN NaN NaN Scott Baker Timothy Scott 215.00 76.00 R R 2005-05-07 2015-05-02 bakes002 bakersc02 2,013.00 CHN NL 5,500,000.00
1998 bautijo02 1,980.00 10.00 19.00 D.R. Distrito Nacional Santo Domingo nan nan nan NaN NaN NaN Jose Bautista Jose Antonio 205.00 72.00 R R 2004-04-04 2015-10-04 bautj002 bautijo02 2,015.00 TOR AL 14,000,000.00
2099 beckejo02 1,980.00 5.00 15.00 USA TX Spring nan nan nan NaN NaN NaN Josh Beckett Joshua Patrick 230.00 77.00 R R 2001-09-04 2014-08-03 beckj002 beckejo02 2,014.00 LAN NL 15,750,000.00
2134 beckro01 1,968.00 8.00 3.00 USA CA Burbank 2,007.00 6.00 23.00 USA AZ Phoenix Rod Beck Rodney Roy 215.00 73.00 R R 1991-05-06 2004-08-14 beckr001 beckro01 2,001.00 BOS AL 4,500,000.00
2147 bedarer01 1,979.00 3.00 5.00 CAN ON Navan nan nan nan NaN NaN NaN Erik Bedard Erik Joseph 195.00 73.00 L L 2002-04-17 2014-07-12 bedae001 bedarer01 2,012.00 PIT NL 4,500,000.00
2210 belchti01 1,961.00 10.00 19.00 USA OH Mount Gilead nan nan nan NaN NaN NaN Tim Belcher Timothy Wayne 210.00 75.00 R R 1987-09-06 2000-09-30 belct001 belchti01 2,000.00 ANA AL 4,600,000.00
2264 bellda01 1,972.00 9.00 14.00 USA OH Cincinnati nan nan nan NaN NaN NaN David Bell David Michael 170.00 70.00 R R 1995-05-03 2006-10-01 belld002 bellda01 2,006.00 PHI NL 4,700,000.00
2274 bellde01 1,968.00 12.00 11.00 USA FL Tampa nan nan nan NaN NaN NaN Derek Bell Derek Nathaniel 200.00 74.00 R R 1991-06-28 2001-07-03 belld001 bellde01 2,001.00 PIT NL 5,000,000.00
2289 belleal01 1,966.00 8.00 25.00 USA LA Shreveport nan nan nan NaN NaN NaN Albert Belle Albert Jojuan 190.00 73.00 R R 1989-07-15 2000-10-01 bellj002 belleal01 2,003.00 BAL AL 13,000,000.00
2314 bellhe01 1,977.00 9.00 29.00 USA CA Oceanside nan nan nan NaN NaN NaN Heath Bell Heath Justin 235.00 75.00 R R 2004-08-24 2014-05-03 bellh001 bellhe01 2,014.00 TBA AL 9,000,000.00
2410 beltrad01 1,979.00 4.00 7.00 D.R. Distrito Nacional Santo Domingo nan nan nan NaN NaN NaN Adrian Beltre Adrian 220.00 71.00 R R 1998-06-24 2015-10-04 belta001 beltrad01 2,015.00 TEX AL 16,000,000.00
2427 beltrca01 1,977.00 4.00 24.00 P.R. NaN Manati nan nan nan NaN NaN NaN Carlos Beltran Carlos Ivan 215.00 73.00 B R 1998-09-14 2015-10-04 beltc001 beltrca01 2,015.00 NYA AL 15,000,000.00
2476 benesan01 1,967.00 8.00 20.00 USA IN Evansville nan nan nan NaN NaN NaN Andy Benes Andrew Charles 235.00 78.00 R R 1989-08-11 2002-09-29 benea001 benesan01 2,002.00 SLN NL 6,367,542.00
2497 benitar01 1,972.00 11.00 3.00 D.R. San Pedro de Macoris Ramon Santana nan nan nan NaN NaN NaN Armando Benitez Armando German 180.00 76.00 R R 1994-07-28 2008-06-06 benia001 benitar01 2,007.00 SFN NL 9,866,219.00
2552 benoijo01 1,977.00 7.00 26.00 D.R. Santiago Santiago nan nan nan NaN NaN NaN Joaquin Benoit Joaquin Antonio 250.00 76.00 R R 2001-08-08 2015-10-01 benoj001 benoijo01 2,015.00 SDN NL 8,000,000.00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37182 warddu01 1,964.00 5.00 28.00 USA NM Park View nan nan nan NaN NaN NaN Duane Ward Roy Duane 185.00 76.00 R R 1986-04-12 1995-06-22 wardd001 warddu01 1,995.00 TOR AL 4,750,000.00
37258 washbja01 1,974.00 8.00 13.00 USA WI La Crosse nan nan nan NaN NaN NaN Jarrod Washburn Jarrod Michael 190.00 73.00 L L 1998-06-02 2009-09-15 washj001 washbja01 2,009.00 SEA AL 9,850,000.00
37382 weaveje02 1,982.00 10.00 4.00 USA CA Northridge nan nan nan NaN NaN NaN Jered Weaver Jered David 210.00 79.00 R R 2006-05-27 2015-10-02 weavj003 weaveje02 2,015.00 LAA AL 18,000,000.00
37613 wellsve01 1,978.00 12.00 8.00 USA LA Shreveport nan nan nan NaN NaN NaN Vernon Wells Vernon M. 230.00 73.00 R R 1999-08-30 2013-09-29 wellv001 wellsve01 2,013.00 NYA AL 24,642,857.00
37663 werthja01 1,979.00 5.00 20.00 USA IL Springfield nan nan nan NaN NaN NaN Jayson Werth Jayson Richard Gowan 240.00 77.00 R R 2002-09-01 2015-10-04 wertj001 werthja01 2,015.00 WAS NL 21,000,000.00
37682 westbja01 1,977.00 9.00 29.00 USA GA Athens nan nan nan NaN NaN NaN Jake Westbrook Jacob Cauthen 210.00 75.00 R R 2000-06-17 2013-09-29 westj001 westbja01 2,013.00 SLN NL 8,750,000.00
37718 wettejo01 1,966.00 8.00 21.00 USA CA San Mateo nan nan nan NaN NaN NaN John Wetteland John Karl 195.00 74.00 R R 1989-05-31 2000-09-20 wettj001 wettejo01 2,000.00 TEX AL 6,500,000.00
37810 whitede03 1,962.00 12.00 29.00 Jamaica NaN Kingston nan nan nan NaN NaN NaN Devon White Devon Markes 170.00 73.00 B R 1985-09-02 2001-10-05 whitd001 whitede03 2,001.00 MIL NL 5,000,000.00
37968 wickmbo01 1,969.00 2.00 6.00 USA WI Green Bay nan nan nan NaN NaN NaN Bob Wickman Robert Joe 207.00 73.00 R R 1992-08-24 2007-09-30 wickb001 wickmbo01 2,007.00 ATL NL 6,500,000.00
37993 wietema01 1,986.00 5.00 21.00 USA SC Goose Creek nan nan nan NaN NaN NaN Matt Wieters Matthew Richard 230.00 77.00 B R 2009-05-29 2015-10-04 wietm001 wietema01 2,015.00 BAL AL 8,300,000.00
38144 willido03 1,982.00 1.00 12.00 USA CA Oakland nan nan nan NaN NaN NaN Dontrelle Willis Dontrelle Wayne 230.00 74.00 L L 2003-05-09 2011-09-27 willd003 willido03 2,010.00 DET AL 12,000,000.00
38202 willijo03 1,979.00 2.00 17.00 USA AL Florence nan nan nan NaN NaN NaN Josh Willingham Joshua David 230.00 74.00 R R 2004-07-06 2014-09-28 willj004 willijo03 2,014.00 MIN AL 7,000,000.00
38232 willima04 1,965.00 11.00 28.00 USA CA Bishop nan nan nan NaN NaN NaN Matt Williams Matthew Derrick 205.00 74.00 R R 1987-04-11 2003-05-31 willm003 willima04 2,003.00 ARI NL 10,000,000.00
38315 williwo02 1,966.00 8.00 19.00 USA TX Houston nan nan nan NaN NaN NaN Woody Williams Gregory Scott 180.00 72.00 R R 1993-05-14 2007-09-22 willw001 williwo02 2,007.00 HOU NL 6,000,000.00
38348 wilsobr01 1,982.00 3.00 16.00 USA NH Londonderry nan nan nan NaN NaN NaN Brian Wilson Brian Patrick 205.00 73.00 R R 2006-04-23 2014-09-27 wilsb001 wilsobr01 2,014.00 LAN NL 10,000,000.00
38360 wilsocj01 1,980.00 11.00 18.00 USA CA Newport Beach nan nan nan NaN NaN NaN C. J. Wilson Christopher John 210.00 73.00 L L 2005-06-10 2015-07-28 wilsc004 wilsocj01 2,015.00 LAA AL 18,000,000.00
38707 wolfra02 1,976.00 8.00 22.00 USA CA Canoga Park nan nan nan NaN NaN NaN Randy Wolf Randall Christopher 205.00 72.00 L L 1999-06-11 2015-10-04 wolfr001 wolfra02 2,012.00 MIL NL 9,500,000.00
38812 woodtr01 1,987.00 2.00 6.00 USA AR Little Rock nan nan nan NaN NaN NaN Travis Wood Travis A. 175.00 71.00 R L 2010-07-01 2015-10-04 woodt004 woodtr01 2,015.00 CHN NL 5,686,000.00
38907 wrighda03 1,982.00 12.00 20.00 USA VA Norfolk nan nan nan NaN NaN NaN David Wright David Allen 205.00 72.00 R R 2004-07-21 2015-10-04 wrigd002 wrighda03 2,015.00 NYN NL 20,000,000.00
38939 wrighja02 1,975.00 12.00 29.00 USA CA Anaheim nan nan nan NaN NaN NaN Jaret Wright Jaret Samuel 220.00 74.00 R R 1997-06-24 2007-04-29 wrigj002 wrighja02 2,007.00 BAL AL 7,000,000.00
39092 youklke01 1,979.00 3.00 15.00 USA OH Cincinnati nan nan nan NaN NaN NaN Kevin Youkilis Kevin Edmund 220.00 73.00 R R 2004-05-15 2013-06-13 youkk001 youklke01 2,013.00 NYA AL 12,000,000.00
39164 youngdm01 1,973.00 10.00 11.00 USA MS Vicksburg nan nan nan NaN NaN NaN Dmitri Young Dmitri Dell 295.00 74.00 B R 1996-08-29 2008-07-11 yound001 youngdm01 2,009.00 WAS NL 5,000,000.00
39219 youngke01 1,969.00 6.00 16.00 USA MI Alpena nan nan nan NaN NaN NaN Kevin Young Kevin Stacey 210.00 75.00 R R 1992-07-12 2003-06-27 younk001 youngke01 2,003.00 PIT NL 6,625,000.00
39245 youngmi02 1,976.00 10.00 19.00 USA CA Covina nan nan nan NaN NaN NaN Michael Young Michael Brian 200.00 73.00 R R 2000-09-29 2013-09-29 younm003 youngmi02 2,013.00 PHI NL 18,374,975.00
39294 zambrca01 1,981.00 6.00 1.00 Venezuela Carabobo Puerto Cabello nan nan nan NaN NaN NaN Carlos Zambrano Carlos Alberto 275.00 76.00 B R 2001-08-20 2012-09-21 zambc001 zambrca01 2,012.00 MIA NL 19,000,000.00
39364 zieglbr01 1,979.00 10.00 10.00 USA KS Pratt nan nan nan NaN NaN NaN Brad Ziegler Brad Gregory 220.00 76.00 R R 2008-05-31 2015-10-04 ziegb001 zieglbr01 2,015.00 ARI NL 5,000,000.00
39387 zimmejo02 1,986.00 5.00 23.00 USA WI Auburndale nan nan nan NaN NaN NaN Jordan Zimmermann Jordan M. 225.00 74.00 R R 2009-04-20 2015-09-30 zimmj003 zimmejo02 2,015.00 WAS NL 16,500,000.00
39397 zimmery01 1,984.00 9.00 28.00 USA NC Washington nan nan nan NaN NaN NaN Ryan Zimmerman Ryan Wallace 220.00 75.00 R R 2005-09-01 2015-09-07 zimmr001 zimmery01 2,015.00 WAS NL 14,000,000.00
39421 zitoba01 1,978.00 5.00 13.00 USA NV Las Vegas nan nan nan NaN NaN NaN Barry Zito Barry William 205.00 74.00 L L 2000-07-22 2015-09-30 zitob001 zitoba01 2,013.00 SFN NL 20,000,000.00
39432 zobribe01 1,981.00 5.00 26.00 USA IL Eureka nan nan nan NaN NaN NaN Ben Zobrist Benjamin Thomas 210.00 75.00 B R 2006-08-01 2015-10-04 zobrb001 zobribe01 2,015.00 OAK AL 7,500,000.00

505 rows × 28 columns
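
The 505 rows are slightly more than 10% of the 4,958 players with a salary. That would be consistent with several players earning exactly the cutoff, since the filter uses >=. A minimal sketch with hypothetical made-up salaries shows how ties at the cutoff affect the two comparisons.

```python
import pandas as pd

# Hypothetical salaries: three players are tied at the top value
salaries = pd.Series([100000, 200000, 300000, 400000, 500000, 600000,
                      700000, 800000, 1000000, 1000000, 1000000])

cutoff = salaries.quantile(q=0.9)

# >= keeps every player tied at the cutoff, > drops all of them
at_or_above = salaries[salaries >= cutoff]
strictly_above = salaries[salaries > cutoff]
print(cutoff, len(at_or_above), len(strictly_above))
```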

Use nlargest to see the 10 best-paid players.


In [28]:
best_paid_top_10 = best_paid.nlargest(10, "salary")
best_paid_top_10


Out[28]:
playerID birthYear birthMonth birthDay birthCountry birthState birthCity deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst nameLast nameGiven weight height bats throws debut finalGame retroID bbrefID yearID teamID lgID salary
18735 kershcl01 1,988.00 3.00 19.00 USA TX Dallas nan nan nan NaN NaN NaN Clayton Kershaw Clayton Edward 225.00 76.00 L L 2008-05-25 2015-10-04 kersc001 kershcl01 2,015.00 LAN NL 32,571,000.00
36569 verlaju01 1,983.00 2.00 20.00 USA VA Manakin Sabot nan nan nan NaN NaN NaN Justin Verlander Justin Brooks 225.00 77.00 R R 2005-07-04 2015-10-03 verlj001 verlaju01 2,015.00 DET AL 28,000,000.00
13538 greinza01 1,983.00 10.00 21.00 USA FL Orlando nan nan nan NaN NaN NaN Zack Greinke Donald Zachary 195.00 72.00 R R 2004-05-22 2015-10-03 greiz001 greinza01 2,015.00 LAN NL 25,000,000.00
16479 howarry01 1,979.00 11.00 19.00 USA MO St. Louis nan nan nan NaN NaN NaN Ryan Howard Ryan James 250.00 76.00 L L 2004-09-01 2015-09-14 howar001 howarry01 2,015.00 PHI NL 25,000,000.00
20087 leecl02 1,978.00 8.00 30.00 USA AR Benton nan nan nan NaN NaN NaN Cliff Lee Clifton Phifer 205.00 75.00 L L 2002-09-15 2014-07-31 lee-c003 leecl02 2,014.00 PHI NL 25,000,000.00
15555 hernafe02 1,986.00 4.00 8.00 Venezuela Carabobo Valencia nan nan nan NaN NaN NaN Felix Hernandez Felix Abraham 225.00 75.00 R R 2005-08-04 2015-09-26 hernf002 hernafe02 2,015.00 SEA AL 24,857,000.00
37613 wellsve01 1,978.00 12.00 8.00 USA LA Shreveport nan nan nan NaN NaN NaN Vernon Wells Vernon M. 230.00 73.00 R R 1999-08-30 2013-09-29 wellv001 wellsve01 2,013.00 NYA AL 24,642,857.00
5384 canoro01 1,982.00 10.00 22.00 D.R. San Pedro de Macoris San Pedro de Macoris nan nan nan NaN NaN NaN Robinson Cano Robinson Jose 210.00 72.00 L R 2005-05-03 2015-10-04 canor001 canoro01 2,015.00 SEA AL 24,000,000.00
10823 fieldpr01 1,984.00 5.00 9.00 USA CA Ontario nan nan nan NaN NaN NaN Prince Fielder Prince Semien 275.00 71.00 L R 2005-06-13 2015-10-04 fielp001 fieldpr01 2,015.00 TEX AL 24,000,000.00
28702 pujolal01 1,980.00 1.00 16.00 D.R. Distrito Nacional Santo Domingo nan nan nan NaN NaN NaN Albert Pujols Jose Alberto 230.00 75.00 R R 2001-04-02 2015-10-04 pujoa001 pujolal01 2,015.00 LAA AL 24,000,000.00
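
nlargest is a shortcut for sorting in descending order and taking the first rows. The sketch below compares both on a hypothetical table with made-up names and salaries and no ties.

```python
import pandas as pd

# Hypothetical table of five players
df = pd.DataFrame({"nameLast": ["Able", "Baker", "Clark", "Davis", "Evans"],
                   "salary": [500000, 900000, 700000, 1200000, 300000]})

# Two ways to get the three highest salaries
top3 = df.nlargest(3, "salary")
top3_sorted = df.sort_values("salary", ascending=False).head(3)

print(top3["nameLast"].tolist())
print(top3_sorted["nameLast"].tolist())
```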

Draw a chart.


In [31]:
best_paid_top_10.plot(kind="barh", x="nameLast", y="salary")


Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f2b934218d0>

Save the data.


In [30]:
best_paid.to_csv('highest-paid.csv', index=False)
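
The index=False argument matters here because the row labels left over from the join (such as 18735) carry no meaning. The sketch below uses a hypothetical two-row table and writes to a string instead of a file to show the difference in the header line.

```python
import io
import pandas as pd

# Hypothetical table with leftover index labels from an earlier join
df = pd.DataFrame({"playerID": ["a01", "b01"], "salary": [900000.0, 750000.0]},
                  index=[18735, 36569])

# Without a path, to_csv returns the CSV text as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the index-free CSV back gives a clean default index
restored = pd.read_csv(io.StringIO(without_index))
print(restored)
```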