Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.
Created by Alfred Essa, Dec 22nd, 2013
Note: The IPython Notebook and data files can be found on my GitHub site: http://github/alfredessa
In [32]:
import pandas as pd
In [33]:
mlb = pd.read_csv('data/mlbsalaries.csv')
In [35]:
mlb.tail()
Out[35]:
In [39]:
mlb.Year.value_counts()
Out[39]:
In [40]:
yr2010 = mlb[mlb.Year==2010]
In [41]:
yr2010 = yr2010.set_index('Player')
In [42]:
yr2010.head()
Out[42]:
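The data file is not reproduced here, so the filtering and re-indexing steps above can be tried on a small made-up table. This is a minimal sketch: only the column names (Year, Player, Salary, Team) come from the tutorial, while the name `toy` and every row value are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the tutorial.
toy = pd.DataFrame({
    'Year':   [2009, 2010, 2010, 2010],
    'Player': ['Player A', 'Player B', 'Player C', 'Player D'],
    'Salary': [1000000, 5000000, 400000, 5000000],
    'Team':   ['Team X', 'Team Y', 'Team X', 'Team X'],
})

# A boolean mask keeps only the rows where Year equals 2010.
toy2010 = toy[toy.Year == 2010]

# Use player names as row labels; 'Player' stops being a regular column.
toy2010 = toy2010.set_index('Player')

print(toy2010)
```

After `set_index`, rows can be looked up by player name with `toy2010.loc['Player B']`.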
In [44]:
# sort row labels
yr2010.sort_index().head()
Out[44]:
In [45]:
# sort column labels
yr2010.sort_index(axis=1).head()
Out[45]:
In [46]:
# sort column values using the sort_values method (it replaces Series.order, which later pandas releases removed); note: the result is a Series
yr2010.Salary.sort_values(ascending=False).head()
Out[46]:
In [47]:
# sort rows by a column's values using sort_values (it replaces the older sort_index(by=...) form)
sorted_yr2010 = yr2010.sort_values(by='Salary', ascending=False)
In [48]:
sorted_yr2010.head(20)
Out[48]:
In [50]:
# sort by Salary (descending), breaking ties by Team (ascending)
yr2010.sort_values(by=['Salary', 'Team'], ascending=[False, True]).head(20)
Out[50]:
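To see how a two-key sort resolves ties, here is a small sketch on invented data, using `sort_values`, the method current pandas releases provide for sorting by column values. The table `toy` and all of its values are hypothetical; two players share the top salary so that the second key (Team) decides their order.

```python
import pandas as pd

# Hypothetical data: Player A and Player C are tied on Salary.
toy = pd.DataFrame({
    'Player': ['Player A', 'Player B', 'Player C', 'Player D'],
    'Salary': [5000000, 400000, 5000000, 1000000],
    'Team':   ['Team Y', 'Team X', 'Team X', 'Team X'],
}).set_index('Player')

# Salary descending first; ties are broken by Team ascending.
ranked = toy.sort_values(by=['Salary', 'Team'], ascending=[False, True])

print(ranked)
```

Each entry in `ascending` pairs with the key at the same position in `by`, so the two lists must have the same length.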
In [51]:
# Top 10 highest paid players (head() alone returns only 5 rows, so ask for 10)
top10 = yr2010.Salary.sort_values(ascending=False).head(10)
In [53]:
type(top10)
Out[53]:
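Selecting a single column and sorting it still yields a pandas Series, which is what the type check above inspects. The sketch below illustrates this on invented salaries and also shows `Series.nlargest`, an alternative that the tutorial itself does not use; the names `salaries`, `top2_sorted`, and `top2_nlargest` are hypothetical.

```python
import pandas as pd

# Hypothetical salaries indexed by player name.
salaries = pd.Series(
    [400000, 5000000, 1000000, 2500000],
    index=['Player A', 'Player B', 'Player C', 'Player D'],
    name='Salary',
)

# Sort descending, then keep the first two entries.
top2_sorted = salaries.sort_values(ascending=False).head(2)

# nlargest picks the two largest values directly.
top2_nlargest = salaries.nlargest(2)

print(type(top2_sorted))
print(top2_sorted.equals(top2_nlargest))
```

Both approaches keep the player names as the index, so the result can be plotted with names on the x-axis.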
In [54]:
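# plot highest paid
import matplotlib.pyplot as plt  # plt was never imported above
plt.figure()
top10.plot(label='Salaries')
plt.xticks(rotation='vertical')  # rotate player names so they do not overlap
plt.legend()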
Out[54]: