Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.
Created by Alfred Essa, June 15th, 2014
Note: The IPython Notebook and data files can be found on my GitHub site: http://github/alfredessa
In this tutorial we compute basic descriptive statistics in Pandas.
In [1]:
# Load pandas and numpy libraries
import pandas as pd
import numpy as np
In [2]:
# Limit the number of rows Pandas displays to 20
pd.set_option('display.max_rows', 20)
In [3]:
# Plot inline in notebook (%matplotlib avoids the namespace imports done by %pylab)
%matplotlib inline
In [4]:
# Read dataset
auto = pd.read_csv('data/auto.csv')
In [5]:
# Show first lines of data
auto.head()
Out[5]:
In [6]:
# describe provides basic statistics
auto.describe()
Out[6]:
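The output of `describe` is not reproduced here, so the following is a minimal sketch on a small made-up frame. The `toy` values are invented for illustration and are not taken from `auto.csv`. It shows which rows `describe` returns for numeric columns.

```python
import pandas as pd

# Hypothetical data, invented for illustration; not taken from auto.csv
toy = pd.DataFrame({
    'mpg':   [21.0, 22.8, 18.7, 30.4, 15.2],
    'price': [4099, 4749, 3799, 5189, 6295],
})

summary = toy.describe()

# describe returns one row per statistic and one column per numeric variable
print(list(summary.index))
# Individual statistics can be looked up by row and column label
print(summary.loc['mean', 'mpg'])
```

The row labels are `count`, `mean`, `std`, `min`, `25%`, `50%`, `75%`, and `max`.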
In [7]:
# choose an individual column
auto.mpg.describe()
Out[7]:
In [8]:
# variant notation
auto['mpg'].describe()
Out[8]:
In [10]:
# describe reports the median only as the 50% row; it is also available directly
auto.mpg.median()
Out[10]:
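`describe` reports the median as its `50%` row and the sample standard deviation as its `std` row. A small sketch checking both against the dedicated methods, using invented values rather than the real `mpg` column:

```python
import pandas as pd

# Hypothetical values, invented for illustration
mpg = pd.Series([21.0, 22.8, 18.7, 30.4, 15.2])

stats = mpg.describe()

# The '50%' row of describe is the median
same_median = stats['50%'] == mpg.median()

# The 'std' row matches Series.std(), which uses ddof=1 (sample std) by default
same_std = abs(stats['std'] - mpg.std()) < 1e-12

print(same_median, same_std)
```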
In [12]:
# plot a histogram
auto.price.hist()
Out[12]:
In [13]:
# box plot
auto.boxplot(column='price')
Out[13]:
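The box in a box plot spans the first to third quartile, and with matplotlib's default setting (`whis=1.5`) points farther than 1.5 times the interquartile range from the box are drawn as individual outliers. A minimal sketch of that arithmetic, using invented prices rather than the values in `auto.csv`:

```python
import pandas as pd

# Hypothetical prices, invented for illustration
price = pd.Series([3799, 4099, 4749, 5189, 6295, 15906])

q1 = price.quantile(0.25)   # lower edge of the box
q3 = price.quantile(0.75)   # upper edge of the box
iqr = q3 - q1               # interquartile range (height of the box)

# Points outside these bounds are plotted as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = price[(price < lower) | (price > upper)]
print(q1, q3, iqr)
print(list(outliers))
```

With these numbers only the largest price falls outside the bounds.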
In [14]:
# Read the Titanic passenger dataset
titanic = pd.read_csv('data/titanic.csv')
In [15]:
# Show first lines of data
titanic.head()
Out[15]:
In [16]:
# Basic statistics for the numeric columns
titanic.describe()
Out[16]:
In [18]:
# Histogram of passenger ages using 20 bins
titanic.Age.hist(bins=20)
Out[18]:
In [19]:
# Split the passengers into groups by sex
grouped = titanic.groupby('Sex')
In [20]:
# Median age within each group
grouped.Age.median()
Out[20]:
In [21]:
# Full set of descriptive statistics for age, per group
grouped.Age.describe()
Out[21]:
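To make the split-then-summarize step concrete, here is a minimal sketch on a made-up frame. The `people` rows are invented and are not from `titanic.csv`. It also shows that per-group statistics skip missing values, which matters if the `Age` column has gaps.

```python
import pandas as pd

# Hypothetical passengers, invented for illustration; one age is missing
people = pd.DataFrame({
    'Sex': ['male', 'female', 'male', 'female', 'male'],
    'Age': [22.0, 38.0, 35.0, 26.0, None],
})

grouped = people.groupby('Sex')

# One median per group; the missing age is ignored
medians = grouped.Age.median()

# count reports the number of non-missing ages per group
counts = grouped.Age.count()

print(medians)
print(counts)
```

The result of each call is a Series indexed by the group key, so `medians['male']` looks up a single group.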
In [24]:
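# Overlay the age histograms for each sex; alpha keeps both visible where they overlap
grouped.get_group('male').Age.hist(bins=20, alpha=0.5)
grouped.get_group('female').Age.hist(bins=20, alpha=0.5)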
Out[24]:
In [25]:
# Box plot of age for each sex, one subplot per group
grouped.boxplot(column='Age')
Out[25]: