Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.
Created by Alfred Essa, May 26th, 2014
Note: The IPython Notebook and data files can be found on my GitHub site: http://github/alfredessa
In this tutorial we learn to use "queries" and "filters" in Pandas.
In [1]:
# Load pandas and numpy libraries
import pandas as pd
import numpy as np
In [2]:
# Limit displayed output to 10 rows
pd.set_option('display.max_rows', 10)
In [3]:
# Plot inline in notebook (%matplotlib avoids the namespace imports that %pylab performs)
%matplotlib inline
In [4]:
# Read dataset
auto = pd.read_csv('data/auto.csv')
In [5]:
# Show first lines of data
auto.head()
Out[5]:
In [6]:
# search in column "foreign" for value == 1 (returns a boolean Series)
auto.foreign==1
Out[6]:
In [7]:
# set variable "mask" to the search filter
mask = auto.foreign==1
In [8]:
mask
Out[8]:
In [9]:
# apply the filter to the dataset
auto[mask]
Out[9]:
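Because the cell outputs are not reproduced here, the following is a minimal sketch of what the mask contains and what applying it returns. It uses a small made-up DataFrame named `toy`; the column names mirror those used above, but the values are invented and do not come from auto.csv.

```python
import pandas as pd

# Hypothetical stand-in for auto.csv; all values are invented for illustration
toy = pd.DataFrame({
    "make": ["A", "B", "C", "D"],
    "price": [4100, 6200, 3900, 7800],
    "mpg": [22, 17, 31, 25],
    "foreign": [0, 1, 1, 0],
})

# Comparing a column to a value yields a boolean Series, one entry per row
toy_mask = toy.foreign == 1

# Indexing with the boolean Series keeps only the rows where it is True
toy_foreign = toy[toy_mask]

print(toy_mask.tolist())
print(toy_foreign.make.tolist())
```

The filtered frame keeps the original row labels, which is why the index of a filtered result usually has gaps.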
In [10]:
# create a new dataframe (subset of original)
foreign = auto[mask]
In [11]:
foreign
Out[11]:
In [12]:
# apply inverse filter (~ negates the boolean mask, same result as np.invert(mask))
domestic = auto[~mask]
In [13]:
domestic
Out[13]:
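As a quick check on the inverse filter, the sketch below uses a made-up DataFrame (hypothetical values, not the real data) to show that the `~` operator and `np.invert` produce the same negated mask, and that the mask and its inverse together split the rows into two non-overlapping subsets.

```python
import numpy as np
import pandas as pd

# Hypothetical data; values are invented for illustration
toy = pd.DataFrame({
    "make": ["A", "B", "C", "D"],
    "foreign": [0, 1, 1, 0],
})

toy_mask = toy.foreign == 1

# ~ is the element-wise NOT for a boolean Series
toy_domestic = toy[~toy_mask]
toy_foreign = toy[toy_mask]

# np.invert gives the same negated mask
same_mask = (~toy_mask).equals(np.invert(toy_mask))

print(toy_domestic.make.tolist())
print(same_mask)
print(len(toy_foreign) + len(toy_domestic) == len(toy))
```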
In [17]:
# combine two conditions: mpg above 20 AND price below 5000
mask2 = (auto.mpg > 20) & (auto.price < 5000)
In [18]:
myselection = auto[mask2]
In [19]:
myselection
Out[19]:
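Two details of the combined filter are worth spelling out. Each comparison needs its own parentheses because `&` binds more tightly than `>` and `<` in Python, and `&` (not `and`) is the element-wise operator for Series. The sketch below, on made-up data with invented values, also shows `DataFrame.query` as an alternative way to write the same selection; the tutorial itself uses only masks, so treat the `query` form as an optional variant.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration
toy = pd.DataFrame({
    "make": ["A", "B", "C", "D"],
    "price": [4100, 6200, 3900, 7800],
    "mpg": [22, 17, 31, 25],
})

# Mask form: parenthesize each comparison, combine with &
toy_mask2 = (toy.mpg > 20) & (toy.price < 5000)
by_mask = toy[toy_mask2]

# Query form: the same condition written as a string expression
by_query = toy.query("mpg > 20 and price < 5000")

print(by_mask.make.tolist())
print(by_mask.equals(by_query))
```

Use `|` for element-wise OR in the mask form, with the same parenthesization.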
In [20]:
# read data
crime = pd.read_csv('data/crime.csv')
In [21]:
# verify data
crime.head()
Out[21]:
In [30]:
# set mask
mask3 = ((crime.State=='California') & (crime.Crime=='Murder and nonnegligent Manslaughter'))
In [31]:
# apply mask or filter
cal_murder = crime[mask3]
In [32]:
cal_murder
Out[32]:
In [33]:
# plot data
cal_murder.plot(x='Year', y='Count')
Out[33]:
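A filtered frame keeps its rows in their original order, so if the source file is not already sorted by year, a line plot can zigzag. The sketch below uses a made-up table shaped like the crime data (column names follow the cells above; the counts are invented and not real statistics) and sorts by `Year` before any plotting.

```python
import pandas as pd

# Hypothetical table shaped like crime.csv; all counts are invented
toy_crime = pd.DataFrame({
    "State": ["California", "California", "Texas", "California"],
    "Crime": [
        "Murder and nonnegligent Manslaughter",
        "Robbery",
        "Murder and nonnegligent Manslaughter",
        "Murder and nonnegligent Manslaughter",
    ],
    "Year": [2001, 2001, 2001, 2000],
    "Count": [10, 50, 12, 9],
})

# Both string conditions must hold for a row to be kept
toy_mask3 = (toy_crime.State == "California") & (
    toy_crime.Crime == "Murder and nonnegligent Manslaughter"
)

# Sort by year so the x-axis runs left to right in time order
toy_cal = toy_crime[toy_mask3].sort_values("Year")

print(toy_cal[["Year", "Count"]].values.tolist())
```

Calling `toy_cal.plot(x='Year', y='Count')` on the sorted frame would then draw the points in chronological order.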