Python for Data Analysis Lightning Tutorials

Pandas Cookbook Series

Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.

Created by Alfred Essa, May 26th, 2014

Note: IPython Notebook and Data files can be found at my Github Site: http://github/alfredessa

4.1b Introduction

In this tutorial we learn to use "queries" and "filters" in Pandas.

4.11 Preliminaries



In [1]:

    
# Load pandas and numpy libraries
import pandas as pd
import numpy as np



In [2]:

    
# Set default option for Pandas
pd.set_option('display.max_rows',10)



In [3]:

    
# Plot inline in notebook
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib

4.12 Read Data



In [4]:

    
# Read dataset
auto = pd.read_csv('data/auto.csv')



In [5]:

    
# Show first lines of data
auto.head()









    Out[5]:






  
    
      
      make
      price
      mpg
      repairs
      weight
      length
      foreign
    
  
  
    
      0
        AMC
       4099
       22
       3
       2930
       186
       0
    
    
      1
        AMC
       4749
       17
       3
       3350
       173
       0
    
    
      2
        AMC
       3799
       22
       3
       2640
       168
       0
    
    
      3
       Audi
       9690
       17
       5
       2830
       189
       1
    
    
      4
       Audi
       6295
       23
       3
       2070
       174
       1
    
  

5 rows × 7 columns

4.13 Apply Filters



In [6]:

    
# search in column "forum" for value == 1
auto.foreign==1









    Out[6]:





0    False
1    False
2    False
...
23    True
24    True
25    True
Name: foreign, Length: 26, dtype: bool



In [7]:

    
# set variable "mask" to the search filter
mask = auto.foreign==1



In [8]:

    
mask









    Out[8]:





0    False
1    False
2    False
...
23    True
24    True
25    True
Name: foreign, Length: 26, dtype: bool



In [9]:

    
# apply the filter to the dataset
auto[mask]









    Out[9]:






  
    
      
      make
      price
      mpg
      repairs
      weight
      length
      foreign
    
  
  
    
      3 
         Audi
       9690
       17
       5
       2830
       189
       1
    
    
      4 
         Audi
       6295
       23
       3
       2070
       174
       1
    
    
      5 
          BMW
       9735
       25
       4
       2650
       177
       1
    
    
      22
       Datsun
       6229
       23
       4
       2370
       170
       1
    
    
      23
       Datsun
       4589
       35
       5
       2020
       165
       1
    
    
      24
       Datsun
       5079
       24
       4
       2280
       170
       1
    
    
      25
       Datsun
       8129
       21
       4
       2750
       184
       1
    
  

7 rows × 7 columns



In [10]:

    
# create a new dataframe (subset of original)
foreign = auto[mask]



In [11]:

    
foreign









    Out[11]:






  
    
      
      make
      price
      mpg
      repairs
      weight
      length
      foreign
    
  
  
    
      3 
         Audi
       9690
       17
       5
       2830
       189
       1
    
    
      4 
         Audi
       6295
       23
       3
       2070
       174
       1
    
    
      5 
          BMW
       9735
       25
       4
       2650
       177
       1
    
    
      22
       Datsun
       6229
       23
       4
       2370
       170
       1
    
    
      23
       Datsun
       4589
       35
       5
       2020
       165
       1
    
    
      24
       Datsun
       5079
       24
       4
       2280
       170
       1
    
    
      25
       Datsun
       8129
       21
       4
       2750
       184
       1
    
  

7 rows × 7 columns



In [12]:

    
# apply inverse filter
domestic = auto[np.invert(mask)]



In [13]:

    
domestic









    Out[13]:






  
    
      
      make
      price
      mpg
      repairs
      weight
      length
      foreign
    
  
  
    
      0 
         AMC
        4099
       22
       3
       2930
       186
       0
    
    
      1 
         AMC
        4749
       17
       3
       3350
       173
       0
    
    
      2 
         AMC
        3799
       22
       3
       2640
       168
       0
    
    
      6 
       Buick
        4816
       20
       3
       3250
       196
       0
    
    
      7 
       Buick
        7827
       15
       4
       4080
       222
       0
    
    
      8 
       Buick
        5788
       18
       3
       3670
       218
       0
    
    
      9 
       Buick
        4453
       26
       3
       2230
       170
       0
    
    
      10
       Buick
        5189
       20
       3
       3280
       200
       0
    
    
      11
       Buick
       10372
       16
       3
       3880
       207
       0
    
    
      12
       Buick
        4082
       19
       3
       3400
       200
       0
    
    
      
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
  

19 rows × 7 columns



In [17]:

    
mask2 = ((auto.mpg>20) & (auto.price<5000))



In [18]:

    
myselection = auto[mask2]



In [19]:

    
myselection









    Out[19]:






  
    
      
      make
      price
      mpg
      repairs
      weight
      length
      foreign
    
  
  
    
      0 
             AMC
       4099
       22
       3
       2930
       186
       0
    
    
      2 
             AMC
       3799
       22
       3
       2640
       168
       0
    
    
      9 
           Buick
       4453
       26
       3
       2230
       170
       0
    
    
      16
       Chevrolet
       3299
       29
       3
       2110
       163
       0
    
    
      18
       Chevrolet
       4504
       22
       3
       3180
       193
       0
    
    
      20
       Chevrolet
       3667
       24
       2
       2750
       179
       0
    
    
      23
          Datsun
       4589
       35
       5
       2020
       165
       1
    
  

7 rows × 7 columns

4.14 Practice using crime data set



In [20]:

    
# read data
crime = pd.read_csv('data/crime.csv')



In [21]:

    
# verify data
crime.head()









    Out[21]:






  
    
      
      State
      Type of Crime
      Crime
      Year
      Count
    
  
  
    
      0
       Alabama
       Violent Crime
       Murder and nonnegligent Manslaughter
       1960
       406
    
    
      1
       Alabama
       Violent Crime
       Murder and nonnegligent Manslaughter
       1961
       427
    
    
      2
       Alabama
       Violent Crime
       Murder and nonnegligent Manslaughter
       1962
       316
    
    
      3
       Alabama
       Violent Crime
       Murder and nonnegligent Manslaughter
       1963
       340
    
    
      4
       Alabama
       Violent Crime
       Murder and nonnegligent Manslaughter
       1964
       316
    
  

5 rows × 5 columns



In [30]:

    
# set mask
mask3 = ((crime.State=='California') & (crime.Crime=='Murder and nonnegligent Manslaughter'))



In [31]:

    
# apply mask or filter
cal_murder = crime[mask3]



In [32]:

    
cal_murder









    Out[32]:






  
    
      
      State
      Type of Crime
      Crime
      Year
      Count
    
  
  
    
      1288
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1960
        616
    
    
      1289
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1961
        605
    
    
      1290
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1962
        657
    
    
      1291
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1963
        673
    
    
      1292
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1964
        740
    
    
      1293
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1965
        880
    
    
      1294
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1966
        868
    
    
      1295
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1967
       1039
    
    
      1296
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1968
       1150
    
    
      1297
       California
       Violent Crime
       Murder and nonnegligent Manslaughter
       1969
       1386
    
    
      
      ...
      ...
      ...
      ...
      ...
    
  

46 rows × 5 columns



In [33]:

    
# plot data
cal_murder.plot(x='Year', y='Count')









    Out[33]:





<matplotlib.axes.AxesSubplot at 0x11038e710>



In [ ]:

	make	price	mpg	repairs	weight	length	foreign
0	AMC	4099	22	3	2930	186	0
1	AMC	4749	17	3	3350	173	0
2	AMC	3799	22	3	2640	168	0
3	Audi	9690	17	5	2830	189	1
4	Audi	6295	23	3	2070	174	1

	State	Type of Crime	Crime	Year	Count
0	Alabama	Violent Crime	Murder and nonnegligent Manslaughter	1960	406
1	Alabama	Violent Crime	Murder and nonnegligent Manslaughter	1961	427
2	Alabama	Violent Crime	Murder and nonnegligent Manslaughter	1962	316
3	Alabama	Violent Crime	Murder and nonnegligent Manslaughter	1963	340
4	Alabama	Violent Crime	Murder and nonnegligent Manslaughter	1964	316

	State	Type of Crime	Crime	Year	Count
1288	California	Violent Crime	Murder and nonnegligent Manslaughter	1960	616
1289	California	Violent Crime	Murder and nonnegligent Manslaughter	1961	605
1290	California	Violent Crime	Murder and nonnegligent Manslaughter	1962	657
1291	California	Violent Crime	Murder and nonnegligent Manslaughter	1963	673
1292	California	Violent Crime	Murder and nonnegligent Manslaughter	1964	740
1293	California	Violent Crime	Murder and nonnegligent Manslaughter	1965	880
1294	California	Violent Crime	Murder and nonnegligent Manslaughter	1966	868
1295	California	Violent Crime	Murder and nonnegligent Manslaughter	1967	1039
1296	California	Violent Crime	Murder and nonnegligent Manslaughter	1968	1150
1297	California	Violent Crime	Murder and nonnegligent Manslaughter	1969	1386
	...	...	...	...	...