Data management with Pandas

This notebook gives an overview of some of the data management tools in Python's Pandas package. It covers:

Selecting variables
Selecting observations
Indexing
Groupby
Stacking
Doubly indexed dataframes
Combining dataframes (concat)
Merging dataframes

This notebook was written by Dave Backus for the NYU Stern course Data Bootcamp.



In [1]:

    
import pandas as pd
%matplotlib inline

Reminders

Dataframes
Index and columns
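As a quick refresher on these two reminders, the sketch below builds a tiny dataframe and reads off its row labels (the index) and its variable names (the columns). The dataframe `df` and its contents are hypothetical toy values made up for illustration; they are not taken from any of the datasets used in this notebook.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate index and columns
df = pd.DataFrame({'x': [1, 2, 3], 'y': [10.0, 20.0, 30.0]},
                  index=['a', 'b', 'c'])

row_labels = list(df.index)     # labels of the rows (observations)
col_labels = list(df.columns)   # labels of the columns (variables)
shape = df.shape                # (number of rows, number of columns)

print(row_labels, col_labels, shape)
```

Most of the methods in this notebook work by manipulating one of these two sets of labels.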

Selecting variables

Datasets

We take these examples from the data input chapter:

Penn World Table
World Economic Outlook
UN Population Data

All of them arrive in a layout that is awkward to work with; our goal is to reshape them into something usable. Here we extract small subsets so that every step is easy to follow.

Penn World Table

This one comes with countries stacked on top of each other: each row is one country in one year.



In [37]:

    
data = {'countrycode': ['CHN', 'CHN', 'CHN', 'FRA', 'FRA', 'FRA', 'USA', 'USA', 'USA'],
 'pop': [1124.7939240000001, 1246.8400649999999, 1318.1701519999999, 58.183173999999994,
         60.764324999999999, 64.731126000000003, 253.33909699999998, 282.49630999999999,
         310.38394799999998],
 'rgdpe': [2611027.0, 4951485.0, 11106452.0, 1293837.0, 1752570.125, 2031723.25,
           7964788.5, 11494606.0, 13151344.0],
 'year': [1990, 2000, 2010, 1990, 2000, 2010, 1990, 2000, 2010]}
pwt = pd.DataFrame(data)
pwt









    Out[37]:

      countrycode          pop         rgdpe  year
    0         CHN  1124.793924   2611027.000  1990
    1         CHN  1246.840065   4951485.000  2000
    2         CHN  1318.170152  11106452.000  2010
    3         FRA    58.183174   1293837.000  1990
    4         FRA    60.764325   1752570.125  2000
    5         FRA    64.731126   2031723.250  2010
    6         USA   253.339097   7964788.500  1990
    7         USA   282.496310  11494606.000  2000
    8         USA   310.383948  13151344.000  2010
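To make the "stacked" layout concrete, here is a minimal sketch of one way to pull out a single country's rows and to spread the years across columns. It uses a trimmed copy of the data above with rounded population values; the names `pwt_small`, `fra`, and `wide` are introduced here for illustration, and `pivot` is just one of several ways to do the reshaping.

```python
import pandas as pd

# Trimmed copy of the pwt data above (two years, pop rounded for brevity)
pwt_small = pd.DataFrame({
    'countrycode': ['CHN', 'CHN', 'FRA', 'FRA', 'USA', 'USA'],
    'year': [1990, 2010, 1990, 2010, 1990, 2010],
    'pop': [1124.8, 1318.2, 58.2, 64.7, 253.3, 310.4],
})

# Stacked (long) layout: the rows for one country sit in a contiguous block
fra = pwt_small[pwt_small['countrycode'] == 'FRA']

# Wide layout: one row per country, one column per year
wide = pwt_small.pivot(index='countrycode', columns='year', values='pop')

print(fra)
print(wide)
```

In the wide version each country appears once, which is often easier to read and plot; the long version is usually the more convenient starting point for grouping and merging.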



UN Population Data


