Data management with Pandas

An overview of some of the data management tools in Python's Pandas package. Includes:

Selecting variables
Selecting observations
Indexing
Groupby
Stacking
Doubly indexed dataframes
Combining dataframes (concat)
Merging dataframes

This notebook was written by Dave Backus for the NYU Stern course Data Bootcamp.



import pandas as pd   # the data management package covered here
# IPython magic: display plots inside the notebook
%matplotlib inline

Reminders

Dataframes
Index and columns
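As a quick reminder of the two sets of labels every dataframe carries, here is a minimal sketch. The tiny dataframe reuses a few 1990 population values from the Penn World Table subset that appears later in this notebook; the names `df` and `df_c` are just illustrative.

```python
import pandas as pd

# A tiny dataframe built from a few 1990 rows of the Penn World Table subset
df = pd.DataFrame({'countrycode': ['CHN', 'FRA', 'USA'],
                   'pop': [1124.793924, 58.183174, 253.339097]})

# Columns: the variable names
print(df.columns.tolist())    # ['countrycode', 'pop']

# Index: the row labels, by default 0, 1, 2, ...
print(df.index.tolist())      # [0, 1, 2]

# set_index moves a column into the index, so rows are labeled by country
df_c = df.set_index('countrycode')
print(df_c.index.tolist())    # ['CHN', 'FRA', 'USA']
print(df_c.columns.tolist())  # ['pop']
```

Moving an identifier like `countrycode` into the index is what later lets us select rows by label rather than by position.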

Selecting variables

Datasets

We take these examples from the data input chapter:

Penn World Table
World Economic Outlook
UN Population Data

All of them come in a form that is awkward to analyze; our goal is to reshape them into something more convenient. Here we extract small subsets to work with so that we can follow every step.
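Pulling a small subset out of a larger table usually comes down to boolean masks on the rows. The sketch below assumes a larger table with `countrycode` and `year` columns; `full` is a hypothetical stand-in built from a few of the Penn World Table values used below, not the actual source file.

```python
import pandas as pd

# Hypothetical stand-in for a larger dataset (values taken from the PWT subset below)
full = pd.DataFrame({'countrycode': ['CHN', 'CHN', 'FRA', 'FRA', 'USA', 'USA'],
                     'year': [1990, 2000, 1990, 2000, 1990, 2000],
                     'pop': [1124.793924, 1246.840065, 58.183174,
                             60.764325, 253.339097, 282.49631]})

# Each mask holds one True/False per row
keep_country = full['countrycode'].isin(['CHN', 'USA'])
keep_year = full['year'] == 2000

# Combine masks with & and keep only the rows where both are True
subset = full[keep_country & keep_year]
print(subset)
```

Note that the selected rows keep their original index labels (1 and 5 here), which is handy for tracing them back to the full table.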

Penn World Table

This one comes with countries stacked on top of each other: each row is one country in one year, so each country appears once for each year (1990, 2000, 2010).
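To make "stacked" concrete, the minimal sketch below puts both identifiers into a two-level index and then unstacks the country level into columns, so countries sit side by side. It uses a few population values from the table below; this is one way to reshape with `set_index` and `unstack`, not necessarily the approach taken later in the notebook.

```python
import pandas as pd

# Long ("stacked") layout: one row per country-year pair
long = pd.DataFrame({'countrycode': ['CHN', 'CHN', 'FRA', 'FRA'],
                     'year': [1990, 2000, 1990, 2000],
                     'pop': [1124.793924, 1246.840065, 58.183174, 60.764325]})

# Move both identifiers into a two-level (doubly indexed) row index
doubly = long.set_index(['year', 'countrycode'])

# unstack() pivots the innermost index level (countrycode) into columns
wide = doubly['pop'].unstack()
print(wide)   # rows are years, columns are country codes
```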



# Penn World Table subset, "long" layout: one row per country-year pair
# pop = population in millions; rgdpe = expenditure-side real GDP
data = {'countrycode': ['CHN', 'CHN', 'CHN', 'FRA', 'FRA', 'FRA', 'USA', 'USA', 'USA'],
        'pop': [1124.793924, 1246.840065, 1318.170152, 58.183174,
                60.764325, 64.731126, 253.339097, 282.49631,
                310.383948],
        'rgdpe': [2611027.0, 4951485.0, 11106452.0, 1293837.0, 1752570.125, 2031723.25,
                  7964788.5, 11494606.0, 13151344.0],
        'year': [1990, 2000, 2010, 1990, 2000, 2010, 1990, 2000, 2010]}
pwt = pd.DataFrame(data)   # dict keys become column names
pwt                        # display the dataframe in the notebook
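Since this section is about selecting variables, here is a minimal sketch of the two bracket forms. It rebuilds the 2010 rows of the table above under the hypothetical name `pwt10`, so running it does not overwrite `pwt`.

```python
import pandas as pd

# 2010 rows of the pwt data above, under a separate (hypothetical) name
pwt10 = pd.DataFrame({'countrycode': ['CHN', 'FRA', 'USA'],
                      'year': [2010, 2010, 2010],
                      'pop': [1318.170152, 64.731126, 310.383948],
                      'rgdpe': [11106452.0, 2031723.25, 13151344.0]})

# One label in brackets returns a single column as a Series
pop = pwt10['pop']

# A list of labels returns a DataFrame with those columns, in that order
small = pwt10[['countrycode', 'rgdpe']]

print(type(pop).__name__, type(small).__name__)   # Series DataFrame
print(small.columns.tolist())                      # ['countrycode', 'rgdpe']
```

The double brackets are the part that trips people up: the outer pair is the selection, the inner pair is an ordinary Python list of column names.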



UN Population Data


