Sometimes we need to combine data from two or more dataframes. That's colloquially known as a merge or a join. There are lots of ways to do this; we cover a couple here and supply references to more at the end.
Along the way we take an extended detour to review methods for downloading and unzipping compressed files. The tools we use here have a broad range of other applications, including web scraping.
Outline: we download and unzip the MovieLens movie-rating data, merge the ratings and movies dataframes with the merge function, and then combine the UN's population estimates and projections with concat. Note: requires internet access to run.
This IPython notebook was created by Dave Backus, Chase Coleman, Brian LeBlanc, and Spencer Lyon for the NYU Stern course Data Bootcamp.
In [1]:
%matplotlib inline
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics
import datetime as dt # date tools, used to note current date
# these are new
import os # operating system tools (check files)
import requests, io # internet and input tools
import zipfile as zf # zip file tools
import shutil # file management tools
The data comes as a zip file that contains several csv's. We get the details from the README inside. (It's written in Markdown, so it's easier to read if we use a browser to format it. Or we could cut and paste into a Markdown cell in an IPython notebook.)
The file descriptions are:
ratings.csv: each line is an individual film rating with the rater and movie id's and the rating. Order: userId, movieId, rating, timestamp.
tags.csv: each line is a tag on a specific film. Order: userId, movieId, tag, timestamp.
movies.csv: each line is a movie name, its id, and its genres. Order: movieId, title, genres. Multiple genres are separated by "pipes" (|).
links.csv: each line contains the movie id and the corresponding id's at IMDb and TMDb.
The easy way to input this data is to download the zip file onto our computer, unzip it, and read the individual csv files using read_csv(). But anyone can do it the easy way. We want to automate this, so we can redo it without any manual steps. This takes some effort, but once we have it down we can apply it to lots of other data sources.
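For concreteness, here is a minimal sketch of the easy way, assuming the zip file has already been downloaded and extracted into the working directory (the path matches the contents of the zip):
In [ ]:
# the "easy way": read one of the unzipped csv's from disk
# (assumes ml-latest-small/ has been extracted into the current directory)
ratings_easy = pd.read_csv('ml-latest-small/ratings.csv')
ratings_easy.head()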
We're looking for an automated way, so that if we do this again, possibly with updated data, the whole process is in our code. Automated data entry involves these steps:
Download the file. The requests package does this for us. It happens automatically when we apply read_csv to a url, but here we need to do it ourselves. The package authors add: Recreational use of other HTTP libraries may result in dangerous side-effects, including: security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death.
Convert the contents to a file. requests gives us the contents as a string of bytes; io.BytesIO reconstructs it as a file, here a zip file, which we open with the zipfile package.
Read the individual csv's with read_csv as usual. We found this Stack Overflow exchange helpful.
Digression. This is probably more than you want to know, but it's a reminder of what goes on behind the scenes when we apply read_csv to a url. Here we grab whatever is at the url. Then we get its contents, convert it to bytes, identify it as a zip file, and read its components using read_csv. It's a lot easier when this happens automatically, but it's a reminder of what's involved if we ever have to look into the details.
In [ ]:
# get "response" from url
url = 'http://files.grouplens.org/datasets/movielens/ml-latest-small.zip'
r = requests.get(url)
# describe response
print('Response status code:', r.status_code)
print('Response type:', type(r))
print('Response .content:', type(r.content))
print('Response headers:\n', r.headers, sep='')
In [ ]:
# convert bytes to zip file
mlz = zf.ZipFile(io.BytesIO(r.content))
print('Type of zipfile object:', type(mlz))
In [ ]:
# what's in the zip file?
mlz.namelist()
In [ ]:
# open one of the csv's inside the zip (a file-like object, not yet a dataframe)
mlz.open('ml-latest-small/links.csv')
In [ ]:
# pass the file-like object to read_csv to get a dataframe
pd.read_csv(mlz.open('ml-latest-small/links.csv'))
In [ ]:
# extract and read csv's
# positions in namelist() chosen so we get movies.csv and ratings.csv (check the listing above)
movies = pd.read_csv(mlz.open(mlz.namelist()[2]))
ratings = pd.read_csv(mlz.open(mlz.namelist()[3]))
In [ ]:
# what do we have?
for df in [movies, ratings]:
    print('Type:', type(df))
    print('Dimensions:', df.shape, '\n')
    print('Variables:', df.columns.tolist(), '\n')
    print('First few rows', df.head(3), '\n')
In [ ]:
Exercise. Something to do together: suppose we wanted to save the files on our computer. How would we do it? Would we prefer individual csv's or a single zip?
In [ ]:
# writing csv (specify different location)
with open('test_01.csv', 'wb') as out_file:
    shutil.copyfileobj(mlz.open(mlz.namelist()[2]), out_file)
In [ ]:
# experiment via http://stackoverflow.com/a/18043472/804513
with open('test.zip', 'wb') as out_file:
    shutil.copyfileobj(io.BytesIO(r.content), out_file)
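Another option, not shown in the cells above, is the extractall method of ZipFile, which writes every member of the archive to disk in one call. A minimal sketch (the target directory name is our choice):
In [ ]:
# extract the full archive (including its ml-latest-small/ folder) into a local directory
# ('ml_data' is an arbitrary name we chose)
mlz.extractall('ml_data')
os.listdir('ml_data')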
The movie ratings in the dataframe ratings give us individual opinions about movies, but they don't include the name of the movie. Why not? Rather than include the name every time a movie is rated, the MovieLens data associates each rating with a movie code, then stores the names of movies associated with each movie code in the dataframe movies. We run across this a lot: some information is in one data table, other information is in another.
What we want, therefore, is to add the movie name to the ratings dataframe. We say we merge the two dataframes. There are lots of ways to merge. Here we do one as an illustration.
Let's start by reminding ourselves what we have.
In [ ]:
ratings.head(3)
In [ ]:
movies.head(3)
In [ ]:
combo = pd.merge(ratings, movies,   # left and right df's
                 how='left',        # add to left
                 on='movieId')      # link with this variable/column
print('Dimensions of ratings:', ratings.shape)
print('Dimensions of movies:', movies.shape)
print('Dimensions of new df:', combo.shape)
combo.head(20)
In [ ]:
# same merge using the DataFrame method rather than the pd.merge function
combo_1 = ratings.merge(movies, how='left', on='movieId')
In [ ]:
combo_1.head()
In [ ]:
# inner join: keep only movieId's that appear in both dataframes
combo_2 = ratings.merge(movies, how='inner', on='movieId')
combo_2.shape
In [ ]:
# right join with the dataframes switched: keeps every rating, like the left join above
combo_3 = movies.merge(ratings, how='right', on='movieId')
combo_3.shape
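To see how the joins differ, one option (not used in the original cells) is the indicator argument of merge, which adds a _merge column recording whether each row came from the left dataframe, the right one, or both:
In [ ]:
# indicator=True adds a '_merge' column ('left_only', 'right_only', or 'both')
check = ratings.merge(movies, how='left', on='movieId', indicator=True)
check['_merge'].value_counts()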
In [ ]:
# save as csv file for future use
combo.to_csv('mlcombined.csv')
In [ ]:
# does each movieId in movies appear in ratings? (boolean Series)
count_2 = movies['movieId'].isin(ratings['movieId'])
In [ ]:
count_2.sum()
In [ ]:
print('Current directory:\n', os.getcwd(), sep='')
print('List of files:', os.listdir(), sep='\n')
In [ ]:
Exercise. Some of these we know how to do, the others we don't. For the ones we know, what is the answer? For the others, what (in loose terms) do we need to be able to do to come up with an answer?
In [ ]:
combo['rating'].mean()
In [ ]:
fig, ax = plt.subplots()
bins = [bin/100 for bin in list(range(25, 575, 50))]
print(bins)
combo['rating'].plot(kind='hist', ax=ax, bins=bins, color='blue', alpha=0.5)
ax.set_xlim(0,5.5)
ax.set_ylabel('Number')
ax.set_xlabel('Rating')
plt.show()
In [ ]:
from plotly.offline import iplot # plotting functions
import plotly.graph_objs as go # ditto
import plotly
In [ ]:
plotly.offline.init_notebook_mode(connected=True)
In [ ]:
trace = go.Histogram(
x=combo['rating'],
histnorm='count',
name='control',
autobinx=False,
xbins=dict(
start=.5,
end=5.0,
size=0.5
),
marker=dict(
color='Blue',
),
opacity=0.75
)
layout = go.Layout(
title='Distribution of ratings',
xaxis=dict(
title='Rating value'
),
yaxis=dict(
title='Count'
),
bargap=0.01,
bargroupgap=0.1
)
iplot(go.Figure(data=[trace], layout=layout))
In [ ]:
combo[combo['movieId']==31]['rating'].mean()
In [ ]:
# average rating for each movieId
ave_mov = combo['rating'].groupby(combo['movieId']).mean()
In [ ]:
ave_mov = ave_mov.reset_index()
In [ ]:
ave_mov = ave_mov.rename(columns={"rating": "average rating"})
In [ ]:
combo2 = combo.merge(ave_mov, how='left', on='movieId')
In [ ]:
combo2.shape
In [ ]:
combo2.head(3)
In [ ]:
# transform broadcasts each movie's mean rating back to every row
combo['ave'] = combo['rating'].groupby(combo['movieId']).transform('mean')
In [ ]:
combo.head()
In [ ]:
combo2[combo2['movieId']==1129]
In [ ]:
# number of ratings for each movieId
combo['count'] = combo['rating'].groupby(combo['movieId']).transform('count')
In [ ]:
combo.head()
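The merge-based combo2 and the transform-based combo should agree; a quick hedged check comparing the two per-movie mean columns:
In [ ]:
# 'average rating' (from the merge) and 'ave' (from transform) are computed
# the same way, so their values should match row by row
(combo2['average rating'].values == combo['ave'].values).all()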
We look (again) at the UN's population data, specifically the age distribution of the population. The data comes in two sheets: estimates that cover the period 1950-2015 and projections that cover 2016-2100. Our mission is to combine them.
In [2]:
url1 = 'http://esa.un.org/unpd/wpp/DVD/Files/'
url2 = '1_Indicators%20(Standard)/EXCEL_FILES/1_Population/'
url3 = 'WPP2015_POP_F07_1_POPULATION_BY_AGE_BOTH_SEXES.XLS'
url = url1 + url2 + url3
cols = [2, 5] + list(range(6,28))
est = pd.read_excel(url, sheetname=0, skiprows=16, parse_cols=cols, na_values=['…'])
prj = pd.read_excel(url, sheetname=1, skiprows=16, parse_cols=cols, na_values=['…'])
print('Dimensions and dtypes of estimates: ', est.shape, '\n', est.dtypes.head(), sep='')
print('\nDimensions and dtypes of projections: ', prj.shape, '\n', prj.dtypes.head(), sep='')
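Note that the keyword arguments above reflect the pandas version current when this notebook was written. In more recent pandas the same call uses sheet_name and usecols; a hedged sketch of the equivalent (left commented so it doesn't re-download):
In [ ]:
# equivalent call for newer pandas versions: sheetname -> sheet_name, parse_cols -> usecols
# est = pd.read_excel(url, sheet_name=0, skiprows=16, usecols=cols, na_values=['…'])
# prj = pd.read_excel(url, sheet_name=1, skiprows=16, usecols=cols, na_values=['…'])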
In [3]:
est.to_csv('un_pop_est.csv')
prj.to_csv('un_pop_proj.csv')
Comment. Note that they have different numbers of columns. Let's see where that comes from.
In [4]:
list(est)[15:]
Out[4]:
In [5]:
list(prj)[15:]
Out[5]:
In [6]:
est.head()
Out[6]:
In [7]:
prj.head()
Out[7]:
Pick a usable subset and fix the extra column so that we can combine them. The problem here is that until 1990, the highest age category was '80+'. From 1990 on, we have a finer breakdown.
We fix this by reassigning '80+' to '80-84' and not worrying that some of these people are 85 or older. Note that df.fillna(0.0) replaces missing values with zeros.
In [8]:
def cleanpop(df, countries, years):
    """
    take df as input and select countries and years
    """
    # rename first two columns
    names = list(df)
    df = df.rename(columns={names[0]: 'Country', names[1]: 'Year'})
    # select countries and years
    newdf = df[df['Country'].isin(countries) & df['Year'].isin(years)]
    return newdf

countries = ['Japan']
past = [1950, 2000]
future = [2050, 2100]
e = cleanpop(est, countries, past)
p = cleanpop(prj, countries, future)
# make copies for later use
ealt = e.copy()
palt = p.copy()
In [9]:
# fix top-coding in estimates
e['80-84'] = e['80-84'].fillna(0.0) + e['80+'].fillna(0.0)
e = e.drop(['80+'], axis=1)
# check dimensions again
print('Dimensions of cleaned estimates: ', e.shape)
print('Dimensions of cleaned projections: ', p.shape)
In [10]:
# check to see if we have the same variables
list(e) == list(p)
Out[10]:
In [11]:
ealt.head()
Out[11]:
In [11]:
e.head()
Out[11]:
In [12]:
pop = pd.concat([e, p], axis=0).fillna(0.0)
pop
Out[12]:
Exercise. What happens if we try to merge the original dataframes, including the one with the extra 80+
column? Run the code below and comment on what you get.
In [13]:
popalt = pd.concat([ealt, palt], axis=0)
popalt
Out[13]:
In [13]:
pop = pop.drop('Country', axis=1)
popi = pop.set_index('Year')
popi
Out[13]:
In [14]:
popi.columns.name = 'Age'
popt = popi.T
popt.head()
Out[14]:
In [15]:
ax = popt.plot(kind='bar', color='blue', alpha=0.5,
subplots=True,
sharey=True,
figsize=(8,12))
In [ ]:
Exercise. Use set_index, stack, and unstack to shape the dataframe popi into popt.
In [16]:
popi
Out[16]:
In [17]:
popi.stack().unstack(level='Year')
Out[17]:
In [18]:
list(range(1950, 2016, 5))
Out[18]:
In [17]:
countries = ['United States of America', 'Japan']
past = list(range(1950, 2016, 5))
future = list(range(2015, 2101, 5))
e_US_J = cleanpop(est, countries, past)
p_US_J = cleanpop(prj, countries, future)
# fix top-coding in estimates
e_US_J['80-84'] = e_US_J['80-84'].fillna(0.0) + e_US_J['80+'].fillna(0.0)
e_US_J = e_US_J.drop(['80+'], axis=1)
In [18]:
e_US_J.head()
Out[18]:
In [19]:
p_US_J[p_US_J['Country']=='United States of America'].head()
Out[19]:
In [21]:
pop_US_J = pd.concat([e_US_J, p_US_J], axis=0)#.fillna(0.0)
pop_US_J.shape
Out[21]:
In [22]:
pop_US_J
Out[22]:
In [23]:
pop_i = pop_US_J.set_index(['Country', 'Year'])
In [24]:
pop_i.index
Out[24]:
In [27]:
pop_i.columns.name = 'Age'
In [28]:
pop_st = pop_i.stack()
In [29]:
pop_st.head()
Out[29]:
In [30]:
fig, ax = plt.subplots(2, 1, figsize=(12, 8))
pop_st.reorder_levels([2, 0, 1])['5-9']['United States of America'].plot(ax=ax[0], kind='line')
pop_st.reorder_levels([2, 0, 1])['5-9']['Japan'].plot(ax=ax[1], kind='line')
plt.show()
In [31]:
pop_st.head()
Out[31]:
In [38]:
pop_st.loc[('Japan', 2100, slice(None))].plot(kind='bar')
plt.show()
In [35]:
# .ix is deprecated in newer pandas; .loc makes the same selection
pop_st.loc[('Japan', slice(None), '0-4')]
Out[35]:
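A possible alternative to tuple-based indexing here, not used above, is the xs method, which selects by label at a named index level:
In [ ]:
# select the '0-4' age group across all countries and years by level name
pop_st.xs('0-4', level='Age')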
The Pandas docs are ok, but we prefer the Data Carpentry guide.
In [ ]: