In [34]:
import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from IPython import display

from collections import namedtuple

Dynamic programming solves a problem by breaking it up into smaller sub-problems, looks up the answer or solves and saves results and then combines the answers together to solve the original problem.

Key question:

  • What are the sub problems?

A simple example:

Fib numbers using dynamic programming


In [153]:
def fib_recursion(n, table=[]):
    """returns fib of n"""
    
    if n <= 1:
        return n
    
    if len(table) < 1:
        table = np.zeros(n+1)
    if table[n-1] == 0:
        table[n-1] = fib_recursion(n-1, table)
    if table[n-2] == 0:
        table[n-2] = fib_recursion(n-2, table)
    
    table[n] = table[n-1] + table[n-2]
    
    return table[n]

[fib_recursion(n) for n in range(10)]


Out[153]:
[0, 1, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]

Now what if we use a list:'


In [150]:
def fib_list(n):
    table=[0,1]
    for i in range(2, n+1):
        table.append(table[i-1]+table[i-2])
    return table[n]
        
[fib_list(n) for n in range(10)]


Out[150]:
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Now there isn't any point in saving all the fib numbers, so modifying the code to save only the last two fib numbers:


In [159]:
def fib_list_two(n):
    table=[0,1]
    
    for i in range(2, n+1):
        table.append(sum(table[-2:]))
        del table[0] 
    return table[1]
        
[fib_list_two(n) for n in range(10)]


Out[159]:
[1, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In [167]:
%time fib_recursion(100)


CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 455 µs
Out[167]:
3.54224848179262e+20

The recursion fails as exceeds max memory depth on pretty small fib numbers


In [162]:
%time fib_list(1000)


CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 525 µs
Out[162]:
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875

Now this holds all the fib numbers up to n in memory in a list. Which for large n seems un necessary to waste all that memory.


In [163]:
%time fib_list_two(1000)


CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 1.17 ms
Out[163]:
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875

Knapsack problem


In [7]:
income_url="http://docs.google.com/spreadsheet/pub?key=0AkBd6lyS3EmpdHo5S0J6ekhVOF9QaVhod05QSGV4T3c&output=xlsx"
life_url="http://docs.google.com/spreadsheet/pub?key=phAwcNAVuyj2tPLxKvvnNPA&output=xlsx"
pop_url="http://docs.google.com/spreadsheet/pub?key=phAwcNAVuyj0XOoBL_n5tAQ&output=xlsx"

In [8]:
income = pd.read_excel(income_url)
life = pd.read_excel(life_url)
pop = pd.read_excel(pop_url)

In [9]:
income.head()


Out[9]:
Income per person (fixed 2000 US$) 1960 1961 1962 1963 1964 1965 1966 1967 1968 ... 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
0 Abkhazia NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Afghanistan NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Akrotiri and Dhekelia NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 Albania NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1313.722725 1381.040832 1454.022854 1525.723589 1594.495067 1681.613910 1804.419415 1857.352947 1915.424459 1965.707230
4 Algeria 1280.384828 1085.414612 855.947986 1128.41578 1170.323896 1215.015783 1127.614288 1200.558225 1291.863983 ... 1871.921986 1971.512803 2043.135713 2115.186028 2124.957754 2155.485231 2173.787903 2192.703976 2231.980246 2255.225482

5 rows × 53 columns


In [10]:
life.head()


Out[10]:
Life expectancy 1800 1801 1802 1803 1804 1805 1806 1807 1808 ... 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 Abkhazia NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Afghanistan 28.21 28.20 28.19 28.18 28.17 28.16 28.15 28.14 28.13 ... 52.4 52.8 53.3 53.6 54.0 54.4 54.8 54.9 53.8 52.72
2 Akrotiri and Dhekelia NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 Albania 35.40 35.40 35.40 35.40 35.40 35.40 35.40 35.40 35.40 ... 76.6 76.8 77.0 77.2 77.4 77.5 77.7 77.9 78.0 78.10
4 Algeria 28.82 28.82 28.82 28.82 28.82 28.82 28.82 28.82 28.82 ... 75.3 75.5 75.7 76.0 76.1 76.2 76.3 76.3 76.4 76.50

5 rows × 218 columns


In [11]:
pop.head()


Out[11]:
Total population 1800 1810 1820 1830 1840 1850 1860 1870 1880 ... 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
0 Abkhazia NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Afghanistan 3280000.0 3280000.0 3323519.0 3448982.0 3625022.0 3810047.0 3973968.0 4169690.0 4419695.0 ... 25183615.0 25877544.0 26528741.0 27207291.0 27962207.0 28809167.0 29726803.0 30682500.0 31627506.0 32526562.0
2 Akrotiri and Dhekelia NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 15700.0 15700.0 15700.0 NaN NaN NaN NaN NaN NaN NaN
3 Albania 410445.0 423591.0 438671.0 457234.0 478227.0 506889.0 552800.0 610036.0 672544.0 ... 3050741.0 3010849.0 2968026.0 2929886.0 2901883.0 2886010.0 2880667.0 2883281.0 2889676.0 2896679.0
4 Algeria 2503218.0 2595056.0 2713079.0 2880355.0 3082721.0 3299305.0 3536468.0 3811028.0 4143163.0 ... 33749328.0 34261971.0 34811059.0 35401790.0 36036159.0 36717132.0 37439427.0 38186135.0 38934334.0 39666519.0

5 rows × 82 columns

The most recent year where we have info on all three is 2011, so lets look at 2011 data:


In [16]:
df = pop[["Total population", 2011]].copy()
df.rename(columns={2011: "Population"}, inplace=True)
df.head()


Out[16]:
Total population Population
0 Abkhazia NaN
1 Afghanistan 28809167.0
2 Akrotiri and Dhekelia NaN
3 Albania 2886010.0
4 Algeria 36717132.0

In [17]:
df = df.join(life[2011])
df.rename(columns={2011: "Life"}, inplace=True)
df.head()


Out[17]:
Total population Population Life
0 Abkhazia NaN NaN
1 Afghanistan 28809167.0 54.0
2 Akrotiri and Dhekelia NaN NaN
3 Albania 2886010.0 77.4
4 Algeria 36717132.0 76.1

In [18]:
df = df.join(income["2011"])
df.rename(columns={"2011": "Income"}, inplace=True)
df.head()


Out[18]:
Total population Population Life Income
0 Abkhazia NaN NaN NaN
1 Afghanistan 28809167.0 54.0 NaN
2 Akrotiri and Dhekelia NaN NaN NaN
3 Albania 2886010.0 77.4 1965.707230
4 Algeria 36717132.0 76.1 2255.225482

So now we have a dataframe containing country names, their population, adjusted per capita income and life expectance.

To simplify things, I'm going to drop the missing values:


In [20]:
df.dropna(inplace=True)
df.head()


Out[20]:
Total population Population Life Income
3 Albania 2886010.0 77.4 1965.707230
4 Algeria 36717132.0 76.1 2255.225482
7 Angola 21942296.0 58.1 629.955306
9 Antigua and Barbuda 88152.0 75.9 9977.957073
10 Argentina 41655616.0 76.0 11601.630223

Lets say we want 500 million people with the highest income. So the population becomes the "weights", and the income the "values" of this knapsack problem.

So lets divide this problem into sizes of 10M, so we go 10, 20, 30... all the way to 500M

first, starting with 10 to 50M to keep it simple, heres our table:


In [30]:
solved = pd.DataFrame(columns=np.arange(10,51,10), index=df["Total population"])
solved.fillna(value=0, inplace=True)
solved.head()


Out[30]:
10 20 30 40 50
Total population
Albania 0 0 0 0 0
Algeria 0 0 0 0 0
Angola 0 0 0 0 0
Antigua and Barbuda 0 0 0 0 0
Argentina 0 0 0 0 0

In [41]:
ans = {}
ans[10] = 100, 200, ["countryA", "B"]
ans[10]


Out[41]:
(100, 200, ['countryA', 'B'])

In [42]:
def pop_income(pop_size):
    """takes in a pop_size, and returns a tuple containing
    max income and pop and a list of countries chosen"""
    if pop_size in solved.keys():
        return solved[max]
    pop = 0
    income = 0
    
    for i, row in df[df["Population"]<= pop_size].iterrows():
        print(row)

pop_income(10*np.power(10,4))
np.power(10,9)


Total population    Antigua and Barbuda
Population                        88152
Life                               75.9
Income                          9977.96
Name: 9, dtype: object
Total population    Dominica
Population             71402
Life                    73.4
Income               6518.66
Name: 61, dtype: object
Total population    Marshall Islands
Population                     52541
Life                              66
Income                       2522.82
Name: 139, dtype: object
Total population    Seychelles
Population               93810
Life                      73.4
Income                 9279.11
Name: 202, dtype: object
Out[42]:
1000000000

In [ ]: