IHE Python course, 2017

T.N.Olsthoorn, Feb 26, 2017

Working with lists, dicts and sets and their comprehensions

(Exercises, using 2017 world population data, reading data from Excel)

See Excel workbook in this folder

For this we list all files in the folder and keep those that have '.xls' in their name and do not start with '~'. A name starting with '~' belongs to a temporary lock file, which Excel creates while the workbook is open in the Excel program.


In [70]:
from pprint import pprint
import matplotlib.pyplot as plt

Import the os module (operating system). It has the function listdir(), which returns a list of the names of the files and folders within a directory.

We apply os.listdir() to get a list of all names and immediately filter it, keeping only the names that contain .xls and don't start with '~'.


In [71]:
import os

In [72]:
os.listdir('.')


Out[72]:
['.DS_Store',
 '.ipynb_checkpoints',
 'comprehensions.ipynb',
 'dicts.ipynb',
 'tuplesListsSets.ipynb',
 'worldPopulation.ipynb',
 'worldPopulation2017.xlsx']

In [73]:
import os

xls_files = [f for f in os.listdir() if ".xls" in f and not f.startswith('~')]

print(xls_files)


['worldPopulation2017.xlsx']
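
The filter above can be wrapped in a small function, so that it can be tried out without touching the file system. This is only a sketch: the function name excel_files and the sample file names are assumptions made for illustration. Because '.xls' in f would also match a name such as notes.xls.txt, the sketch compares the file extension instead.

```python
import os

def excel_files(names):
    """Return the names that look like Excel workbooks.

    Names starting with '~' are skipped, because Excel creates such
    temporary lock files while a workbook is open.
    """
    return [f for f in names
            if os.path.splitext(f)[1].lower() in ('.xls', '.xlsx')
            and not f.startswith('~')]

# Hypothetical directory listing, used here instead of os.listdir('.')
sample = ['.DS_Store', 'dicts.ipynb', '~$worldPopulation2017.xlsx',
          'worldPopulation2017.xlsx']
print(excel_files(sample))
```

In the notebook itself one would call excel_files(os.listdir('.')).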

Next we'll read the data from the Excel workbook


In [74]:
import openpyxl # excel file functionality

wb = openpyxl.load_workbook(xls_files[0])


/Users/Theo/anaconda/lib/python3.5/site-packages/openpyxl/reader/worksheet.py:322: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)

In [75]:
type(wb)


Out[75]:
openpyxl.workbook.workbook.Workbook

In [76]:
wb.get_sheet_names()


Out[76]:
['Population', 'CountriesByContinent', 'missing']

In [77]:
# Show the sheet names inside the Excel workbook wb
# (newer openpyxl versions deprecate get_sheet_names() in favour of wb.sheetnames)
wb.get_sheet_names()


Out[77]:
['Population', 'CountriesByContinent', 'missing']

We can now treat the workbook as a dictionary in which each key is the name of a worksheet


In [78]:
wsPop = wb['Population']

In [79]:
type(wsPop)


Out[79]:
openpyxl.worksheet.worksheet.Worksheet

wsPop is an object. It has a number of methods and properties, which can be listed in the notebook by typing the name followed by a dot and then pressing the tab key. To get more information on one of them, follow it with a question mark.

On the Mac used here wsPop.rows is a tuple of tuples. On some other systems it is a generator (this probably depends on the installed openpyxl version rather than on the operating system). A generator produces the tuples one after the other in a loop like


In [80]:
for r in wsPop.rows:
    pass # do something

but with a generator, one cannot say len(wsPop.rows) or index it like wsPop.rows[3].

However it's easy to first generate a list of tuples with a comprehension:


In [81]:
data = [r for r in wsPop.rows]

and then continue to work with data instead of wsPop.rows. Therefore, if you have trouble indexing wsPop.rows, first generate a list as shown above and use that wherever you see wsPop.rows below.
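
The difference between a generator and a list can be tried out without a workbook. In the sketch below the function rows() is a hypothetical stand-in for a generator-style wsPop.rows, and the tuples hold plain values instead of cell objects.

```python
def rows():
    """Hypothetical stand-in for a generator-style wsPop.rows."""
    yield ('#', 'Country', 'Population')
    yield (1, 'China', 1388232693)
    yield (2, 'India', 1342512706)

gen = rows()
try:
    gen[1]                 # a generator does not support indexing
    indexable = True
except TypeError:
    indexable = False

data = list(rows())        # same effect as [r for r in rows()]
print(indexable, len(data), data[1][1])
```

Once the rows are stored in a list, len() and indexing work as usual.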


In [82]:
cel = wsPop.rows[0][3]  # may not work if wsPop.rows is a generator on your system and not a tuple of tuples
cel = data[0][3]  # this is a list of tuples, generated above.

In [83]:
cel.value


Out[83]:
'Yearly'

We can get the contents of the sheet as a series of rows using the attribute rows,

and then show an arbitrary value


In [85]:
print( type(wsPop.rows) )  # shows that wsPop.rows is a tuple of tuples

print( len(wsPop.rows) ) # shows the number of rows

print( wsPop.rows[3][1] ) # shows some cell

wsPop.rows[3][1].value # shows the value of some cell


<class 'tuple'>
235
<Cell Population.B4>
Out[85]:
'India'

In [86]:
for r in wsPop.rows:
    print(r[1].value)


Country (or dependency)
None
China
India
U.S.
Indonesia
Brazil
Pakistan
Nigeria
Bangladesh
Russia
Mexico
Japan
Ethiopia
Philippines
Viet Nam
Egypt
DR Congo
Iran
Germany
Turkey
Thailand
U.K.
France
Italy
Tanzania
South Africa
Myanmar
South Korea
Colombia
Kenya
Spain
Ukraine
Argentina
Sudan
Uganda
Algeria
Iraq
Poland
Canada
Morocco
Afghanistan
Saudi Arabia
Peru
Venezuela
Malaysia
Uzbekistan
Mozambique
Nepal
Ghana
Yemen
Angola
Madagascar
North Korea
Australia
Cameroon
Côte d'Ivoire
Taiwan
Niger
Sri Lanka
Romania
Burkina Faso
Syria
Mali
Chile
Malawi
Kazakhstan
Zambia
Netherlands
Guatemala
Ecuador
Zimbabwe
Cambodia
Senegal
Chad
Guinea
South Sudan
Rwanda
Burundi
Tunisia
Benin
Belgium
Somalia
Cuba
Bolivia
Haiti
Greece
Dominican Republic
Czech Republic
Portugal
Azerbaijan
Sweden
Hungary
Belarus
United Arab Emirates
Tajikistan
Serbia
Austria
Switzerland
Israel
Honduras
Papua New Guinea
Jordan
Togo
Hong Kong
Bulgaria
Laos
Paraguay
Sierra Leone
Libya
Nicaragua
El Salvador
Kyrgyzstan
Lebanon
Singapore
Denmark
Finland
Turkmenistan
Eritrea
Slovakia
Norway
Central African Republic
State of Palestine
Costa Rica
Congo
Ireland
Oman
Liberia
New Zealand
Mauritania
Croatia
Kuwait
Moldova
Panama
Georgia
Bosnia and Herzegovina
Puerto Rico
Uruguay
Mongolia
Armenia
Albania
Lithuania
Jamaica
Namibia
Botswana
Qatar
Lesotho
Gambia
TFYR Macedonia
Slovenia
Latvia
Guinea-Bissau
Gabon
Bahrain
Trinidad and Tobago
Swaziland
Estonia
Mauritius
Timor-Leste
Cyprus
Djibouti
Fiji
Equatorial Guinea
Réunion
Comoros
Bhutan
Guyana
Montenegro
Macao
Solomon Islands
Western Sahara
Luxembourg
Suriname
Cabo Verde
Guadeloupe
Brunei
Malta
Bahamas
Martinique
Maldives
Belize
Iceland
French Polynesia
Barbados
French Guiana
Vanuatu
New Caledonia
Mayotte
Sao Tome and Principe
Samoa
Saint Lucia
Guam
Channel Islands
Curaçao
Kiribati
St. Vincent & Grenadines
Grenada
Tonga
United States Virgin Islands
Micronesia
Aruba
Seychelles
Antigua and Barbuda
Isle of Man
Dominica
Andorra
Cayman Islands
Bermuda
Saint Kitts and Nevis
Greenland
American Samoa
Northern Mariana Islands
Marshall Islands
Faeroe Islands
Sint Maarten
Liechtenstein
Monaco
Turks and Caicos Islands
Gibraltar
San Marino
British Virgin Islands
Caribbean Netherlands
Palau
Cook Islands
Anguilla
Wallis and Futuna
Nauru
Tuvalu
Saint Pierre and Miquelon
Montserrat
Saint Helena
Falkland Islands
Niue
Tokelau
Holy See

Turn the rows property into a list of rows, with each row a list of the values of its cells.

We can do that in a list comprehension, in this case a list comprehension around a list comprehension.

For each row we turn it into a list of the values of each cell in that row. We do that for each row.

The result is a list of lists. This is done in one line:


In [87]:
data = [[c.value for c in r ]   for r in wsPop.rows]

In [88]:
data[15][1]


Out[88]:
'Viet Nam'

In [89]:
# Show the first 5 lines
for i in range(5):
    print(data[i])


['#', 'Country (or dependency)', 'Population', 'Yearly', 'Net', 'Density', 'Land Area', 'Migrants', 'Fert.', 'Med.', 'Urban']
[None, None, -2017, 'Change', 'Change', '(P/Km²)', '(Km²)', '(net)', 'Rate', 'Age', 'Pop %']
[1, 'China', 1388232693, 0.0043, 5909361, 148, 9386293, -360000, 1.55, 37, 0.576]
[2, 'India', 1342512706, 0.0118, 15711130, 452, 2973450, -519644, 2.48, 27, 0.32]
[3, 'U.S.', 326474013, 0.0073, 2355226, 36, 9144930, 1001577, 1.89, 38, 0.821]
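
The nested comprehension can be tried out on a small example without a workbook. In this sketch the namedtuple Cell is a hypothetical stand-in for an openpyxl cell; only its value attribute is modelled.

```python
from collections import namedtuple

# Hypothetical stand-in for an openpyxl cell: only .value is modelled
Cell = namedtuple('Cell', 'value')

rows = (
    (Cell(1), Cell('China'), Cell(1388232693)),
    (Cell(2), Cell('India'), Cell(1342512706)),
)

# Inner comprehension: the values of one row; outer comprehension: all rows
values = [[c.value for c in r] for r in rows]
print(values)
```

Reading the expression from right to left helps: for each row r, build the list of c.value for each cell c in r.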

To index the data directly, we don't want a list of lists but a dictionary with the country name as key. For each country the data is then kept in a dictionary of its own, with keys obtained from the headers in the first two rows of the Excel file, as shown above.


In [90]:
hdrs = data[0]
dims = data[1]

print()
print(hdrs)
print()
print(dims)
print()


['#', 'Country (or dependency)', 'Population', 'Yearly', 'Net', 'Density', 'Land Area', 'Migrants', 'Fert.', 'Med.', 'Urban']

[None, None, -2017, 'Change', 'Change', '(P/Km²)', '(Km²)', '(net)', 'Rate', 'Age', 'Pop %']

Now glue together the hdrs and the dims, and filter out the None texts:

All hdrs are strings, but that is not the case for the dims, where -2017 was read as a number.

So when gluing h and d together into a new string below, we have to use h + str(d).

The combined headers are obtained using a list comprehension, which also removes the text 'None' that str(d) produces wherever a dim is empty:


In [91]:
hdrs = [(h + str(d)).replace('None','')  for h, d in zip(hdrs, dims)]


pprint(hdrs)


['#',
 'Country (or dependency)',
 'Population-2017',
 'YearlyChange',
 'NetChange',
 'Density(P/Km²)',
 'Land Area(Km²)',
 'Migrants(net)',
 'Fert.Rate',
 'Med.Age',
 'UrbanPop %']
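
A possible variation, shown here as a sketch, avoids creating the text 'None' in the first place by using a conditional expression inside the comprehension. The name combined is an assumption; the two lists are copied from the output above.

```python
hdrs = ['#', 'Country (or dependency)', 'Population', 'Yearly', 'Net',
        'Density', 'Land Area', 'Migrants', 'Fert.', 'Med.', 'Urban']
dims = [None, None, -2017, 'Change', 'Change', '(P/Km²)', '(Km²)',
        '(net)', 'Rate', 'Age', 'Pop %']

# Append the dimension only when there is one
combined = [h if d is None else h + str(d) for h, d in zip(hdrs, dims)]
print(combined)
```

This also keeps a header intact if it happens to contain the letters 'None' itself, which replace('None', '') would strip.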

In [92]:
h = data[13][:]

In [93]:
print(h)
name = h.pop(1)
print(name)
print(h)


[12, 'Ethiopia', 104344901, 0.0245, 2491633, 104, 1000430, -12000, 4.59, 19, 0.194]
Ethiopia
[12, 104344901, 0.0245, 2491633, 104, 1000430, -12000, 4.59, 19, 0.194]

We could generate a dict with the country names as keys, where each cntry[key] is a list of items. A dict comprehension is less convenient here because we need to pop the country name from each row to get the key for that row. Hence a for loop is used:


In [94]:
cntry = dict() # empty dict

for i in range(2, len(wsPop.rows)):
    row = [c.value for c in wsPop.rows[i]]
    cname = row.pop(1)
    cntry[cname] = row # enter the key cname and the value row into the dict cntry

Now we can get the data of any cntry like so:


In [95]:
cntry['Italy']


Out[95]:
[23, 59797978, -0.0001, -3026, 203, 294137, 105654, 1.43, 46, 0.707]
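
The same dict can also be built with a dict comprehension if the name is removed by slicing instead of by pop(). The sketch below uses three shortened rows copied from the table above, so that it runs without the workbook.

```python
data = [
    [1, 'China', 1388232693, 0.0043],
    [2, 'India', 1342512706, 0.0118],
    [3, 'U.S.', 326474013, 0.0073],
]

# row[1] is the key; row[:1] + row[2:] is the row without the name
cntry = {row[1]: row[:1] + row[2:] for row in data}
print(cntry['India'])
```

Unlike pop(), slicing leaves the original rows in data unchanged.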

However, this is not smart enough. We can't see what the figures in the list mean. Therefore, we should use a dictionary for the contents of each cntry with the fields as keys.

These fields are now simply obtained by popping the second item from the hdrs list:


In [96]:
print(hdrs)
hdrs.pop(1)
print()
print(hdrs)


['#', 'Country (or dependency)', 'Population-2017', 'YearlyChange', 'NetChange', 'Density(P/Km²)', 'Land Area(Km²)', 'Migrants(net)', 'Fert.Rate', 'Med.Age', 'UrbanPop %']

['#', 'Population-2017', 'YearlyChange', 'NetChange', 'Density(P/Km²)', 'Land Area(Km²)', 'Migrants(net)', 'Fert.Rate', 'Med.Age', 'UrbanPop %']

Now generate the dict again, with the contents of each cntry itself a dict of its fields:


In [97]:
cntry = dict()
for i in range(2, len(wsPop.rows)):  # skip the first two lines, which are headers
    
    row = [c.value for c in wsPop.rows[i]] # turn the Excel row into a list
  
    cname = row.pop(1)      # pop off the country name, which becomes the key
    
    cntry[cname] = {fld: v for fld, v in zip(hdrs, row)}  # use dict comprehension

Now the contents of an arbitrary country is a dict with fields and values, so a single value can be looked up by country name and field name:


In [98]:
cntry['Netherlands']['Population-2017']


Out[98]:
17032845

Let's now compute the total population of the world. We do this by summing for each country.


In [99]:
totPop2017 = 0.  # start out with zero

for c in cntry.keys():
    totPop2017 += cntry[c]['Population-2017'] # get the value directly from indexing the field

print('Total population in the world is {:.2f} billion'.format(totPop2017/1e9))


Total population in the world is 7.52 billion
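
The summation loop can be shortened with sum() and a generator expression. The sketch below uses only three countries, with populations copied from the table above, so the total is that of these three and not of the world.

```python
cntry = {
    'China': {'Population-2017': 1388232693},
    'India': {'Population-2017': 1342512706},
    'U.S.':  {'Population-2017': 326474013},
}

# Iterate over the values (the per-country dicts); the keys are not needed
total = sum(c['Population-2017'] for c in cntry.values())
print('Total of the three countries: {:.2f} billion'.format(total / 1e9))
```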

In [100]:
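# Now compute the total population for the next 30 years by compound growth.
# NOTE: 'Fert.Rate' / 100 is used below as a crude stand-in for the yearly growth
# rate; the 'YearlyChange' field holds the actual yearly growth fraction.

popWorld = [0 for i in range(1, 31)]
for k in cntry.keys():
    pc  = cntry[k]['Population-2017']   # country population 2017
    try:
        frc = cntry[k]['Fert.Rate'] / 100.  # assumed growth rate fraction
    except TypeError:
        # needed in case the field contains `None`
        frc = 0 # simply use 0 in those cases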

    # for the country
    popCntry = [pc * (1 + frc)**i for i in range(1, 31) ]
    
    # for the entire world
    popWorld = [pc + pw for pc, pw in zip(popCntry, popWorld)]

for i, pw in enumerate(popWorld):
    yr = 2017 + i + 1
    print('popWorld [{}] = {:5.2f} billion'.format(yr, pw/1e9))


popWorld [2018] =  7.71 billion
popWorld [2019] =  7.91 billion
popWorld [2020] =  8.11 billion
popWorld [2021] =  8.32 billion
popWorld [2022] =  8.54 billion
popWorld [2023] =  8.77 billion
popWorld [2024] =  9.00 billion
popWorld [2025] =  9.24 billion
popWorld [2026] =  9.49 billion
popWorld [2027] =  9.74 billion
popWorld [2028] = 10.01 billion
popWorld [2029] = 10.29 billion
popWorld [2030] = 10.57 billion
popWorld [2031] = 10.87 billion
popWorld [2032] = 11.17 billion
popWorld [2033] = 11.49 billion
popWorld [2034] = 11.81 billion
popWorld [2035] = 12.15 billion
popWorld [2036] = 12.50 billion
popWorld [2037] = 12.87 billion
popWorld [2038] = 13.25 billion
popWorld [2039] = 13.64 billion
popWorld [2040] = 14.04 billion
popWorld [2041] = 14.47 billion
popWorld [2042] = 14.90 billion
popWorld [2043] = 15.36 billion
popWorld [2044] = 15.83 billion
popWorld [2045] = 16.32 billion
popWorld [2046] = 16.83 billion
popWorld [2047] = 17.36 billion
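
The table also has a 'YearlyChange' field, which holds the yearly growth as a fraction. A sketch of the same compound-growth projection using that field is given below. The function name project and its parameters are assumptions, and only three countries are used, so the numbers it prints are not world totals.

```python
def project(cntry, years, field='YearlyChange'):
    """Return summed populations for 1..years years ahead (compound growth)."""
    total = [0.0] * years
    for c in cntry.values():
        pop = c['Population-2017']
        rate = c.get(field) or 0.0     # treat a missing or None rate as 0
        for i in range(years):
            total[i] += pop * (1 + rate) ** (i + 1)
    return total

# Three rows taken from the table above; a real run would use the full dict
cntry = {
    'China': {'Population-2017': 1388232693, 'YearlyChange': 0.0043},
    'India': {'Population-2017': 1342512706, 'YearlyChange': 0.0118},
    'U.S.':  {'Population-2017': 326474013,  'YearlyChange': 0.0073},
}

for i, p in enumerate(project(cntry, 3)):
    print('{}: {:.3f} billion'.format(2018 + i, p / 1e9))
```

Keeping the growth rate fixed for 30 years is of course a strong simplification in either version.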

Population per continent.

We don't have the Continent field associated with the countries. But we do have a list of countries and their continents in a second worksheet:


In [101]:
wscont = wb['CountriesByContinent'] # read it

print(wscont)  # shows it's a worksheet object

print(wscont.rows[0])  # shows it's a tuple of two cell objects

wscont.rows[0][0].value, wscont.rows[0][1].value  # turned into a tuple of two strings


<Worksheet "CountriesByContinent">
(<Cell CountriesByContinent.A1>, <Cell CountriesByContinent.B1>)
Out[101]:
('Africa', 'Algeria')

We can immediately, in a single line, turn this worksheet into a dictionary with name cont that has the country as key and the continent as value (a string).


In [102]:
cont = { v.value : k.value for k, v in wb['CountriesByContinent'] }

pprint(cont)  # notice the key : value pairs separated by commas


{'Afghanistan': 'Asia',
 'Albania': 'Europe',
 'Algeria': 'Africa',
 'Andorra': 'Europe',
 'Angola': 'Africa',
 'Antigua and Barbuda': 'N. America',
 'Argentina': 'S. America',
 'Armenia': 'Europe',
 'Australia': 'Oceania',
 'Austria': 'Europe',
 'Azerbaijan': 'Europe',
 'Bahamas': 'N. America',
 'Bahrain': 'Asia',
 'Bangladesh': 'Asia',
 'Barbados': 'N. America',
 'Belarus': 'Europe',
 'Belgium': 'Europe',
 'Belize': 'N. America',
 'Benin': 'Africa',
 'Bhutan': 'Asia',
 'Bolivia': 'S. America',
 'Bosnia and Herzegovina': 'Europe',
 'Botswana': 'Africa',
 'Brazil': 'S. America',
 'Brunei': 'Asia',
 'Bulgaria': 'Europe',
 'Burkina': 'Africa',
 'Burma (Myanmar)': 'Asia',
 'Burundi': 'Africa',
 'Cambodia': 'Asia',
 'Cameroon': 'Africa',
 'Canada': 'N. America',
 'Cape Verde': 'Africa',
 'Central African Republic': 'Africa',
 'Chad': 'Africa',
 'Chile': 'S. America',
 'China': 'Asia',
 'Colombia': 'S. America',
 'Comoros': 'Africa',
 'Congo': 'Africa',
 'Congo, Democratic Republic of': 'Africa',
 'Costa Rica': 'N. America',
 'Croatia': 'Europe',
 'Cuba': 'N. America',
 'Cyprus': 'Europe',
 'Czech Republic': 'Europe',
 'Denmark': 'Europe',
 'Djibouti': 'Africa',
 'Dominica': 'N. America',
 'Dominican Republic': 'N. America',
 'East Timor': 'Asia',
 'Ecuador': 'S. America',
 'Egypt': 'Africa',
 'El Salvador': 'N. America',
 'Equatorial Guinea': 'Africa',
 'Eritrea': 'Africa',
 'Estonia': 'Europe',
 'Ethiopia': 'Africa',
 'Fiji': 'Oceania',
 'Finland': 'Europe',
 'France': 'Europe',
 'Gabon': 'Africa',
 'Gambia': 'Africa',
 'Georgia': 'Europe',
 'Germany': 'Europe',
 'Ghana': 'Africa',
 'Greece': 'Europe',
 'Grenada': 'N. America',
 'Guatemala': 'N. America',
 'Guinea': 'Africa',
 'Guinea-Bissau': 'Africa',
 'Guyana': 'S. America',
 'Haiti': 'N. America',
 'Honduras': 'N. America',
 'Hungary': 'Europe',
 'Iceland': 'Europe',
 'India': 'Asia',
 'Indonesia': 'Asia',
 'Iran': 'Asia',
 'Iraq': 'Asia',
 'Ireland': 'Europe',
 'Israel': 'Asia',
 'Italy': 'Europe',
 'Ivory Coast': 'Africa',
 'Jamaica': 'N. America',
 'Japan': 'Asia',
 'Jordan': 'Asia',
 'Kazakhstan': 'Asia',
 'Kenya': 'Africa',
 'Kiribati': 'Oceania',
 'Korea, North': 'Asia',
 'Korea, South': 'Asia',
 'Kuwait': 'Asia',
 'Kyrgyzstan': 'Asia',
 'Laos': 'Asia',
 'Latvia': 'Europe',
 'Lebanon': 'Asia',
 'Lesotho': 'Africa',
 'Liberia': 'Africa',
 'Libya': 'Africa',
 'Liechtenstein': 'Europe',
 'Lithuania': 'Europe',
 'Luxembourg': 'Europe',
 'Macedonia': 'Europe',
 'Madagascar': 'Africa',
 'Malawi': 'Africa',
 'Malaysia': 'Asia',
 'Maldives': 'Asia',
 'Mali': 'Africa',
 'Malta': 'Europe',
 'Marshall Islands': 'Oceania',
 'Mauritania': 'Africa',
 'Mauritius': 'Africa',
 'Mexico': 'N. America',
 'Micronesia': 'Oceania',
 'Moldova': 'Europe',
 'Monaco': 'Europe',
 'Mongolia': 'Asia',
 'Montenegro': 'Europe',
 'Morocco': 'Africa',
 'Mozambique': 'Africa',
 'Namibia': 'Africa',
 'Nauru': 'Oceania',
 'Nepal': 'Asia',
 'Netherlands': 'Europe',
 'New Zealand': 'Oceania',
 'Nicaragua': 'N. America',
 'Niger': 'Africa',
 'Nigeria': 'Africa',
 'Norway': 'Europe',
 'Oman': 'Asia',
 'Pakistan': 'Asia',
 'Palau': 'Oceania',
 'Panama': 'N. America',
 'Papua New Guinea': 'Oceania',
 'Paraguay': 'S. America',
 'Peru': 'S. America',
 'Philippines': 'Asia',
 'Poland': 'Europe',
 'Portugal': 'Europe',
 'Qatar': 'Asia',
 'Romania': 'Europe',
 'Russian Federation': 'Asia',
 'Rwanda': 'Africa',
 'Saint Kitts and Nevis': 'N. America',
 'Saint Lucia': 'N. America',
 'Saint Vincent and the Grenadines': 'N. America',
 'Samoa': 'Oceania',
 'San Marino': 'Europe',
 'Sao Tome and Principe': 'Africa',
 'Saudi Arabia': 'Asia',
 'Senegal': 'Africa',
 'Serbia': 'Europe',
 'Seychelles': 'Africa',
 'Sierra Leone': 'Africa',
 'Singapore': 'Asia',
 'Slovakia': 'Europe',
 'Slovenia': 'Europe',
 'Solomon Islands': 'Oceania',
 'Somalia': 'Africa',
 'South Africa': 'Africa',
 'South Sudan': 'Africa',
 'Spain': 'Europe',
 'Sri Lanka': 'Asia',
 'Sudan': 'Africa',
 'Suriname': 'S. America',
 'Swaziland': 'Africa',
 'Sweden': 'Europe',
 'Switzerland': 'Europe',
 'Syria': 'Asia',
 'Tajikistan': 'Asia',
 'Tanzania': 'Africa',
 'Thailand': 'Asia',
 'Togo': 'Africa',
 'Tonga': 'Oceania',
 'Trinidad and Tobago': 'N. America',
 'Tunisia': 'Africa',
 'Turkey': 'Asia',
 'Turkmenistan': 'Asia',
 'Tuvalu': 'Oceania',
 'Uganda': 'Africa',
 'Ukraine': 'Europe',
 'United Arab Emirates': 'Asia',
 'United Kingdom': 'Europe',
 'United States': 'N. America',
 'Uruguay': 'S. America',
 'Uzbekistan': 'Asia',
 'Vanuatu': 'Oceania',
 'Vatican City': 'Europe',
 'Venezuela': 'S. America',
 'Vietnam': 'Asia',
 'Yemen': 'Asia',
 'Zambia': 'Africa',
 'Zimbabwe': 'Africa'}

Show for a few countries in which continent they are:


In [103]:
for c in ['Bahamas', 'Spain', 'Morocco', 'Honduras', 'Cambodia']:
    print('{:20} lies in {}'.format(c, cont[c]))


Bahamas              lies in N. America
Spain                lies in Europe
Morocco              lies in Africa
Honduras             lies in N. America
Cambodia             lies in Asia
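
The cont dict maps each country to its continent. The reverse lookup, from a continent to its list of countries, can be sketched with setdefault(). The name by_continent is an assumption, and only the five countries printed above are used.

```python
cont = {'Bahamas': 'N. America', 'Spain': 'Europe', 'Morocco': 'Africa',
        'Honduras': 'N. America', 'Cambodia': 'Asia'}

by_continent = {}
for country, continent in cont.items():
    # setdefault creates the empty list the first time a continent is seen
    by_continent.setdefault(continent, []).append(country)

for continent in sorted(by_continent):
    print('{:12} {}'.format(continent, sorted(by_continent[continent])))
```

A plain dict comprehension would not work for this reversal, because several countries share the same continent and later keys would overwrite earlier ones.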

Adding the field continent to the cntry dict

We would like to add the field Continent to our cntry dict using the cont dict.

This would be easy if the country names in both dicts were exactly the same.

Let's see if this is the case.

We can check this using set logic.

Convert the keys of the cntry dict to a set and do the same with those of the cont dict.

First step: Which countries are in the cont dict that are not in the cntry dict?


In [104]:
df_cont_cntry = set(cont.keys()) - set(cntry.keys())
pprint(df_cont_cntry)

print("\n{} countries are in dict `cont` that are not in dict `cntry`".format(len(df_cont_cntry)))


{'Burkina',
 'Burma (Myanmar)',
 'Cape Verde',
 'Congo, Democratic Republic of',
 'East Timor',
 'Ivory Coast',
 'Korea, North',
 'Korea, South',
 'Macedonia',
 'Russian Federation',
 'Saint Vincent and the Grenadines',
 'United Kingdom',
 'United States',
 'Vatican City',
 'Vietnam'}

15 countries are in dict `cont` that are not in dict `cntry`

And likewise: which are in the cntry dict that are not in the cont dict?


In [105]:
df_cntry_cont = set(cntry.keys()) - set(cont.keys())
pprint(df_cntry_cont)

print("\n{} countries are in dict `cntry` that are not in dict `cont`".format(len(df_cntry_cont)))


{'American Samoa',
 'Anguilla',
 'Aruba',
 'Bermuda',
 'British Virgin Islands',
 'Burkina Faso',
 'Cabo Verde',
 'Caribbean Netherlands',
 'Cayman Islands',
 'Channel Islands',
 'Cook Islands',
 'Curaçao',
 "Côte d'Ivoire",
 'DR Congo',
 'Faeroe Islands',
 'Falkland Islands',
 'French Guiana',
 'French Polynesia',
 'Gibraltar',
 'Greenland',
 'Guadeloupe',
 'Guam',
 'Holy See',
 'Hong Kong',
 'Isle of Man',
 'Macao',
 'Martinique',
 'Mayotte',
 'Montserrat',
 'Myanmar',
 'New Caledonia',
 'Niue',
 'North Korea',
 'Northern Mariana Islands',
 'Puerto Rico',
 'Russia',
 'Réunion',
 'Saint Helena',
 'Saint Pierre and Miquelon',
 'Sint Maarten',
 'South Korea',
 'St. Vincent & Grenadines',
 'State of Palestine',
 'TFYR Macedonia',
 'Taiwan',
 'Timor-Leste',
 'Tokelau',
 'Turks and Caicos Islands',
 'U.K.',
 'U.S.',
 'United States Virgin Islands',
 'Viet Nam',
 'Wallis and Futuna',
 'Western Sahara'}

54 countries are in dict `cntry` that are not in dict `cont`
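
The set operators used above can be tried out on two small sets. The names in the sketch are a hand-picked illustration, not the full key sets of cont and cntry.

```python
a = {'Spain', 'Vietnam', 'United States', 'Chad'}      # e.g. keys of cont
b = {'Spain', 'Viet Nam', 'U.S.', 'Chad', 'Taiwan'}    # e.g. keys of cntry

only_a = a - b     # difference: in a but not in b
only_b = b - a     # difference: in b but not in a
both = a & b       # intersection: in both sets
either = a ^ b     # symmetric difference: in exactly one of the two sets

print(sorted(only_a), sorted(only_b), sorted(both), len(either))
```

Note that a - b and b - a generally differ, which is why both differences were computed above.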

The countries that are in cont but not in cntry obviously have differently spelled names. Probably the easiest way is to take the names from cntry that are not in cont, look up the continent in which each of these countries lies, and use that to supplement the missing countries. We then don't have to worry about the differently spelled names.

Using a list of missing names with their continent to complete the cntry dict

This list of missing countries with their continent is in the sheet named missing of our workbook.


In [106]:
# Notice that the data in this Excel sheet are in the 2nd and 3rd columns (indices 1 and 2), not in the 1st and 2nd!
# We construct the dict in one line, using a single dict comprehension

missing = {rw[2].value : rw[1].value for rw in wb['missing']}  # don't need .rows

pprint(missing)


{None: None,
 'American Samoa': 'N. America',
 'Anguilla': 'N. America',
 'Aruba': 'S. America',
 'Bermuda': 'S. America',
 'British Virgin Islands': 'N. America',
 'Burkina Faso': 'Africa',
 'Cabo Verde': 'Africa',
 'Caribbean Netherlands': 'N. America',
 'Cayman Islands': 'N. America',
 'Channel Islands': 'Europe',
 'Cook Islands': 'Oceania',
 'Curaçao': 'S. America',
 "Côte d'Ivoire": 'Africa',
 'DR Congo': 'Africa',
 'Faeroe Islands': 'Europe',
 'Falkland Islands': 'S. America',
 'French Guiana': 'S. America',
 'French Polynesia': 'Oceania',
 'Gibraltar': 'Europe',
 'Greenland': 'N. America',
 'Guadeloupe': 'N. America',
 'Guam': 'Oceania',
 'Holy See': 'Africa',
 'Hong Kong': 'Africa',
 'Isle of Man': 'Europe',
 'Macao': 'Asia',
 'Martinique': 'S. America',
 'Mayotte': 'Africa',
 'Montserrat': 'N. America',
 'Myanmar': 'Asia',
 'New Caledonia': 'Oceania',
 'Niue': 'Oceania',
 'North Korea': 'Asia',
 'Northern Mariana Islands': 'Oceania',
 'Puerto Rico': 'N. America',
 'Russia': 'Europe',
 'Réunion': 'Oceania',
 'Saint Helena': 'Africa',
 'Saint Pierre and Miquelon': 'N. America',
 'Sint Maarten': 'S. America',
 'South Korea': 'Africa',
 'St. Vincent & Grenadines': 'S. America',
 'State of Palestine': 'Asia',
 'TFYR Macedonia': 'Europe',
 'Taiwan': 'Asia',
 'Timor-Leste': 'Asia',
 'Tokelau': 'Oceania',
 'Turks and Caicos Islands': 'N. America',
 'U.K.': 'Europe',
 'U.S.': 'N. America',
 'United States Virgin Islands': 'N. America',
 'Viet Nam': 'Asia',
 'Wallis and Futuna': 'Oceania',
 'Western Sahara': 'Africa'}

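Now we can complete our cntry dict with a Continent field for every country in it by using the dict cont and the dict missing. The entry None: None in missing presumably stems from a row with empty cells in the sheet; it does no harm here.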

Let's just join them, like so:


In [107]:
for k in missing.keys():
    cont[k] = missing[k]
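
The loop above can also be written with the dict method update(). The sketch below additionally skips the None key; the two small dicts are made up for illustration.

```python
cont = {'Spain': 'Europe', 'Chad': 'Africa'}
missing = {None: None, 'Taiwan': 'Asia', 'U.K.': 'Europe'}

# Merge missing into cont, skipping the empty entry with key None
cont.update({k: v for k, v in missing.items() if k is not None})
print(sorted(cont))
```

If a key occurs in both dicts, update() keeps the value from missing, just like the loop does.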

And then add the continent to the cntry dict


In [108]:
for k in cntry.keys():
    cntry[k]['Continent'] = cont[k]
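
The loop above raises a KeyError as soon as a country name is not present in cont. A more forgiving variant, sketched here with made-up data, uses get() with a default and then lists the names that still need attention. The label 'unknown' and the name 'Atlantis' are assumptions for illustration only.

```python
cntry = {'Spain': {}, 'Taiwan': {}, 'Atlantis': {}}   # 'Atlantis' is a made-up name
cont = {'Spain': 'Europe', 'Taiwan': 'Asia'}

for name, fields in cntry.items():
    # get() returns the default instead of raising KeyError
    fields['Continent'] = cont.get(name, 'unknown')

unknown = [n for n, f in cntry.items() if f['Continent'] == 'unknown']
print(unknown)
```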

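Now we can print the country name with its continent next to it. The continents are only as reliable as the two worksheets: a few entries that come from the sheet missing look wrong (for example Hong Kong and South Korea under Africa) and are best corrected in the workbook itself: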


In [109]:
for k in cntry.keys():
    print("{:30} lies in {}".format(k, cntry[k]['Continent']))


Timor-Leste                    lies in Asia
Solomon Islands                lies in Oceania
Liechtenstein                  lies in Europe
Swaziland                      lies in Africa
Macao                          lies in Asia
Angola                         lies in Africa
Turks and Caicos Islands       lies in N. America
Sudan                          lies in Africa
Falkland Islands               lies in S. America
Saint Kitts and Nevis          lies in N. America
Uganda                         lies in Africa
Laos                           lies in Asia
Cambodia                       lies in Asia
Mauritania                     lies in Africa
Belgium                        lies in Europe
Canada                         lies in N. America
Wallis and Futuna              lies in Oceania
Guam                           lies in Oceania
Curaçao                        lies in S. America
Azerbaijan                     lies in Europe
Singapore                      lies in Asia
Niger                          lies in Africa
Somalia                        lies in Africa
Philippines                    lies in Asia
Guyana                         lies in S. America
Ukraine                        lies in Europe
Northern Mariana Islands       lies in Oceania
Anguilla                       lies in N. America
Bhutan                         lies in Asia
Maldives                       lies in Asia
Panama                         lies in N. America
St. Vincent & Grenadines       lies in S. America
Equatorial Guinea              lies in Africa
Haiti                          lies in N. America
Andorra                        lies in Europe
Togo                           lies in Africa
U.K.                           lies in Europe
Qatar                          lies in Asia
Bangladesh                     lies in Asia
Costa Rica                     lies in N. America
Isle of Man                    lies in Europe
Sierra Leone                   lies in Africa
Ethiopia                       lies in Africa
Viet Nam                       lies in Asia
Seychelles                     lies in Africa
Croatia                        lies in Europe
Comoros                        lies in Africa
Puerto Rico                    lies in N. America
Uruguay                        lies in S. America
North Korea                    lies in Asia
American Samoa                 lies in N. America
Antigua and Barbuda            lies in N. America
Belize                         lies in N. America
Vanuatu                        lies in Oceania
Aruba                          lies in S. America
Chile                          lies in S. America
Saint Pierre and Miquelon      lies in N. America
Tonga                          lies in Oceania
Japan                          lies in Asia
Lithuania                      lies in Europe
Barbados                       lies in N. America
Kazakhstan                     lies in Asia
Guadeloupe                     lies in N. America
El Salvador                    lies in N. America
Ireland                        lies in Europe
Kenya                          lies in Africa
Afghanistan                    lies in Asia
Argentina                      lies in S. America
Brazil                         lies in S. America
Paraguay                       lies in S. America
Russia                         lies in Europe
Iran                           lies in Asia
Yemen                          lies in Asia
Congo                          lies in Africa
Malaysia                       lies in Asia
Eritrea                        lies in Africa
Denmark                        lies in Europe
Channel Islands                lies in Europe
Trinidad and Tobago            lies in N. America
Georgia                        lies in Europe
India                          lies in Asia
Liberia                        lies in Africa
Mozambique                     lies in Africa
Saudi Arabia                   lies in Asia
Ghana                          lies in Africa
Ecuador                        lies in S. America
Egypt                          lies in Africa
Hong Kong                      lies in Africa
Gambia                         lies in Africa
San Marino                     lies in Europe
Micronesia                     lies in Oceania
Chad                           lies in Africa
Dominica                       lies in N. America
Tokelau                        lies in Oceania
Monaco                         lies in Europe
Algeria                        lies in Africa
Central African Republic       lies in Africa
Slovakia                       lies in Europe
Thailand                       lies in Asia
Hungary                        lies in Europe
Zambia                         lies in Africa
Armenia                        lies in Europe
Guatemala                      lies in N. America
Nigeria                        lies in Africa
Cook Islands                   lies in Oceania
Caribbean Netherlands          lies in N. America
Iraq                           lies in Asia
Suriname                       lies in S. America
Sri Lanka                      lies in Asia
Peru                           lies in S. America
Libya                          lies in Africa
Tanzania                       lies in Africa
Switzerland                    lies in Europe
Mongolia                       lies in Asia
Bulgaria                       lies in Europe
Bahamas                        lies in N. America
Benin                          lies in Africa
Spain                          lies in Europe
Albania                        lies in Europe
Montenegro                     lies in Europe
Djibouti                       lies in Africa
Samoa                          lies in Oceania
Lesotho                        lies in Africa
DR Congo                       lies in Africa
Finland                        lies in Europe
Sweden                         lies in Europe
Uzbekistan                     lies in Asia
Mayotte                        lies in Africa
Guinea                         lies in Africa
Netherlands                    lies in Europe
Bermuda                        lies in S. America
Bahrain                        lies in Asia
Moldova                        lies in Europe
Cameroon                       lies in Africa
Lebanon                        lies in Asia
Romania                        lies in Europe
Venezuela                      lies in S. America
South Sudan                    lies in Africa
TFYR Macedonia                 lies in Europe
Côte d'Ivoire                  lies in Africa
Israel                         lies in Asia
Namibia                        lies in Africa
China                          lies in Asia
Cyprus                         lies in Europe
South Africa                   lies in Africa
Nepal                          lies in Asia
Mali                           lies in Africa
State of Palestine             lies in Asia
Norway                         lies in Europe
Kiribati                       lies in Oceania
Botswana                       lies in Africa
Saint Lucia                    lies in N. America
Tunisia                        lies in Africa
Nauru                          lies in Oceania
Australia                      lies in Oceania
Burkina Faso                   lies in Africa
Cabo Verde                     lies in Africa
Portugal                       lies in Europe
Fiji                           lies in Oceania
Serbia                         lies in Europe
Western Sahara                 lies in Africa
Latvia                         lies in Europe
Burundi                        lies in Africa
Saint Helena                   lies in Africa
Slovenia                       lies in Europe
Cayman Islands                 lies in N. America
Sint Maarten                   lies in S. America
Colombia                       lies in S. America
Kuwait                         lies in Asia
Jordan                         lies in Asia
French Guiana                  lies in S. America
Madagascar                     lies in Africa
Niue                           lies in Oceania
Grenada                        lies in N. America
Marshall Islands               lies in Oceania
Italy                          lies in Europe
Faeroe Islands                 lies in Europe
Taiwan                         lies in Asia
Martinique                     lies in S. America
New Zealand                    lies in Oceania
Austria                        lies in Europe
Senegal                        lies in Africa
Gibraltar                      lies in Europe
Guinea-Bissau                  lies in Africa
Tuvalu                         lies in Oceania
New Caledonia                  lies in Oceania
French Polynesia               lies in Oceania
Poland                         lies in Europe
Pakistan                       lies in Asia
U.S.                           lies in N. America
Kyrgyzstan                     lies in Asia
Gabon                          lies in Africa
British Virgin Islands         lies in N. America
Bosnia and Herzegovina         lies in Europe
Malta                          lies in Europe
Papua New Guinea               lies in Oceania
Jamaica                        lies in N. America
Estonia                        lies in Europe
Mauritius                      lies in Africa
Rwanda                         lies in Africa
Oman                           lies in Asia
Greece                         lies in Europe
United States Virgin Islands   lies in N. America
Bolivia                        lies in S. America
Sao Tome and Principe          lies in Africa
South Korea                    lies in Africa
France                         lies in Europe
Holy See                       lies in Africa
United Arab Emirates           lies in Asia
Brunei                         lies in Asia
Turkmenistan                   lies in Asia
Indonesia                      lies in Asia
Myanmar                        lies in Asia
Réunion                        lies in Oceania
Iceland                        lies in Europe
Mexico                         lies in N. America
Syria                          lies in Asia
Zimbabwe                       lies in Africa
Palau                          lies in Oceania
Montserrat                     lies in N. America
Tajikistan                     lies in Asia
Dominican Republic             lies in N. America
Luxembourg                     lies in Europe
Honduras                       lies in N. America
Greenland                      lies in N. America
Germany                        lies in Europe
Morocco                        lies in Africa
Cuba                           lies in N. America
Malawi                         lies in Africa
Czech Republic                 lies in Europe
Belarus                        lies in Europe
Turkey                         lies in Asia
Nicaragua                      lies in N. America

Population growth per continent

We can now compute and show the population and its growth per continent.

The first step is to extract the continents from the dict with a set comprehension. The result is a set holding the unique values of the Continent field.


In [110]:
continents = {cntry[k]['Continent'] for k in cntry}  # set comprehension

print(continents)


{'Asia', 'Africa', 'Oceania', 'Europe', 'S. America', 'N. America'}
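
As a minimal sketch of the same idea on data that can be checked by eye, the block below builds a tiny stand-in for the cntry dict. The name cntry_demo and its four entries are assumptions made for illustration only; they are not read from the workbook.

```python
# Hypothetical miniature of the cntry dict used in this notebook
# (entries chosen for illustration only).
cntry_demo = {
    'Norway':   {'Continent': 'Europe'},
    'Portugal': {'Continent': 'Europe'},
    'Fiji':     {'Continent': 'Oceania'},
    'Jordan':   {'Continent': 'Asia'},
}

# Set comprehension: 'Europe' occurs twice in the data but only once in the set.
continents_demo = {rec['Continent'] for rec in cntry_demo.values()}

# A set has no fixed order, so sort it when a reproducible order matters.
print(sorted(continents_demo))
```

Because a set is unordered, the continents may print in a different order from one run to the next, which is why the sketch sorts them before printing.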

We can then compute the future population of a continent by first selecting the countries that belong to it:


In [111]:
continent='Europe'

# (population, fertility rate) pairs for the countries of this continent:
popCont = [(cntry[k]['Population-2017'], cntry[k]['Fert.Rate'])
           for k in cntry if cntry[k]['Continent'] == continent]

print("The number of countries in {} is {}".format(continent, len(popCont)))


The number of countries in Europe is 51
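
The selection step can be wrapped in small helper functions. The sketch below assumes a record layout like the one used above; cntry_demo, countries_in and population_of are hypothetical names, and the population figures are round illustrative values, not workbook data.

```python
# Hypothetical miniature of the cntry dict; the numbers are round
# illustrative values, not taken from the workbook.
cntry_demo = {
    'Norway':   {'Continent': 'Europe',  'Population-2017': 5.0e6},
    'Portugal': {'Continent': 'Europe',  'Population-2017': 10.0e6},
    'Fiji':     {'Continent': 'Oceania', 'Population-2017': 1.0e6},
}


def countries_in(cntry, continent):
    """Return the names of all countries whose record lists this continent."""
    return [name for name, rec in cntry.items()
            if rec['Continent'] == continent]


def population_of(cntry, continent):
    """Sum the 2017 population over the countries of one continent."""
    return sum(rec['Population-2017'] for rec in cntry.values()
               if rec['Continent'] == continent)


print(countries_in(cntry_demo, 'Europe'))
print(population_of(cntry_demo, 'Europe'))
```

The condition at the end of each comprehension does the filtering; a continent that does not occur simply yields an empty list and a sum of zero.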

The last-but-one step is to compute the population of each continent for the coming years. We build a dictionary pTot with the continent as key and, as value, the list of projected population totals, one per year:


In [112]:
pTot = dict()  # empty dict
Nyr = 50  # number of years to predict
print("The predicted population over the next {} years is:".format(Nyr))

for c in continents:
    pTot[c] = [0 for i in range(Nyr)]  # start with a zero total for each year

    # generate a list of (population, fertility rate) pairs for the countries in this continent
    popCont = [(cntry[k]['Population-2017'], cntry[k]['Fert.Rate'])
               for k in cntry.keys() if cntry[k]['Continent'] == c]

    # compute each country's future population and add it to the continent total
    for p, fr in popCont:
        try:
            p = float(p)
            fr = float(fr)
        except (TypeError, ValueError):
            # float() fails when a value is missing (None); we skip these few countries
            continue

        # population of the country in the coming years;
        # note that fr is used here as an annual growth rate in percent
        pcntry = [p * (1 + fr/100.)**i for i in range(1, Nyr + 1)]

        # add to continent total
        pTot[c] = [pt + pc for pt, pc in zip(pTot[c], pcntry)]

    print("{:10s}".format(c), end="")  # print continent
    for p in pTot[c]:
        print("{:6.2f}".format(p / 1.0e9), end="")  # print population in billions
    print()


The predicted population over the next 50 years is:
Asia        4.50  4.60  4.70  4.81  4.91  5.02  5.14  5.25  5.37  5.49  5.62  5.74  5.87  6.01  6.15  6.29  6.43  6.58  6.73  6.89  7.05  7.21  7.38  7.55  7.73  7.91  8.10  8.29  8.48  8.69  8.89  9.10  9.32  9.55  9.78 10.01 10.26 10.50 10.76 11.02 11.29 11.57 11.86 12.15 12.45 12.76 13.08 13.41 13.74 14.09
Africa      1.36  1.43  1.50  1.57  1.64  1.72  1.80  1.89  1.98  2.07  2.17  2.28  2.39  2.51  2.63  2.76  2.90  3.04  3.19  3.35  3.52  3.70  3.88  4.08  4.28  4.50  4.73  4.97  5.23  5.50  5.78  6.08  6.39  6.72  7.07  7.44  7.83  8.24  8.68  9.13  9.62 10.12 10.66 11.23 11.83 12.46 13.13 13.83 14.57 15.36
Oceania     0.04  0.04  0.04  0.05  0.05  0.05  0.05  0.05  0.05  0.05  0.05  0.05  0.06  0.06  0.06  0.06  0.06  0.06  0.07  0.07  0.07  0.07  0.07  0.07  0.08  0.08  0.08  0.08  0.08  0.09  0.09  0.09  0.09  0.10  0.10  0.10  0.10  0.11  0.11  0.11  0.11  0.12  0.12  0.12  0.13  0.13  0.13  0.14  0.14  0.15
Europe      0.77  0.78  0.79  0.81  0.82  0.83  0.85  0.86  0.87  0.89  0.90  0.92  0.93  0.95  0.96  0.98  0.99  1.01  1.03  1.04  1.06  1.08  1.09  1.11  1.13  1.15  1.17  1.18  1.20  1.22  1.24  1.26  1.28  1.30  1.33  1.35  1.37  1.39  1.41  1.44  1.46  1.48  1.51  1.53  1.56  1.58  1.61  1.63  1.66  1.69
S. America  0.44  0.45  0.45  0.46  0.47  0.48  0.49  0.50  0.51  0.52  0.54  0.55  0.56  0.57  0.58  0.59  0.61  0.62  0.63  0.64  0.66  0.67  0.68  0.70  0.71  0.73  0.74  0.76  0.78  0.79  0.81  0.82  0.84  0.86  0.88  0.90  0.92  0.93  0.95  0.97  0.99  1.02  1.04  1.06  1.08  1.10  1.13  1.15  1.18  1.20
N. America  0.59  0.61  0.62  0.63  0.65  0.66  0.67  0.69  0.70  0.71  0.73  0.74  0.76  0.78  0.79  0.81  0.82  0.84  0.86  0.88  0.90  0.91  0.93  0.95  0.97  0.99  1.01  1.03  1.06  1.08  1.10  1.12  1.15  1.17  1.20  1.22  1.25  1.27  1.30  1.33  1.35  1.38  1.41  1.44  1.47  1.50  1.53  1.57  1.60  1.63
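
The projection above combines two steps: compound growth for one country, and element-wise addition of that series to a running total. The sketch below isolates both steps on numbers small enough to verify by hand. The function names project and add_series and the input values are assumptions for illustration.

```python
def project(p0, rate_pct, nyr):
    """Compound growth: return [p0*(1+r)**1, ..., p0*(1+r)**nyr], with r = rate_pct/100."""
    return [p0 * (1 + rate_pct / 100.0) ** i for i in range(1, nyr + 1)]


def add_series(total, series):
    """Add two equally long lists element by element."""
    return [t + s for t, s in zip(total, series)]


# Two hypothetical 'countries': one growing at 10 % per year, one not growing.
total = [0.0] * 3
for p0, rate in [(100.0, 10.0), (200.0, 0.0)]:
    total = add_series(total, project(p0, rate, 3))

print([round(t, 1) for t in total])
```

The first country contributes roughly 110, 121 and 133.1, the second a constant 200, so the printed totals should be close to 310, 321 and 333.1.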

The results are easier to read in columns, one per continent, with one row per year and the year in the first column. The totals are stored per continent rather than per year, but the table can be produced with little effort by indexing each continent's list by year:


In [113]:
print("The predicted population in billions:")

continents = pTot.keys()

styear = 2018 # starting year
print("{:4s}".format("Year"), end="")
for c in continents:
    print("{:>12}".format(c), end="")

print()
for i in range(Nyr):
    print("{:4d}".format(styear + i), end="")
    for c in continents:
        print("{:12.2f}".format(pTot[c][i]/1e9), end="")
    print()


The predicted population in billions:
Year      Europe  S. America        Asia  N. America      Africa     Oceania
2018        0.77        0.44        4.50        0.59        1.36        0.04
2019        0.78        0.45        4.60        0.61        1.43        0.04
2020        0.79        0.45        4.70        0.62        1.50        0.04
2021        0.81        0.46        4.81        0.63        1.57        0.05
2022        0.82        0.47        4.91        0.65        1.64        0.05
2023        0.83        0.48        5.02        0.66        1.72        0.05
2024        0.85        0.49        5.14        0.67        1.80        0.05
2025        0.86        0.50        5.25        0.69        1.89        0.05
2026        0.87        0.51        5.37        0.70        1.98        0.05
2027        0.89        0.52        5.49        0.71        2.07        0.05
2028        0.90        0.54        5.62        0.73        2.17        0.05
2029        0.92        0.55        5.74        0.74        2.28        0.05
2030        0.93        0.56        5.87        0.76        2.39        0.06
2031        0.95        0.57        6.01        0.78        2.51        0.06
2032        0.96        0.58        6.15        0.79        2.63        0.06
2033        0.98        0.59        6.29        0.81        2.76        0.06
2034        0.99        0.61        6.43        0.82        2.90        0.06
2035        1.01        0.62        6.58        0.84        3.04        0.06
2036        1.03        0.63        6.73        0.86        3.19        0.07
2037        1.04        0.64        6.89        0.88        3.35        0.07
2038        1.06        0.66        7.05        0.90        3.52        0.07
2039        1.08        0.67        7.21        0.91        3.70        0.07
2040        1.09        0.68        7.38        0.93        3.88        0.07
2041        1.11        0.70        7.55        0.95        4.08        0.07
2042        1.13        0.71        7.73        0.97        4.28        0.08
2043        1.15        0.73        7.91        0.99        4.50        0.08
2044        1.17        0.74        8.10        1.01        4.73        0.08
2045        1.18        0.76        8.29        1.03        4.97        0.08
2046        1.20        0.78        8.48        1.06        5.23        0.08
2047        1.22        0.79        8.69        1.08        5.50        0.09
2048        1.24        0.81        8.89        1.10        5.78        0.09
2049        1.26        0.82        9.10        1.12        6.08        0.09
2050        1.28        0.84        9.32        1.15        6.39        0.09
2051        1.30        0.86        9.55        1.17        6.72        0.10
2052        1.33        0.88        9.78        1.20        7.07        0.10
2053        1.35        0.90       10.01        1.22        7.44        0.10
2054        1.37        0.92       10.26        1.25        7.83        0.10
2055        1.39        0.93       10.50        1.27        8.24        0.11
2056        1.41        0.95       10.76        1.30        8.68        0.11
2057        1.44        0.97       11.02        1.33        9.13        0.11
2058        1.46        0.99       11.29        1.35        9.62        0.11
2059        1.48        1.02       11.57        1.38       10.12        0.12
2060        1.51        1.04       11.86        1.41       10.66        0.12
2061        1.53        1.06       12.15        1.44       11.23        0.12
2062        1.56        1.08       12.45        1.47       11.83        0.13
2063        1.58        1.10       12.76        1.50       12.46        0.13
2064        1.61        1.13       13.08        1.53       13.13        0.13
2065        1.63        1.15       13.41        1.57       13.83        0.14
2066        1.66        1.18       13.74        1.60       14.57        0.14
2067        1.69        1.20       14.09        1.63       15.36        0.15
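
An alternative way to turn per-continent lists into per-year rows is to transpose them with zip(*...). The sketch below is an assumption-labelled variant, not the approach used in the cell above; ptot_demo, its keys and its values are hypothetical.

```python
# Hypothetical per-region series, one value per year.
ptot_demo = {'East': [1.0, 2.0, 3.0], 'West': [10.0, 20.0, 30.0]}

names = sorted(ptot_demo)  # fix the column order
# zip(*...) transposes: one tuple per year, holding one value per region.
rows = list(zip(*(ptot_demo[n] for n in names)))

start_year = 2018
lines = ["{:4s}".format("Year") + "".join("{:>8}".format(n) for n in names)]
for i, row in enumerate(rows):
    lines.append("{:4d}".format(start_year + i)
                 + "".join("{:8.2f}".format(v) for v in row))

print("\n".join(lines))
```

Sorting the names first gives the same column order on every run, which a plain iteration over a set does not guarantee.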

Finally, make a plot of the growth curves:


In [114]:
years = [2018 + i for i in range(Nyr)]
for c in continents:
    # convert to billions so the values match the y-axis label
    plt.plot(years, [p / 1e9 for p in pTot[c]], label=c)
plt.xlabel('year')
plt.ylabel('Population [billions]')
plt.title('Population development')
plt.legend(loc='best', fontsize='x-small')
plt.show()



In [116]:
#plt.pie?

We can also store the results per year in a dict that has the year as key and, as value, a dict mapping each continent to its population in that year.


In [117]:
contPop = dict()
for yr in range(Nyr):
    contPop[2018 + yr] = {c: pTot[c][yr] for c in continents}

# show it
pprint(contPop[2025])


{'Africa': 1887126768.7242863,
 'Asia': 5251901043.984403,
 'Europe': 859888151.1738987,
 'N. America': 686011271.3401413,
 'Oceania': 49759904.37036097,
 'S. America': 503160315.91965014}
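
A dict keyed by year makes per-year questions easy to answer. As a sketch, the block below builds such a nested dict with a single nested comprehension and then looks up the largest entry in each year. The names ptot_demo, by_year and largest, and all values, are hypothetical.

```python
# Hypothetical per-region series, one value per year.
ptot_demo = {'East': [1.0, 2.0], 'West': [3.0, 1.5]}
start_year = 2018
n_years = 2

# Outer comprehension loops over years, inner one over regions.
by_year = {start_year + i: {c: series[i] for c, series in ptot_demo.items()}
           for i in range(n_years)}

# For each year, pick the key whose value is largest.
largest = {yr: max(d, key=d.get) for yr, d in by_year.items()}

print(by_year[2018])
print(largest)
```

Passing d.get as the key function makes max compare the regions by their values rather than by their names.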

Finally, we can make pie charts of the population at, say, four points in time.


In [118]:
import numpy as np

fig, axs = plt.subplots(2, 2, sharex=True, sharey=True)

for ax, yr in zip(axs.ravel(), [2020, 2030, 2040, 2050]):
    ax.set_title(str(yr))
    x = np.array(list(contPop[yr].values())) / 1.0e9  # population in billions
    y = list(contPop[yr].keys())                      # continent names
    # the radius grows with the square root of the total,
    # so the area of each pie scales with the total population
    r = np.sqrt(np.sum(x)) / 4
    ax.pie(x, labels=y, radius=r)
plt.show()
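
The radius in the cell above is the square root of the total population divided by 4. Because the area of a circle grows with the square of its radius, this makes the area of each pie proportional to the total it represents. The sketch below checks that relation without plotting; the totals of 4 and 16 billion are hypothetical round numbers, and pie_radius is a helper name introduced here.

```python
import math


def pie_radius(total_billions, scale=4.0):
    """Radius chosen so that the pie area is proportional to the total."""
    return math.sqrt(total_billions) / scale


# Two hypothetical world totals, in billions.
r_small = pie_radius(4.0)
r_large = pie_radius(16.0)

area_small = math.pi * r_small ** 2
area_large = math.pi * r_large ** 2

# A fourfold total doubles the radius and quadruples the area.
print(r_small, r_large, area_large / area_small)
```

Scaling the radius directly with the total instead would exaggerate growth, since the area would then increase with the square of the population.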


