Some goals for this exercise:
Thinking about the populations of various geographic entities is a good place to start with open data. We can work with numbers without necessarily involving complicated mathematics: just addition, if we're lucky. We can also think about geographical locations, and we can build out from our initial explorations in a systematic manner.
Off the top of your head:
* What do you think is the current world population?
* How many countries are there?
* How many people are there in the USA? Canada? Mexico? Your favorite country?
* What is the minimum number of countries to add up to 50% of the world's population? How about 90%?
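Now go answer these questions by looking on the web. Find a source or two or three.
Two open sources we'll consider:
* Wikipedia: List of countries by population (United Nations)
* The World Factbook: Country Comparison Population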
We will study how to parse these data sources in a later exercise, but for this exercise, the data sets have already been parsed into JSON format, which is easily loadable in many languages, including Python via the json module in the standard library. We'll also use requests.
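As a small illustration of loading JSON with the standard library, here is a sketch that parses an inline string. The two rows are made up for illustration, and the [rank, name, population] layout is an assumption inferred from how the cells below index each row.

```python
import json

# Made-up sample rows in the assumed [rank, name, population] layout.
# JSON null becomes Python None, which is how a missing rank would appear.
sample_text = '[[1, "Examplestan", 1000], [null, "Unranked Territory", 50]]'

rows = json.loads(sample_text)
for rank, name, population in rows:
    print(rank, name, population)
```

Calling .json() on a requests response does the same parsing on the body of the HTTP response.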
Let's look first at the Wikipedia source.
In [2]:
# https://gist.github.com/rdhyee/8511607/raw/f16257434352916574473e63612fcea55a0c1b1c/population_of_countries.json
# scraping of https://en.wikipedia.org/w/index.php?title=List_of_countries_by_population_(United_Nations)&oldid=590438477
# read population in; the cells below index each row as [rank, name, population]
import json
import requests
pop_json_url = "https://gist.github.com/rdhyee/8511607/raw/f16257434352916574473e63612fcea55a0c1b1c/population_of_countries.json"
pop_list = requests.get(pop_json_url).json()
pop_list
Out[2]:
Show how to calculate the total population according to the Wikipedia list. (Answer: 7,162,119,434)
In [4]:
total_pop = 0
for i in pop_list:
    total_pop += i[2]
total_pop
Out[4]:
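The same running total can be written more compactly with the built-in sum. This is only a sketch on made-up rows, assuming the population sits at index 2 of each row as in the loop above.

```python
# Made-up rows in the assumed [rank, name, population] layout.
sample_rows = [[1, "A", 600], [2, "B", 300], [None, "C", 100]]

# sum() over a generator expression replaces the explicit loop.
sample_total = sum(row[2] for row in sample_rows)
print(sample_total)
```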
Calculate the total population of the 196 entities that have a numeric rank. (Answer: 7,145,999,288) By the way, are those entities actually countries?
In [10]:
ranked_pop = 0
for i in pop_list:
    if i[0]:
        ranked_pop += i[2]
ranked_pop
Out[10]:
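To see which entries are being left out of the ranked total, it can help to split the list in two. The helper below is hypothetical and assumes, as the `if i[0]` test above implies, that an unranked row carries a falsy rank (for example None). The sample rows are made up.

```python
def split_by_rank(rows):
    """Return (ranked, unranked) lists; a row is ranked if row[0] is truthy."""
    ranked = [row for row in rows if row[0]]
    unranked = [row for row in rows if not row[0]]
    return ranked, unranked

# Made-up rows in the assumed [rank, name, population] layout.
sample_rows = [[1, "A", 600], [2, "B", 300], [None, "C", 100]]
ranked, unranked = split_by_rank(sample_rows)

print(len(ranked), "ranked entries")
print([row[1] for row in unranked])  # names of the entries without a rank
```

Running the same helper on pop_list and reading the unranked names is one way to start on the question of whether every ranked entity is really a country.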
Calculate the total population according to The World Factbook: Country Comparison Population (See https://gist.github.com/rdhyee/8530164).
In [13]:
pop_json_url = "https://gist.github.com/rdhyee/8530164/raw/f8e842fe8ccd6e3bc424e3a24e41ef5c38f419e8/world_factbook_poulation.json"
fb_list = requests.get(pop_json_url).json()
fb_total = 0
for i in fb_list:
    fb_total += i[2]
fb_total
Out[13]:
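With a population list in hand, the earlier question about the minimum number of countries needed to reach 50% or 90% of the total can be approached by taking the largest populations first. The function name below is hypothetical, the sample rows are made up, and the sketch counts every row in the list, whether or not it is a country.

```python
def min_entries_for_share(rows, share):
    """Smallest number of rows whose populations reach `share` of the total."""
    # Largest populations first: this ordering reaches the target soonest.
    populations = sorted((row[2] for row in rows), reverse=True)
    target = share * sum(populations)
    running = 0
    for count, pop in enumerate(populations, start=1):
        running += pop
        if running >= target:
            return count
    return len(populations)

# Made-up rows in the assumed [rank, name, population] layout.
sample_rows = [[1, "A", 500], [2, "B", 300], [3, "C", 150], [4, "D", 50]]
half = min_entries_for_share(sample_rows, 0.5)
ninety = min_entries_for_share(sample_rows, 0.9)
print(half, ninety)
```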
Now for something more interesting. I'd like us to get a feel for what it would be like to pick a person completely at random from the world's population. Say you were picking 5 people -- where might these people be from? Of course, you won't be surprised to pick someone from China or India, since those countries are so populous. But how likely is it that someone from the USA shows up?
To answer this question, start thinking about writing a Python generator that returns the name of a country, where the probability of returning a given country equals that country's share of the world's population.
Work with your neighbors -- we'll come back to this problem in detail in class on Thursday.
In [31]:
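import random
def country_generator(inlist, max_pop):
    # Despite its name, this is a plain function: each call returns one country name.
    #print max_pop
    while 1:
        counted_pop = 0
        # pick a random person, numbered 1..max_pop
        num = random.randint(1, max_pop)
        #print num
        # walk the list until the running total reaches that person
        for i in inlist:
            counted_pop += i[2]
            if counted_pop >= num:
                return i[1]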
Out[31]:
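The cell above returns one name per call. A generator in the Python sense would use yield instead, so one object can keep producing names. The following is a minimal sketch of that idea, not the solution discussed in class: country_stream and its rng parameter are hypothetical names, the rows are made up, and it assumes the populations are positive integers.

```python
import random
from itertools import islice

def country_stream(rows, rng=random):
    """Yield country names forever, each weighted by its population."""
    total = sum(row[2] for row in rows)
    while True:
        # Pick one "person" numbered 1..total, then find their country.
        pick = rng.randint(1, total)
        running = 0
        for row in rows:
            running += row[2]
            if running >= pick:
                yield row[1]
                break

# Made-up rows in the assumed [rank, name, population] layout.
sample_rows = [[1, "A", 3], [2, "B", 1]]
five_draws = list(islice(country_stream(sample_rows), 5))
print(five_draws)
```

Because the generator never ends, islice is used to take a fixed number of draws from it.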
In [36]:
rand_countries = []
# range(1, 100) stops at 99, so this makes 99 draws
for i in range(1,100):
    rand_countries.append(country_generator(pop_list, total_pop))
rand_countries
# tally how many times each country was drawn
c_dict = {}
for c in rand_countries:
    if c in c_dict:
        c_dict[c] += 1
    else:
        c_dict[c] = 1
c_dict
Out[36]:
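The tally built by hand above can also be produced with collections.Counter from the standard library. The draws below are made up for illustration and are not output from the cell above.

```python
from collections import Counter

# Made-up draws standing in for rand_countries.
draws = ["China", "India", "China", "India", "China", "USA"]

counts = Counter(draws)
top_two = counts.most_common(2)  # the two most frequent names with their counts
print(top_two)
```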