For this assignment (https://wiki.communitydata.cc/HCDS_(Fall_2017)/Assignments#A2:_Bias_in_data), your job is to analyze what the nature of articles about politicians on Wikipedia - both their existence and their quality - can tell us about bias in Wikipedia's content.
In [1]:
import csv
data = []
revid = []
with open('page_data.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append([row[0], row[1], row[2]])
        revid.append(row[2])
# Remove the first element ('rev_id') from revid so that the list only contains revision IDs.
revid.pop(0)
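For reference, page_data.csv has three columns: page (the article name), country, and rev_id (the article's last revision ID), which is why the code above keeps row[0], row[1], and row[2]. The rows below are illustrative only, not actual entries from the file:

# page,country,rev_id                       <- header row, dropped from revid by revid.pop(0)
# "Some Politician",Somecountry,123456789   <- illustrative data row (made-up values)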
Getting the data (country and population) from the population file
In [2]:
from itertools import islice
import csv
import pandas as pd

population = []
with open('Population Mid-2015.csv') as population_file:
    reader = csv.reader(population_file)
    # The first row is a title and the second row is blank; the last two rows are blank as well.
    # Skip those rows and keep only the data rows (country name in column 0, population in column 4).
    for row in islice(reader, 2, 213):
        population.append([row[0], row[4]])
In this step, we'll get article quality predictions from the ORES API. To avoid hitting ORES request limits, we split the revision IDs into chunks of 50. For each article, ORES predicts one of six quality classes: FA (Featured Article), GA (Good Article), B, C, Start, and Stub.
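For reference, the parsing code below assumes each entry in the ORES response has roughly the following shape (the revision ID and probability values here are made up for illustration):

{"enwiki": {"scores": {"123456789": {"wp10": {"score": {
    "prediction": "Stub",
    "probability": {"FA": 0.01, "GA": 0.02, "B": 0.05, "C": 0.10, "Start": 0.22, "Stub": 0.60}
}}}}}
# If a revision has been deleted, the "wp10" entry contains an "error" object instead of a "score".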
Split revision IDs into chunks of 50
In [3]:
chunks = [revid[x:x+50] for x in range(0, len(revid), 50)]
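As a quick illustration of the chunking pattern (using a toy list instead of real revision IDs):

sample = list(range(7))
[sample[x:x+3] for x in range(0, len(sample), 3)]   # -> [[0, 1, 2], [3, 4, 5], [6]]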
Write a function to make a request with multiple revision IDs
In [4]:
import requests
import json
def get_ores_data(revision_ids, headers):
    # Define the endpoint
    endpoint = 'https://ores.wikimedia.org/v3/scores/{project}/?models={model}&revids={revids}'
    # Specify the parameters, joining all the revision IDs into one string separated by '|' marks
    params = {'project' : 'enwiki',
              'model' : 'wp10',
              'revids' : '|'.join(str(x) for x in revision_ids)
              }
    # Pass the request headers so ORES can identify the requester
    api_call = requests.get(endpoint.format(**params), headers=headers)
    response = api_call.json()
    return response
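A minimal usage sketch (the revision IDs below are placeholders, not IDs from the dataset):

sample_headers = {'User-Agent' : 'https://github.com/yawen32', 'From' : 'liy44@uw.edu'}
sample_response = get_ores_data(['123456789', '987654321'], sample_headers)
# sample_response['enwiki']['scores'] maps each requested revision ID to its wp10 result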
Request the predicted quality class of each article from the ORES API.
In [5]:
headers = {'User-Agent' : 'https://github.com/yawen32', 'From' : 'liy44@uw.edu'}
article_quality = []
for i in range(len(chunks)):
    response = get_ores_data(chunks[i], headers)
    aq = response['enwiki']['scores']
    for j in range(len(chunks[i])):
        for key in aq[chunks[i][j]]["wp10"]:
            # Flag articles whose revisions have been deleted (ORES returns an "error" instead of a score)
            if key == "error":
                article_quality.append("None")
            else:
                article_quality.append(aq[chunks[i][j]]['wp10']['score']['prediction'])
Save prediction values to a file
In [6]:
aq = open("article_quality.txt", "w")
for item in article_quality:
    aq.write("{}\n".format(item))
aq.close()
In [7]:
with open("article_quality.csv","w",newline="") as f:
aqcsv = csv.writer(f)
aqcsv.writerow(article_quality)
Read prediction values from the saved file
In [8]:
with open('article_quality.txt', 'r') as f:
    articleQuality = f.read().splitlines()
In this step, we'll combine the article quality data, the article data, and the population data. Rows without matching data are removed during the merge. The merged data is then written to a single CSV file with five columns: country, article_name, revision_id, article_quality, population.
First, add the ORES data to the Wikipedia data, then merge the Wikipedia data and the population data on the common key (country).
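As a small illustration of the inner-merge behavior used below (made-up values, not from the datasets), only countries present in both frames survive the merge:

left = pd.DataFrame({'country': ['A', 'B'], 'article_name': ['x', 'y']})
right = pd.DataFrame({'Location': ['A', 'C'], 'population': ['1,000', '2,000']})
pd.merge(left, right, left_on='country', right_on='Location', how='inner')
# -> one row (country 'A'); 'B' and 'C' have no match and are dropped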
In [9]:
wiki_data = pd.DataFrame(data[1:],columns=data[0])
In [10]:
# Sanity check: the number of ORES predictions should match the number of article rows
len(pd.Series(articleQuality).values)
Out[10]:
In [11]:
# Add the ORES data into the Wikipedia data
wiki_data["article_quality"] = pd.Series(articleQuality).values
In [12]:
# Rename columns of the Wikipedia data
wiki_data.columns = ["article_name","country","revision_id","article_quality"]
In [13]:
# Convert data (country and population) from the population file to dataframe
population_data = pd.DataFrame(population[1:],columns=population[0])
In [14]:
# Rename the columns with suitable names
population_data.columns = ["Location","population"]
In [15]:
# Merge the two datasets (wiki_data and population_data) on the common key (country name).
# An inner merge automatically removes rows that do not have matching data.
merge_data = pd.merge(wiki_data, population_data, left_on = 'country', right_on = 'Location', how = 'inner')
merge_data = merge_data.drop('Location', axis=1)
# Reorder the columns so that the dataframe follows the required format
merge_data = merge_data[["country","article_name","revision_id","article_quality","population"]]
Write merged data to a CSV file
In [16]:
# index=False keeps the output to the five required columns (no index column)
merge_data.to_csv("final_data.csv", index=False)
Calculate the proportion (as a percentage) of articles-per-population
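As a quick check of the arithmetic with made-up numbers (not values from the dataset): a country with 100 politician articles and a population of 10,000,000 has 100 / 10,000,000 = 0.001% articles-per-population.

"{:.10%}".format(100 / 10000000)   # -> '0.0010000000%'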
In [26]:
# Extract the "country" column from the merged data
merge_country = merge_data.iloc[:,0].tolist()
In [27]:
# Count the number of articles for each country
from collections import Counter
count_article = Counter(merge_country)
In [28]:
prop_article_per_population = []
df_prop_article_per_population = pd.DataFrame(columns=['country', 'population', 'num_articles', 'prop_article_per_population'])
num_country = 0
for country in count_article:
    # Look up the country's population (stored as a string with thousands separators, e.g. "1,234,567")
    population = int(population_data.loc[population_data["Location"] == country, "population"].iloc[0].replace(",", ""))
    percentage = count_article[country] / population
    prop_article_per_population.append("{:.10%}".format(percentage))
    df_prop_article_per_population.loc[num_country] = [country, population, count_article[country], "{:.10%}".format(percentage)]
    num_country += 1
In [29]:
# Show the table of the proportion of articles-per-population for each country
df_prop_article_per_population
Out[29]:
Calculate the proportion (as a percentage) of high-quality articles for each country.
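A worked example with made-up counts (not from the dataset): a country with 2 FA and 3 GA articles out of 200 politician articles has (2 + 3) / 200 = 2.5% high-quality articles.

"{:.10%}".format((2 + 3) / 200)   # -> '2.5000000000%'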
In [30]:
prop_high_quality_articles_each_country = []
df_prop_high_quality_articles_each_country = pd.DataFrame(columns=["country", "num_high_quality_articles", "num_articles", "prop_high_quality_articles"])
num_country = 0
for country in count_article:
    # Count the FA- and GA-class articles for this country
    quality_counts = Counter(merge_data.loc[merge_data['country'] == country].iloc[:, 3].tolist())
    num_high_quality = quality_counts['FA'] + quality_counts['GA']
    percentage = num_high_quality / count_article[country]
    prop_high_quality_articles_each_country.append("{:.10%}".format(percentage))
    df_prop_high_quality_articles_each_country.loc[num_country] = [country, num_high_quality, count_article[country], "{:.10%}".format(percentage)]
    num_country += 1
In [31]:
# Show the table of the proportion of high-quality articles for each country
df_prop_high_quality_articles_each_country
Out[31]:
Produce four tables that show:
10 highest-ranked countries in terms of number of politician articles as a proportion of country population
In [32]:
# Get index of 10 highest-ranked countries
idx = df_prop_article_per_population["prop_article_per_population"].apply(lambda x:float(x.strip('%'))/100).sort_values(ascending=False).index[0:10]
# Retrieve these rows by index values
highest_rank_10_prop_article_per_population = df_prop_article_per_population.loc[idx]
highest_rank_10_prop_article_per_population.to_csv("highest_rank_10_prop_article_per_population.csv")
highest_rank_10_prop_article_per_population
Out[32]:
10 lowest-ranked countries in terms of number of politician articles as a proportion of country population
In [33]:
# Get index of 10 lowest-ranked countries
idx = df_prop_article_per_population["prop_article_per_population"].apply(lambda x:float(x.strip('%'))/100).sort_values(ascending=True).index[0:10]
# Retrieve these rows by index values
lowest_rank_10_prop_article_per_population = df_prop_article_per_population.loc[idx]
lowest_rank_10_prop_article_per_population.to_csv("lowest_rank_10_prop_article_per_population.csv")
lowest_rank_10_prop_article_per_population
Out[33]:
10 highest-ranked countries in terms of number of GA and FA-quality articles as a proportion of all articles about politicians from that country
In [34]:
# Get index of 10 highest-ranked countries
idx = df_prop_high_quality_articles_each_country["prop_high_quality_articles"].apply(lambda x:float(x.strip('%'))/100).sort_values(ascending=False).index[0:10]
# Retrieve these rows by index values
highest_rank_10_prop_high_quality_articles = df_prop_high_quality_articles_each_country.loc[idx]
highest_rank_10_prop_high_quality_articles.to_csv("highest_rank_10_prop_high_quality_articles.csv")
highest_rank_10_prop_high_quality_articles
Out[34]:
10 lowest-ranked countries in terms of number of GA and FA-quality articles as a proportion of all articles about politicians from that country
In [35]:
# Get index of 10 lowest-ranked countries
idx = df_prop_high_quality_articles_each_country["prop_high_quality_articles"].apply(lambda x:float(x.strip('%'))/100).sort_values(ascending=True).index[0:10]
# Retrieve these rows by index values
lowest_rank_10_prop_high_quality_articles = df_prop_high_quality_articles_each_country.loc[idx]
lowest_rank_10_prop_high_quality_articles.to_csv("lowest_rank_10_prop_high_quality_articles_allzeros.csv")
lowest_rank_10_prop_high_quality_articles
Out[35]:
In [70]:
# Get the 10 lowest-ranked countries whose proportion of high-quality articles is NOT zero
idx = df_prop_high_quality_articles_each_country["prop_high_quality_articles"].apply(lambda x:float(x.strip('%'))/100).sort_values(ascending=True)!=0
idx_not_zero = idx[idx == True].index[0:10]
lowest_rank_10_prop_high_quality_articles_not_zero = df_prop_high_quality_articles_each_country.loc[idx_not_zero]
lowest_rank_10_prop_high_quality_articles_not_zero.to_csv("lowest_rank_10_prop_high_quality_articles_notzeros.csv")
lowest_rank_10_prop_high_quality_articles_not_zero
Out[70]: