Bias on Wikipedia

For this assignment (https://wiki.communitydata.cc/HCDS_(Fall_2017)/Assignments#A2:_Bias_in_data), your job is to analyze what political articles on Wikipedia, both their existence and their quality, can tell us about bias in Wikipedia's content.

Example ORES request

Below is an example of how to make a request through the ORES system in Python to get the predicted quality of one revision of the article on Aaron Halfaker (the person who created ORES):


In [1]:
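import requests
import json

# URL template: project, revision ID and model name are filled in from params
endpoint = 'https://ores.wikimedia.org/v3/scores/{project}/{revid}/{model}'
# Identify yourself to the API; replace the placeholders with your own details
headers = {'User-Agent' : 'https://github.com/your_github_username', 'From' : 'your_uw_email@uw.edu'}

params = {'project' : 'enwiki',
          'model' : 'wp10',
          'revid' : '797882120'
          }

# Pass the headers along with the request (they were defined but never sent before)
api_call = requests.get(endpoint.format(**params), headers=headers)
response = api_call.json()
print(json.dumps(response, indent=4, sort_keys=True))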


{
    "enwiki": {
        "models": {
            "wp10": {
                "version": "0.5.0"
            }
        },
        "scores": {
            "797882120": {
                "wp10": {
                    "score": {
                        "prediction": "Start",
                        "probability": {
                            "B": 0.0325056273665757,
                            "C": 0.10161634736900718,
                            "FA": 0.003680032854794337,
                            "GA": 0.021044772033944954,
                            "Start": 0.8081343649161963,
                            "Stub": 0.033018855459481376
                        }
                    }
                }
            }
        }
    }
}
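The prediction sits several levels deep in that response, so a small helper can make it easier to pull out. The sketch below is one possible approach, not part of the assignment code: `extract_prediction` is a hypothetical name, `sample_response` is copied from the output above, and the handling of a missing `score` entry assumes ORES reports a failed revision without a `score` key.

```python
def extract_prediction(response, project, revid, model):
    """Return the predicted quality class, or None if no score is present.

    Hypothetical helper: the nesting follows the response printed above.
    """
    entry = response[project]['scores'][str(revid)][model]
    # Assumption: a revision that could not be scored has no 'score' key
    if 'score' not in entry:
        return None
    return entry['score']['prediction']


# Sample copied from the response shown above
sample_response = {
    "enwiki": {
        "models": {"wp10": {"version": "0.5.0"}},
        "scores": {
            "797882120": {
                "wp10": {
                    "score": {
                        "prediction": "Start",
                        "probability": {
                            "B": 0.0325056273665757,
                            "C": 0.10161634736900718,
                            "FA": 0.003680032854794337,
                            "GA": 0.021044772033944954,
                            "Start": 0.8081343649161963,
                            "Stub": 0.033018855459481376
                        }
                    }
                }
            }
        }
    }
}

prediction = extract_prediction(sample_response, 'enwiki', '797882120', 'wp10')
print(prediction)  # Start
```

Returning None instead of raising keeps a loop over many revisions running when a single one has no score; whether that is the right choice depends on how you want to report missing articles.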

Importing the other data is just a matter of reading in the CSV files. (For the R programmers: we'll have an R example up as soon as the Hub supports the language.)


In [3]:
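## getting the data from the CSV files
import csv

data = []
# newline='' is what the csv module expects; utf-8 keeps non-ASCII article names intact
with open('page_data.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # keep the first three columns of every row (a header row, if present, is kept too)
        data.append([row[0],row[1],row[2]])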

In [5]:
print(data[782])


['Albania', 'Aćif Hadžiahmetović', '742544909']
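Each row carries a revision ID in its third column, which is what the ORES request needs. Requesting one revision at a time works, but can be slow for a large file. As a rough sketch, the IDs can be grouped into batches first. Everything named here is an assumption for illustration: `chunked` and `build_batch_request` are hypothetical helpers, the batch size of 50 is arbitrary, the second row is made up, and the `models`/`revids` query parameters should be checked against the ORES documentation before relying on them.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_request(revids, project='enwiki', model='wp10'):
    """Return (url, params) for one batch; assumes pipe-separated revids."""
    url = 'https://ores.wikimedia.org/v3/scores/{project}/'.format(project=project)
    params = {'models': model, 'revids': '|'.join(revids)}
    return url, params


# First row is taken from the output above; the second is a made-up placeholder
rows = [
    ['Albania', 'Aćif Hadžiahmetović', '742544909'],
    ['Example country', 'Example article', '797882120'],
]

# Assumes the revision ID is in the third column, as in the printed row
revids = [row[2] for row in rows]
batches = list(chunked(revids, 50))
requests_to_make = [build_batch_request(batch) for batch in batches]
print(requests_to_make[0])
```

Each `(url, params)` pair could then be passed to `requests.get(url, params=params, headers=headers)`, reusing the `headers` dictionary from the earlier example.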
