Bias on Wikipedia

For this assignment (https://wiki.communitydata.cc/HCDS_(Fall_2017)/Assignments#A2:_Bias_in_data), your job is to analyze what political articles on Wikipedia (both their existence and their quality) can tell us about bias in Wikipedia's content.

Making ORES requests

Below is an example of how to make requests to the ORES system in Python to find out the predicted quality class of an article. Specifically, this is a function designed to make a request with multiple revision IDs. You can take this function, split your revision IDs up into chunks of 50 or 100 to avoid hitting limits in ORES, pass each chunk through this function, and then stitch the whole set together; a sketch of that chunking approach appears after the example output below.


In [4]:
import requests
import json

headers = {'User-Agent' : 'https://github.com/your_github_username', 'From' : 'your_uw_email@uw.edu'}

def get_ores_data(revision_ids, headers):
    
    # Define the endpoint
    endpoint = 'https://ores.wikimedia.org/v3/scores/{project}/?models={model}&revids={revids}'
    
    # Specify the parameters - smushing all the revision IDs together separated by | marks.
    # Yes, 'smush' is a technical term, trust me I'm a scientist.
    # What do you mean "but people trusting scientists regularly goes horribly wrong" who taught you tha- oh.  
    params = {'project' : 'enwiki',
              'model'   : 'wp10',
              'revids'  : '|'.join(str(x) for x in revision_ids)
              }
    # Pass the headers along so ORES knows who is making the request,
    # and return the parsed response so callers can stitch chunks together.
    api_call = requests.get(endpoint.format(**params), headers=headers)
    response = api_call.json()
    print(json.dumps(response, indent=4, sort_keys=True))
    return response


# So if we grab some example revision IDs and turn them into a list and then call get_ores_data...
example_ids = [783381498, 807355596, 757539710]
get_ores_data(example_ids, headers)


{
    "enwiki": {
        "models": {
            "wp10": {
                "version": "0.5.0"
            }
        },
        "scores": {
            "757539710": {
                "wp10": {
                    "score": {
                        "prediction": "Start",
                        "probability": {
                            "B": 0.0950995993086368,
                            "C": 0.1709859524092081,
                            "FA": 0.002534267983331672,
                            "GA": 0.005731369423122624,
                            "Start": 0.7091352495053856,
                            "Stub": 0.01651356137031511
                        }
                    }
                }
            },
            "783381498": {
                "wp10": {
                    "score": {
                        "prediction": "Start",
                        "probability": {
                            "B": 0.020202281665235494,
                            "C": 0.040498863202895134,
                            "FA": 0.002648428776337411,
                            "GA": 0.005101906528059532,
                            "Start": 0.4793812253273645,
                            "Stub": 0.452167294500108
                        }
                    }
                }
            },
            "807355596": {
                "wp10": {
                    "score": {
                        "prediction": "Start",
                        "probability": {
                            "B": 0.03174101428217136,
                            "C": 0.05151754557912956,
                            "FA": 0.0030585651086261195,
                            "GA": 0.0059733440502096275,
                            "Start": 0.7992220699630344,
                            "Stub": 0.10848746101682906
                        }
                    }
                }
            }
        }
    }
}
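
A rough sketch of the chunking approach described above (the helper name, chunk size, and merging logic here are illustrative assumptions, not part of the assignment starter code): split your full list of revision IDs into batches, call get_ores_data on each batch, and merge the per-revision entries from each response into one dictionary.

In [ ]:
def get_ores_data_in_chunks(revision_ids, headers, chunk_size=50):
    # Collect the per-revision score entries from every batch here.
    all_scores = {}
    for i in range(0, len(revision_ids), chunk_size):
        chunk = revision_ids[i:i + chunk_size]
        # get_ores_data (defined above) returns the parsed JSON response.
        response = get_ores_data(chunk, headers)
        # Each response nests its results under ['enwiki']['scores'], as in
        # the example output above; merging those dictionaries stitches the
        # chunks back together.
        all_scores.update(response['enwiki']['scores'])
    return all_scores

You will probably want to drop the print() inside get_ores_data before scoring tens of thousands of revisions, or the notebook output will get very long.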

Importing the other data is just a matter of reading in CSV files! And if you're an R programmer wondering where the R example is, check the other file in this example.


In [3]:
## getting the data from the CSV files
import csv

data = []
with open('page_data.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Keep the first three columns of each row. Note that if the file has
        # a header row, it will end up as data[0].
        data.append([row[0], row[1], row[2]])

In [5]:
print(data[782])


['Albania', 'Aćif Hadžiahmetović', '742544909']
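
Putting the two pieces together: here is a hedged sketch (the variable names are mine, and I am assuming the first row of page_data.csv is the column header and the third column holds the revision ID, as the printed row above suggests) of pulling the revision IDs out of data and scoring them with the chunked helper sketched earlier.

In [ ]:
# Skip the (assumed) header row, take the third column as the revision ID,
# and score everything in batches using the chunked helper sketched above.
rev_ids = [row[2] for row in data[1:]]
ores_scores = get_ores_data_in_chunks(rev_ids, headers)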
