Bias on Wikipedia

For this assignment (https://wiki.communitydata.cc/HCDS_(Fall_2017)/Assignments#A2:_Bias_in_data), your job is to analyze what political articles on Wikipedia, both whether they exist and how good they are, can tell us about bias in Wikipedia's content.

Making ORES requests

Below is an example of how to make requests through the ORES system in R to find out the current quality of an article. Specifically, the function takes multiple revision IDs in a single request. To avoid hitting limits in ORES, split your revision IDs into chunks of 50 or 100, pass each chunk through the function, and then stitch the results together.
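The chunk-and-stitch step described above could look something like the following sketch. The names `chunk_revision_ids` and `score_in_chunks` are hypothetical helpers, not part of httr or ORES, and `fetch` is assumed to be any function that takes a vector of revision IDs and returns a response shaped like the one printed at the end of this notebook, for example `function(ids) get_ores_data(ids, user_agent)`.

```r
# Hypothetical helper: split a vector of revision IDs into chunks
# of at most `chunk_size` elements.
chunk_revision_ids <- function(revision_ids, chunk_size = 50) {
    if (length(revision_ids) == 0) return(list())
    groups <- ceiling(seq_along(revision_ids) / chunk_size)
    # Drop the group names so they are not prefixed onto the stitched result
    unname(split(revision_ids, groups))
}

# Hypothetical helper: call `fetch` once per chunk and stitch the per-chunk
# "scores" lists into one list keyed by revision ID.
# Assumes each response stores its scores under $enwiki$scores.
score_in_chunks <- function(revision_ids, fetch, chunk_size = 50) {
    chunks <- chunk_revision_ids(revision_ids, chunk_size)
    per_chunk <- lapply(chunks, function(ids) fetch(ids)$enwiki$scores)
    do.call(c, per_chunk)
}
```

Passing the fetch function in as an argument keeps the chunking logic independent of the network call, so it can be tried out with a stand-in function before any real requests are made.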


In [2]:
# Dependencies
library(httr)
library(magrittr)

user_agent <- "https://github.com/your_github_username your@email.address"

# Define a single function that will happily take multiple revision IDs
get_ores_data <- function(revision_ids, user_agent){
    
    # Define the parameters, collapsing the revision IDs into a single string separated by | marks
    parameters <- list(models = "wp10",
                       revids = paste0(revision_ids, collapse = "|")
                       )
    url <- "https://ores.wikimedia.org/v3/scores/enwiki/"
    
    # Make the query, stop on an HTTP error, then parse the JSON body into a nested list.
    # httr::user_agent() is namespaced because the argument shares its name.
    results <- httr::GET(url, query = parameters, httr::user_agent(user_agent)) %>%
        httr::stop_for_status() %>%
        httr::content()
    
    # Return the results
    return(results)
}

# So if we grab some example revision IDs, put them in a vector, and then call get_ores_data...
example_ids  <- c(783381498, 807355596, 757539710)
str(get_ores_data(example_ids, user_agent))


List of 1
 $ enwiki:List of 2
  ..$ models:List of 1
  .. ..$ wp10:List of 1
  .. .. ..$ version: chr "0.5.0"
  ..$ scores:List of 3
  .. ..$ 757539710:List of 1
  .. .. ..$ wp10:List of 1
  .. .. .. ..$ score:List of 2
  .. .. .. .. ..$ prediction : chr "Start"
  .. .. .. .. ..$ probability:List of 6
  .. .. .. .. .. ..$ B    : num 0.0951
  .. .. .. .. .. ..$ C    : num 0.171
  .. .. .. .. .. ..$ FA   : num 0.00253
  .. .. .. .. .. ..$ GA   : num 0.00573
  .. .. .. .. .. ..$ Start: num 0.709
  .. .. .. .. .. ..$ Stub : num 0.0165
  .. ..$ 783381498:List of 1
  .. .. ..$ wp10:List of 1
  .. .. .. ..$ score:List of 2
  .. .. .. .. ..$ prediction : chr "Start"
  .. .. .. .. ..$ probability:List of 6
  .. .. .. .. .. ..$ B    : num 0.0202
  .. .. .. .. .. ..$ C    : num 0.0405
  .. .. .. .. .. ..$ FA   : num 0.00265
  .. .. .. .. .. ..$ GA   : num 0.0051
  .. .. .. .. .. ..$ Start: num 0.479
  .. .. .. .. .. ..$ Stub : num 0.452
  .. ..$ 807355596:List of 1
  .. .. ..$ wp10:List of 1
  .. .. .. ..$ score:List of 2
  .. .. .. .. ..$ prediction : chr "Start"
  .. .. .. .. ..$ probability:List of 6
  .. .. .. .. .. ..$ B    : num 0.0317
  .. .. .. .. .. ..$ C    : num 0.0515
  .. .. .. .. .. ..$ FA   : num 0.00306
  .. .. .. .. .. ..$ GA   : num 0.00597
  .. .. .. .. .. ..$ Start: num 0.799
  .. .. .. .. .. ..$ Stub : num 0.108
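
For analysis, the nested list above is usually easier to work with as a table of one row per revision. The following is a minimal sketch of that flattening, assuming the response has the `enwiki` > `scores` > revision ID > `wp10` > `score` > `prediction` shape shown in the output. `scores_to_data_frame` is a hypothetical name, and recording `NA` for a revision with no prediction is an assumption about how you want to handle revisions ORES could not score.

```r
# Hypothetical helper: flatten an ORES response (as returned by get_ores_data)
# into a data frame of revision IDs and predicted quality classes.
scores_to_data_frame <- function(results) {
    scores <- results$enwiki$scores
    if (length(scores) == 0) {
        return(data.frame(revision_id = character(0),
                          prediction = character(0),
                          stringsAsFactors = FALSE))
    }
    predictions <- vapply(scores, function(entry) {
        prediction <- entry$wp10$score$prediction
        # Assumption: an entry without a prediction is recorded as NA
        if (is.null(prediction)) NA_character_ else prediction
    }, character(1))
    data.frame(revision_id = names(scores),
               prediction = unname(predictions),
               stringsAsFactors = FALSE)
}
```

Calling `scores_to_data_frame(get_ores_data(example_ids, user_agent))` would then give one row per revision ID, which can be merged with the rest of your article data.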