In [1]:
!pip install git+https://github.com/ipython/ipynb.git
!pip install pymysql
In this notebook, we'll use a database table to aggregate monthly article quality scores. We'll use an SQL query to do the aggregation, then write the aggregated data out to a file that can be imported into another script for analysis.
One important note concerns how we select the articles within Wikipedia that belong to a specific WikiProject. To do this, we'll use a WikiProject template -- a bit of structured wikitext that WikiProjects use to tag articles and add metadata to them. This worklog describes some minor complications with using the templatelinks table to gather this list of articles: https://meta.wikimedia.org/wiki/Research_talk:Quality_dynamics_of_English_Wikipedia/Work_log/2017-02-17
We'll apply the methodology described there to find the "main" template, and the wikiproject_aggregation query (defined in db_monthly_stats.ipynb) will also include all templates that redirect to it.
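The actual query lives in db_monthly_stats.ipynb, but the redirect-following step can be sketched roughly as follows. This is illustrative only: the table and column names come from the standard MediaWiki schema, and the query text is an assumption, not the notebook's real query.

```python
def template_and_redirects_query():
    """Sketch of a query selecting a "main" WikiProject template plus all
    templates that redirect to it (MediaWiki `page` and `redirect` tables;
    namespace 10 is the Template namespace). Illustrative, not the query
    actually used in db_monthly_stats.ipynb."""
    return """
        SELECT page_title
        FROM page
        WHERE page_namespace = 10
          AND page_title = %(template)s
        UNION
        SELECT p.page_title
        FROM redirect r
        JOIN page p ON p.page_id = r.rd_from
        WHERE p.page_namespace = 10
          AND r.rd_namespace = 10
          AND r.rd_title = %(template)s
    """
```

The parameter placeholder style (`%(template)s`) matches what pymysql expects for dict-style query parameters.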
In [2]:
from ipynb.fs.full.article_quality.db_monthly_stats import DBMonthlyStats, dump_aggregation
In [3]:
import configparser
config = configparser.ConfigParser()
config.read('../settings.cfg')
Out[3]:
In [4]:
import os

def write_once(path, write_to):
    """Call write_to(f) on a freshly opened file at `path`,
    but only if the file doesn't already exist."""
    if not os.path.exists(path):
        print("Writing out " + path)
        with open(path, "w") as f:
            write_to(f)
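Because write_once checks for the file first, re-running the notebook won't clobber previously dumped aggregations. A quick demonstration (the function is repeated here so the snippet is self-contained; the temp path is just for illustration):

```python
import os
import tempfile

def write_once(path, write_to):
    # Same helper as above: only writes if `path` doesn't exist yet.
    if not os.path.exists(path):
        print("Writing out " + path)
        with open(path, "w") as f:
            write_to(f)

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.tsv")

write_once(path, lambda f: f.write("col_a\tcol_b\n"))   # writes the file
write_once(path, lambda f: f.write("never written\n"))  # no-op: file exists
```

To force a regeneration, delete the output file and re-run the cell.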
In [7]:
dbms = DBMonthlyStats(config)

write_once(
    "../data/processed/enwiki.full_wiki_aggregation.tsv",
    lambda f: dump_aggregation(dbms.all_wiki_aggregation(), f))
write_once(
    "../data/processed/enwiki.wikiproject_women_scientists_aggregation.tsv",
    lambda f: dump_aggregation(dbms.wikiproject_aggregation("WikiProject_Women_scientists"), f))
write_once(
    "../data/processed/enwiki.wikiproject_oregon_aggregation.tsv",
    lambda f: dump_aggregation(dbms.wikiproject_aggregation("WikiProject_Oregon"), f))
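dump_aggregation itself is imported from db_monthly_stats.ipynb. Conceptually, it serializes the aggregated rows as tab-separated values. A minimal sketch of what such a function might do, assuming the rows are dicts with month, prediction, and count fields (the column names here are assumptions, not the real schema):

```python
import csv
import io

def dump_aggregation_sketch(rows, f):
    """Sketch of a TSV dumper: header row, then one line per aggregate.
    Column names (month, wp10_prediction, n) are assumed for illustration;
    the real dump_aggregation is defined in db_monthly_stats.ipynb."""
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["month", "wp10_prediction", "n"])
    for row in rows:
        writer.writerow([row["month"], row["wp10_prediction"], row["n"]])

buf = io.StringIO()
dump_aggregation_sketch(
    [{"month": "201701", "wp10_prediction": "B", "n": 1234}], buf)
```

Writing plain TSV keeps the output trivially loadable with pandas.read_csv(..., sep="\t") in the downstream analysis script.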