In [1]:
!pip install git+https://github.com/ipython/ipynb.git
!pip install pymysql


Collecting git+https://github.com/ipython/ipynb.git
  Cloning https://github.com/ipython/ipynb.git to /tmp/pip-cai0mvkd-build
  Requirement already satisfied (use --upgrade to upgrade): ipynb==0.5 from git+https://github.com/ipython/ipynb.git in /srv/paws/lib/python3.4/site-packages
Requirement already satisfied: pymysql in /srv/paws/lib/python3.4/site-packages

Database-based monthly stats

In this notebook, we'll use a database table to aggregate monthly article quality scores. An SQL query does the aggregation, and the aggregated data is written out to a file that another script can then import for analysis.

One important note concerns how we select the articles within Wikipedia that belong to a specific WikiProject. To do this, we'll use a WikiProject template -- a bit of structured wikitext that WikiProjects use to tag and add metadata to articles. This work log describes some minor complications with using the templatelinks table to gather the list of articles: https://meta.wikimedia.org/wiki/Research_talk:Quality_dynamics_of_English_Wikipedia/Work_log/2017-02-17

In this notebook, we'll use the methodology described there to find the "main" template, and the wikiproject_aggregation query (defined in db_monthly_stats.ipynb) to also include all templates that redirect to it.
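To make the "main template plus redirecting templates" idea concrete, here is a rough sketch of how such a lookup could be phrased. It is not the wikiproject_aggregation query from db_monthly_stats.ipynb, which is not shown here. The function name `tagged_pages_query` is hypothetical, and the SQL assumes the 2017-era MediaWiki schema in which templatelinks carries `tl_namespace` and `tl_title` columns.

```python
# Hypothetical sketch; NOT the query defined in db_monthly_stats.ipynb.
# Assumes the 2017-era MediaWiki schema (templatelinks.tl_title,
# redirect.rd_title, page.page_title).
TEMPLATE_NS = 10  # the "Template:" namespace

TAGGED_PAGES_SQL = """
SELECT DISTINCT tl_from AS tagged_page_id
FROM templatelinks
WHERE tl_namespace = %(template_ns)s
  AND (
    tl_title = %(template)s
    OR tl_title IN (
      -- titles of templates that redirect to the main template
      SELECT page_title
      FROM redirect
      INNER JOIN page ON page_id = rd_from
      WHERE rd_namespace = %(template_ns)s
        AND rd_title = %(template)s
        AND page_namespace = %(template_ns)s
    )
  )
"""


def tagged_pages_query(template):
    """Return (sql, params) selecting pages that transclude `template`
    directly or through a redirecting template."""
    params = {"template": template, "template_ns": TEMPLATE_NS}
    return TAGGED_PAGES_SQL, params


sql, params = tagged_pages_query("WikiProject_Oregon")
```

The pair can be passed to a pymysql cursor as `cursor.execute(sql, params)`. WikiProject templates are usually placed on talk pages, so the resulting page IDs would still need to be mapped to their articles.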


In [2]:
from ipynb.fs.full.article_quality.db_monthly_stats import DBMonthlyStats, dump_aggregation

Read the configuration


In [3]:
import configparser
config = configparser.ConfigParser()
config.read('../settings.cfg')


Out[3]:
['../settings.cfg']
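The output above matters because `ConfigParser.read` returns the list of files it parsed and silently skips files that are missing. A small sketch of a stricter loader follows; the helper name `read_config` is hypothetical and not part of this project.

```python
import configparser


def read_config(path):
    """Read a config file, failing loudly if it cannot be read.

    Hypothetical helper: ConfigParser.read() skips missing files and
    returns only the names it actually parsed, so an empty result
    means nothing was loaded.
    """
    config = configparser.ConfigParser()
    files_read = config.read(path)
    if not files_read:
        raise FileNotFoundError("Could not read configuration file: " + path)
    return config
```

With this helper, a mistyped path fails at load time instead of surfacing later as a missing section or key.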

Utility to make sure we only generate files once


In [4]:
import os
def write_once(path, write_to):
    if not os.path.exists(path):
        print("Writing out " + path)
        with open(path, "w") as f:
            write_to(f)
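One caveat: write_once opens the target file before calling write_to, so if write_to raises, an empty or partial file is left behind and later runs will skip it. A possible variant that avoids this is sketched below. The name `write_once_atomic` is hypothetical; it writes to a temporary file in the same directory and renames it into place only on success.

```python
import os
import tempfile


def write_once_atomic(path, write_to):
    """Generate `path` by calling write_to(f), unless it already exists.

    Hypothetical variant of write_once: the data goes to a temporary
    file first, so a failure in write_to never leaves `path` behind.
    Returns True if the file was written, False if it already existed.
    """
    if os.path.exists(path):
        return False
    print("Writing out " + path)
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            write_to(f)
        # Rename only after the write finished without errors.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return True
```

It takes the same arguments as write_once, so the calls in the next cell would work unchanged apart from the function name.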

Dump the monthly aggregations


In [7]:
dbms = DBMonthlyStats(config)

write_once(
    "../data/processed/enwiki.full_wiki_aggregation.tsv", 
    lambda f: dump_aggregation(dbms.all_wiki_aggregation(), f))

write_once(
    "../data/processed/enwiki.wikiproject_women_scientists_aggregation.tsv", 
    lambda f: dump_aggregation(dbms.wikiproject_aggregation("WikiProject_Women_scientists"), f))

write_once(
    "../data/processed/enwiki.wikiproject_oregon_aggregation.tsv", 
    lambda f: dump_aggregation(dbms.wikiproject_aggregation("WikiProject_Oregon"), f))


Writing out ../data/processed/enwiki.full_wiki_aggregation.tsv
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-b72d92e1c3ff> in <module>()
      3 write_once(
      4     "../data/processed/enwiki.full_wiki_aggregation.tsv",
----> 5     lambda f: dump_aggregation(dbms.all_wiki_aggregation(), f))
      6 
      7 write_once(

<ipython-input-4-18dcc2a43a2e> in write_once(path, write_to)
      4         print("Writing out " + path)
      5         with open(path, "w") as f:
----> 6             write_to(f)
      7 

<ipython-input-7-b72d92e1c3ff> in <lambda>(f)
      3 write_once(
      4     "../data/processed/enwiki.full_wiki_aggregation.tsv",
----> 5     lambda f: dump_aggregation(dbms.all_wiki_aggregation(), f))
      6 
      7 write_once(

/home/paws/ocdx/aq-new/src/article_quality/db_monthly_stats.ipynb in dump_aggregation(cursor, file)
    110    "outputs": [],
    111    "source": [
--> 112     "class DBMonthlyStats:\n",
    113     "    \n",
    114     "    def __init__(self, config):\n",

TypeError: 'fieldnames' is an invalid keyword argument for this function
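The last traceback frame shows raw notebook JSON rather than the failing line, so the actual source of dump_aggregation is not visible here. One plausible cause, offered as an assumption only, is that `fieldnames` is being passed to `csv.writer`, which does not accept it; `csv.DictWriter` does. The sketch below reproduces that situation with made-up rows and column names.

```python
import csv
import io

# Made-up data for illustration; not the real aggregation columns.
fieldnames = ["month", "weighted_sum"]
rows = [{"month": "2017-01", "weighted_sum": 12.5}]

# csv.writer only accepts dialect options, so `fieldnames` is rejected.
try:
    csv.writer(io.StringIO(), fieldnames=fieldnames)
    writer_error = None
except TypeError as error:
    writer_error = error

# csv.DictWriter takes `fieldnames` and writes dict rows in that order.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
tsv_text = out.getvalue()
```

If dump_aggregation does build its writer this way, switching to `csv.DictWriter` (or dropping the keyword and writing the header row by hand) would be one way to resolve the error. Because write_once has already created the target file by the time the error occurs, that empty file would also need to be deleted before re-running the cell.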
