First, download a list of 50,000 random users who registered an account and created a user page, produced by the Quarry query at https://quarry.wmflabs.org/query/11715
wget "https://quarry.wmflabs.org/run/107498/output/0/tsv?download=true" -O user_reg_with_userpages.tsv
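Before going further, it may be worth checking that the download really contains the expected 50,000 users rather than, say, an error page. This is a minimal sketch that assumes the TSV starts with exactly one header line followed by one row per user; check_row_count is a hypothetical helper name, not part of the original workflow.

```shell
# Hypothetical helper: verify that a downloaded Quarry TSV has the expected
# number of data rows. Assumes exactly one header line at the top.
check_row_count() {
  local file=$1 expected=$2 rows
  # awk's NR counts lines even if the last one lacks a trailing newline;
  # subtract 1 for the header.
  rows=$(awk 'END { print NR - 1 }' "$file")
  if [ "$rows" -eq "$expected" ]; then
    echo "ok: $rows rows"
  else
    echo "unexpected row count: $rows (expected $expected)" >&2
    return 1
  fi
}
```

For example, check_row_count user_reg_with_userpages.tsv 50000 should report "ok: 50000 rows" if the header assumption holds.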
python retention.py > retention.sh
This generates a shell script that queries the database. The script only works from Wikimedia's Tool Labs environment, which provides access to replicas of the wiki databases (see https://tools.wmflabs.org/).
The script issues one query to the MySQL server per user, counting the edits that user made between one and two years after registering, and appends the result to retention.tsv.
head -n 5 retention.sh
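To make the per-user query concrete, here is a hypothetical sketch of how a generator like retention.py could build retention.sh. It is not the author's script. The following are all assumptions: the input column order (user_id, then user_registration), the 14-digit MediaWiki timestamp format, the older revision schema with rev_user (since replaced by rev_actor), and the default database name enwiki_p. Connection options such as host and credentials are left out.

```shell
# Hypothetical sketch of a retention.sh generator (NOT the original retention.py).
# Assumptions: input TSV = header row, then user_id<TAB>user_registration;
# registration is a 14-digit MediaWiki timestamp; the replica uses the older
# revision schema with rev_user; DB is the replica database name.
DB=${DB:-enwiki_p}

# Move a YYYYMMDDHHMMSS timestamp forward by N years by adding N * 10^10,
# which increments only the four year digits. A Feb 29 start becomes Feb 29
# of a non-leap year: not a real date, but still a valid string bound.
add_years() {
  echo $(( $1 + $2 * 10000000000 ))
}

# Read the user TSV on stdin and print one mysql command per user.
make_retention_script() {
  local user_id reg start end
  tail -n +2 | while IFS=$'\t' read -r user_id reg _ || [ -n "$user_id" ]; do
    # Skip rows without a usable registration timestamp (e.g. NULL).
    [[ $reg =~ ^[0-9]{14}$ ]] || continue
    start=$(add_years "$reg" 1)
    end=$(add_years "$reg" 2)
    # Alias the first column as user_id so each header line starts with "u".
    printf 'mysql -D %s -e "SELECT %s AS user_id, COUNT(*) AS edits FROM revision WHERE rev_user = %s AND rev_timestamp >= '\''%s'\'' AND rev_timestamp < '\''%s'\''" >> retention.tsv\n' \
      "$DB" "$user_id" "$user_id" "$start" "$end"
  done
}
```

Usage would look like make_retention_script < user_reg_with_userpages.tsv > retention.sh. Adding N × 10^10 avoids any date parsing, and since MediaWiki stores timestamps as fixed-width 14-character strings, the window bounds compare correctly as strings.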
Next, run the script. This only works on Tool Labs, and the full run takes about 4 hours.
bash retention.sh
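Because the run takes hours, it can help to start it in the background and check on it periodically. The helper below is a hypothetical sketch. It assumes every query line in retention.sh begins with mysql, and that each finished query adds one header line starting with u plus one data row to retention.tsv.

```shell
# Hypothetical progress check for a long retention.sh run.
# Assumes query lines start with "mysql" and header lines in the output
# start with "u", so non-header lines = users already processed.
retention_progress() {
  local script=${1:-retention.sh} out=${2:-retention.tsv}
  local total finished=0
  # grep -c prints 0 but exits 1 when nothing matches; || true keeps going.
  total=$(grep -c '^mysql' "$script" || true)
  if [ -f "$out" ]; then
    finished=$(grep -vc '^u' "$out" || true)
  fi
  echo "$finished/$total users done"
}
```

For example, start the job with nohup bash retention.sh > retention.log 2>&1 & and then call retention_progress from time to time.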
Because mysql -e prints a column header before each query's result, retention.tsv contains one header line per user. Remove these repeated headers by dropping every line that begins with u:
grep -v '^u' retention.tsv > retention_clean.tsv  # output filename chosen for illustration
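Dropping every line that starts with u removes the first header too. If you would rather keep a single row of column names in the cleaned file, a small awk variant can keep the first header and drop only the repeats. Like the grep, this sketch assumes header lines start with u and data lines do not; dedupe_headers is a hypothetical name.

```shell
# Hypothetical variant: keep the first header line, drop later repeats.
# Assumes header lines start with "u" and data lines do not.
dedupe_headers() {
  # Non-header lines are always printed; a header line is printed only the
  # first time one is seen (!seen++ is true only while seen is still 0).
  awk '!/^u/ || !seen++' "$@"
}
```

For example: dedupe_headers retention.tsv > retention_clean.tsv.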