First, compute the number of pushes to GitHub repositories aggregated per month, as described here. Run the query twice: first over all repositories, then restricted to *.github.io repositories. The restricted query is the following:
SELECT LEFT(created_at, 7) as month, COUNT(*) as pushes
FROM [githubarchive:github.timeline]
WHERE
type='CreateEvent' AND
RIGHT(repository_name, 10) = '.github.io'
GROUP BY month
ORDER BY month DESC
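The query over all repositories is not shown. Assuming it is the same query without the repository-name condition, both variants can be generated from one template. This is only a sketch: `build_query` is a hypothetical helper, and the assumption about the unrestricted query is not confirmed by the text above.

```python
def build_query(suffix=None):
    """Return the monthly-count query, optionally restricted to names ending in `suffix`."""
    conditions = ["type='CreateEvent'"]
    if suffix is not None:
        # Same filter as the query above: compare the last len(suffix) characters
        conditions.append("RIGHT(repository_name, %d) = '%s'" % (len(suffix), suffix))
    return "\n".join([
        "SELECT LEFT(created_at, 7) as month, COUNT(*) as pushes",
        "FROM [githubarchive:github.timeline]",
        "WHERE",
        " AND\n".join(conditions),
        "GROUP BY month",
        "ORDER BY month DESC",
    ])

query_all = build_query()              # assumed form of the unrestricted query
query_io = build_query('.github.io')   # reproduces the query shown above
print(query_all)
```

Generating both strings from the same template keeps the two result files comparable month by month, which the ratio computation relies on.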
Then process the created CSV files with the following script, to make the data readable by NVD3:
In [65]:
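import csv
import datetime
import json

data = {}
for name in ['All', 'IO']:
    # newline='' is what the csv module expects for files opened in text mode (Python 3)
    with open('../data/push_monthly_%s_results-20140115.csv' % name, newline='') as f:
        data[name] = [x for x in csv.reader(f)][1:]  # skip the header row
    # Rows are sorted newest first. The first day of a month minus one day is the
    # last day of the previous month, so dropping the first (newest) count pairs
    # each remaining count with the last day of its own month.
    times = [datetime.date(int(x[0].split('-')[0]), int(x[0].split('-')[1]), 1) - datetime.timedelta(days=1) for x in data[name]]
    data[name] = list(zip(times, [x[1] for x in data[name]][1:]))

# Compute the ratio between pushes to *.github.io repos vs total pushes, for each month.
# strftime('%s') gives seconds since the unix epoch (a platform extension, available
# with glibc); the appended '000' turns it into a milliseconds string.
ratios = [(v[0][0].strftime('%s000'), float(v[0][1])/float(v[1][1])*100.0) for v in zip(data['IO'], data['All'])]

output = []
output.append({'key': '% push to *.github.io vs. total pushes', 'values' : ratios})
with open('../data/githubIO.json', 'w') as f:
    f.write(json.dumps(output))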
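The date handling in the script is the least obvious step, so here is a small self-contained sketch of the same idea on made-up rows. It is not the script above: it labels each month with its own last day directly, joins the two result sets by month instead of by position, and uses `calendar.timegm` (UTC) rather than the platform-specific `strftime('%s')`. The helper names `month_end_ms` and `monthly_ratios` and the sample counts are hypothetical.

```python
import calendar
import datetime

def month_end_ms(month):
    """Epoch milliseconds (midnight UTC) of the last day of a 'YYYY-MM' month."""
    year, mon = (int(part) for part in month.split('-'))
    last_day = calendar.monthrange(year, mon)[1]
    return calendar.timegm(datetime.date(year, mon, last_day).timetuple()) * 1000

def monthly_ratios(io_rows, all_rows):
    """Percentage of *.github.io counts over total counts, one [x, y] pair per month."""
    totals = {month: float(count) for month, count in all_rows}
    return [[month_end_ms(month), float(count) / totals[month] * 100.0]
            for month, count in io_rows if month in totals]

# Made-up rows in the CSV layout (month, count), newest first
io_rows = [('2013-12', '30'), ('2013-11', '10')]
all_rows = [('2013-12', '600'), ('2013-11', '400')]

ratios = monthly_ratios(io_rows, all_rows)
print(ratios)
```

Joining by month avoids silently misaligned ratios if one of the two CSV files is missing a month, at the cost of keeping the newest (possibly incomplete) month, which the original script drops.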
Finally, add the HTML/JS code that displays the data. In this case the code is placed in the _includes directory so it can be loaded with the Jekyll directive:
{% include bladh/github_io_infographic.html %}
from any blog post. The HTML code to generate the graph is adapted from this example.
In [66]:
%%bash
cat ../_includes/bladh/github_io_infographic.html