Fernando Perez wrote:
IPython: 8 minutes from bug on my box to reviewed/merged PR. We're doing something right... https://github.com/ipython/ipython/pull/4294
This got me thinking: how long does it usually take to do a review on the IPython project?
In [1]:
# %pylab provides rcParams, average, and hlines used in the cells below
%pylab inline
import pandas as pd
import json
import itertools
import collections
import re
import arrow
rcParams['figure.figsize'] = (15, 3)
I recently released a python package called git2json for parsing git logs. It isn't well written, but it does what it says on the tin.
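If you want to follow along, git2json should be installable from PyPI (assuming you have pip on your path):
!pip install git2json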
In [2]:
!git2json --git-dir=/home/tavish/code/ipython/.git > ipython-log.json
We can parse the resulting JSON file and take a peek at the data structure.
In [3]:
log = json.load(open('ipython-log.json'))
print log[0]
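The full record is fairly verbose, so here is a quick probe (my addition, not part of the original notebook) of the fields the rest of this analysis relies on:
# Hypothetical exploration cell: list the top-level fields of a commit
# record, then peek at the pieces used later in this notebook.
print sorted(log[0].keys())
print log[0]['committer']['date']  # a unix timestamp, in seconds
print log[0]['changes'][:3]        # change[2] is the file path used below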
In [4]:
print "Number of commits = ", len(log)
Before we answer the question posed at the beginning of this article, I'll answer some easier questions. First off: which files in IPython have been committed to the most? This is a simplified version of "code churn" which is reasonably effective for predicting bugs. (More complicated models include lines modified or take semantic differences into account). Since this is Friday night and I'm not trying to get a paper into ICSE, I'll just take the number of commits for each .py file.
In [5]:
file_changes = lambda: itertools.chain.from_iterable(
    [change[2] for change in commit['changes'] if re.match(r'^.*\.py$', change[2])]
    for commit in log
)
In [6]:
rcParams['figure.figsize'] = (15, 6)
fchanges = file_changes()
fchange_count = collections.Counter(fchanges)
a = average(fchange_count.values())
most_common = fchange_count.most_common(20)
df = pd.DataFrame(most_common)
df.head()
df.index = df[0]
df = df[[1]]
df.head()
p = df.plot(kind='bar', legend=False)
p.set_title('Most-changed Files in IPython')
p.set_ylabel('Commits')
# Draw a red line at the average
hlines(a, 0, len(df), colors='r')
Out[6]:
Someone more familiar with IPython development will know how obvious the above graph is.
Next, I'll make a simple plot showing weekly commit counts over time, similar to the plots GitHub would show you. I'll create a data frame from a list in the format [(date_rounded_down_to_week, commit_id)] and then groupby() the date.
In [7]:
def weekly_date_resolution(ts):
    '''Round a unix timestamp down to the nearest "week" boundary
    within its month (day 1, 8, 15, 22, or 29).'''
    ar = arrow.get(ts)  # arrow.get accepts a unix timestamp
    day_of_month = ar.timetuple().tm_mday
    week = int(day_of_month) / 7
    new_day = (week * 7) + 1
    assert new_day > 0
    assert new_day < 30
    try:
        day_adjusted = ar.replace(day=new_day)
    except ValueError:
        new_day = day_of_month  # just keep the original
        day_adjusted = ar.replace(day=new_day)
    return day_adjusted.date()

commit_times = lambda: (
    (weekly_date_resolution(commit['committer']['date']), commit['commit'])
    for commit in log
)
dfct = pd.DataFrame(list(commit_times()), columns=['date', 'id'])
dfct = dfct.groupby('date').aggregate(len)
dfct.head()
Out[7]:
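As a quick sanity check on weekly_date_resolution (my addition, not part of the original notebook): the Unix epoch falls on the first day of its month, which is already a week boundary, so it should round down to itself.
import datetime
# Timestamp 0 is 1970-01-01 UTC; day 1 is already a week boundary,
# so the function should return the same date.
assert weekly_date_resolution(0) == datetime.date(1970, 1, 1)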
A few more lines give us a basic plot.
In [8]:
p = dfct.plot(legend=False)
p.set_title('Weekly commits on IPython')
p.set_ylabel('Commits')
Out[8]:
Let's get back to the original question: how long does it usually take to do a review?
I'll make a distinction between two different kinds of commits: the merge commit that merges a pull request into the main branch, and the HEAD (or most recent) commit on the pull request's source branch.
In [9]:
def is_pull_request_merge(message):
    '''Match commit messages that start with "Merge pull request #"'''
    return re.match(r'^Merge pull request #', message)

def pull_request_number(message):
    '''Extract the PR # from the commit message.'''
    return int(re.match(r'^Merge pull request #(\d+)', message).groups()[0])

# Create a table for the merge commits.
merge_commits = [
    [
        commit['parents'][1],  # the second parent seems to be the source branch
        commit['committer']['date'],
        pull_request_number(commit['message']),
    ]
    for commit in log if is_pull_request_merge(commit['message'])
]
# Also keep a list of the merged branches' HEAD commit ids for use in the next cell.
merge_commit_parent_ids = [mc[0] for mc in merge_commits]
merge_df = pd.DataFrame(
    merge_commits,
    columns=['merge_commit', 'merge_timestamp', 'pr'],
    index=merge_commit_parent_ids,
)
merge_df.head()
Out[9]:
In [10]:
# Create a table for pull request commits.
pr_commits = [
    [
        commit['commit'],
        commit['committer']['date'],
        len(commit['changes']),  # let's sneak in another basic churn metric
    ]
    for commit in log
    # Do we have a merge commit for this commit?
    if commit['commit'] in merge_commit_parent_ids
]
commit_ids = [prc[0] for prc in pr_commits]
pr_df = pd.DataFrame(pr_commits, columns=['commit', 'commit_timestamp', 'churn'], index=commit_ids)
pr_df.head()
Out[10]:
Now that we have these two data frames, we can merge them together.
In [11]:
both_df = pr_df.join(merge_df)
both_df.head()
Out[11]:
We can take the delta by subtracting the commit timestamp from the merge timestamp. This is the difference between the commit-under-review's timestamp and the merge timestamp, in seconds. We can do some basic math to get the difference on other scales. The HTML version of the table shows the results.
In [12]:
rcParams['figure.figsize'] = (5, 10)
both_df['delta'] = both_df['merge_timestamp'] - both_df['commit_timestamp']
delta = both_df['delta']
both_df['delta_weeks'] = delta/60.0/60.0/24.0/7.0
both_df['delta_days'] = delta/60.0/60.0/24.0
both_df['delta_hours'] = delta/60.0/60.0
both_df['delta_mins'] = delta/60.0
both_df['delta_secs'] = delta
both_df.head()
Out[12]:
The first thing that stands out is that (as of this writing) commit 796a90bdeef3770 has a delta of one minute. Fernando reported 8 minutes, which is much more plausible. This makes me wonder if some people have out-of-date clocks. Update: Fernando says that the review time was indeed 1 minute; the 8 minutes was from bug to pull request to commit.
Anyways, let's make a box plot. Box plots are fun.
In [13]:
rcParams['figure.figsize'] = (15, 3)
both_df[['delta_days']].boxplot(vert=False)
Out[13]:
Based on this plot, we see that the IPython folks are pretty fast when it comes to reviewing and merging pull requests — the ones that get merged, anyway. There are some outliers, though: some take up to half a year before getting merged.
We can filter out all of these ranges that are larger than a month and ask: if a review takes less than a month, how long is it likely to take?
In [14]:
rcParams['figure.figsize'] = (15, 3)
both_df[both_df['delta_days'] < 30][['delta_days']].boxplot(vert=False)
Out[14]:
Based on this, it looks like most reviews happen within the first five days. The median is (I'm eyeballing this) around 1 day. Not bad at all.
If we look closely at the pull requests outside this range, we might see any number of things: pull requests that are large and hard to understand or pull requests that are blocked by some other task. I don't think pandas would help us much: we'd have to use our brains and look for patterns.
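If you want to do that digging yourself, a small filter (a hypothetical snippet of mine, not in the original notebook) pulls those slow pull requests out for manual inspection:
# My addition: grab the pull requests that took a month or more to merge.
slow = both_df[both_df['delta_days'] >= 30]
print len(slow), "pull requests took a month or longer to merge"
print slow[['pr', 'delta_days', 'churn']].head()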
OK, one more question. Can we see any relationship between the review time and the number of files modified by the last pull request commit? This is a sad way to do it — since I'm not taking into account all commits on a pull request, I'm missing a lot of useful information — but it might be fun.
In [15]:
# The data is skewed, so I'll use the Spearman correlation
both_df[['churn', 'delta']].corr(method='spearman')
Out[15]:
There's almost no correlation, and the correlation that is there is probably just noise. Should we conclude that the number of files you change in a pull request doesn't affect how long it will take to be reviewed? I don't think so, for two reasons: the churn numbers here only count the files touched by the last commit on each pull request, and we're only looking at pull requests that were eventually merged.
I didn't really make any claims here, but let's pretend this is a real paper.
A previous version of this notebook was incorrectly adding timezone differences to timestamps. After some experimentation, I've discovered that the timestamps provided by git2json don't need to be adjusted for timezone unless you want to display them in a human-readable format. Fixing this bug solved the "time travel" issue.
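For example (a sketch of my own, not from the original analysis), turning one of the raw timestamps into something human-readable with arrow looks like this:
# git2json timestamps are plain unix seconds; convert only for display.
ts = log[0]['committer']['date']
print arrow.get(ts).to('local').format('YYYY-MM-DD HH:mm:ss ZZ')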