How Long Does It Take To Review an IPython Pull Request?

Fernando Perez wrote:

IPython: 8 minutes from bug on my box to reviewed/merged PR. We're doing something right... https://github.com/ipython/ipython/pull/4294

This got me thinking: how long does it usually take to do a review on the IPython project?


In [1]:
# Run with pylab so that rcParams, average, and hlines are available below.
%pylab inline
import pandas as pd
import json
import itertools
import collections
import re
import arrow
rcParams['figure.figsize'] = (15, 3)

I recently released a Python package called git2json for parsing git logs. It isn't well written, but it does what it says on the tin.


In [2]:
!git2json --git-dir=/home/tavish/code/ipython/.git > ipython-log.json

We can parse the resulting JSON file and take a peek at the data structure.


In [3]:
log = json.load(open('ipython-log.json'))
print log[0]


{u'committer': {u'date': 1380325308, u'timezone': u'-0700', u'name': u'Fernando Perez', u'email': u'fernando.perez@berkeley.edu'}, u'author': {u'date': 1380325308, u'timezone': u'-0700', u'name': u'Fernando Perez', u'email': u'fernando.perez@berkeley.edu'}, u'tree': u'ea7fbb2694175423d776f649104ba42520db7cea', u'parents': [u'795c9e30ee8ad2e1a4e780c439fc50b565670e26', u'796a90bdeef377041c353019fdae19aaa72a76d3'], u'commit': u'b8295e9af6745b9c466d11bb31a6eef221e231c1', u'message': u"Merge pull request #4294 from minrk/tornado-2\n\ndon't require tornado 3 in `--post serve`", u'changes': []}
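
The changes list is empty here because merge commits don't carry their own diff. For ordinary commits, git2json appears to follow git log --numstat, so each entry in changes should be a [lines_added, lines_deleted, file_path] triple; that assumption matters later, where I index into change[2]. A quick way to check:

# Assumption: each change is a [lines_added, lines_deleted, file_path]
# triple, mirroring `git log --numstat`.
first_with_changes = next(c for c in log if c['changes'])
print first_with_changes['changes'][0]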

In [4]:
print "Number of commits = ", len(log)


Number of commits =  12382

Before we answer the question posed at the beginning of this article, I'll answer some easier questions. First off: which files in IPython have been committed to the most? This is a simplified version of "code churn", which is reasonably effective for predicting bugs. (More complicated models count lines modified or take semantic differences into account.) Since it's Friday night and I'm not trying to get a paper into ICSE, I'll just take the number of commits for each .py file.


In [5]:
# change[2] is the file path; keep only .py files. A lambda so that
# each call yields a fresh generator.
file_changes = lambda: itertools.chain.from_iterable(
    [change[2] for change in commit['changes'] if re.match(r'^.*\.py$', change[2])]
    for commit in log
)

In [6]:
rcParams['figure.figsize'] = (15, 6)
fchanges = file_changes()
fchange_count = collections.Counter(fchanges)
a = average(fchange_count.values())  # average() comes from pylab
most_common = fchange_count.most_common(20)
df = pd.DataFrame(most_common)
df.index = df[0]
df = df[[1]]
p = df.plot(kind='bar', legend=False)
p.set_title('Most-changed Files in IPython')
p.set_ylabel('Commits')
# Draw a red line at the average
hlines(a, 0, len(df), colors='r')


Out[6]:
<matplotlib.collections.LineCollection at 0x8f86f50>

Someone more familiar with IPython development can judge how obvious the above graph is.

Next, I'll make a simple plot showing weekly commit counts over time, similar to the plots GitHub would show you. I'll create a data frame from a list in the format [(date_rounded_down_to_week, commit_id)] and then groupby() the date.


In [7]:
def weekly_date_resolution(ts):
    '''Round a timestamp down to the nearest "week boundary" of its
    month (day 1, 8, 15, 22, or 29).'''
    ar = arrow.get(ts)
    day_of_month = ar.timetuple().tm_mday
    week = day_of_month / 7  # integer division under Python 2
    new_day = (week*7)+1
    assert new_day > 0
    assert new_day < 30
    try:
        day_adjusted = ar.replace(day=new_day)
    except ValueError:
        new_day = day_of_month # just keep the original
        day_adjusted = ar.replace(day=new_day)
    return day_adjusted.date()

commit_times = lambda: (
    (weekly_date_resolution(commit['committer']['date']), commit['commit'])
    for commit in log
)
dfct = pd.DataFrame(list(commit_times()), columns=['date', 'id'])
dfct = dfct.groupby('date').aggregate(len)
dfct.head()


Out[7]:
id
date
2005-07-01 1
2005-07-15 6
2005-08-08 2
2005-08-15 4
2005-08-22 1

A few more lines give us a basic plot.


In [8]:
p = dfct.plot(legend=False)
p.set_title('Weekly commits on IPython')
p.set_ylabel('Commits')


Out[8]:
<matplotlib.text.Text at 0x8f83b10>

Let's get back to the original question: how long does it usually take to do a review?

I'll make a distinction between two different kinds of commits:

  • Merge commits: The commit that merged the pull request branch into the main branch.
  • PR commits: The HEAD (or most recent) commit on the pull request's source branch.
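
This leans on an assumption about merge commits: in a GitHub-style merge, the first parent is the old tip of the target branch and the second parent is the tip of the pull request branch. That is why the code below reads parents[1]. A quick way to eyeball it on the merge from In [3]:

# Assumption: parents[0] = previous tip of the target branch,
#             parents[1] = tip of the pull request branch.
merge = log[0]  # the merge of PR #4294 from In [3]
print merge['parents'][0]  # target-branch parent
print merge['parents'][1]  # the "PR commit" defined above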

In [9]:
def is_pull_request_merge(message):
    '''Match commit messages that start with "Merge pull request #"'''
    return re.match(r'^Merge pull request #', message)


def pull_request_number(message):
    '''Extract the PR # from the commit message.'''
    return int(re.match(r'^Merge pull request #(\d+)', message).groups()[0])


# Create a table for the merge commits.
merge_commits = [
    [
        commit['parents'][1], # the second parent seems to be the source branch
        commit['committer']['date'],
        pull_request_number(commit['message']),
    ]
    for commit in log if is_pull_request_merge(commit['message'])
]
# Also keep the ids of the merged branch tips: they serve as this frame's
# index, and the next cell uses them to pick out PR commits.
merge_commit_parent_ids = [mc[0] for mc in merge_commits]
merge_df = pd.DataFrame(
    merge_commits,
    columns=['merge_commit', 'merge_timestamp', 'pr'],
    index=merge_commit_parent_ids,
)
merge_df.head()


Out[9]:
merge_commit merge_timestamp pr
796a90bdeef377041c353019fdae19aaa72a76d3 796a90bdeef377041c353019fdae19aaa72a76d3 1380325308 4294
b0d6287174b62de9b89190b868317d12b5bc4ae4 b0d6287174b62de9b89190b868317d12b5bc4ae4 1380323422 4270
07b4962bd787068896f0e2b5dfb977441c7b391a 07b4962bd787068896f0e2b5dfb977441c7b391a 1380301596 4278
e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 1380271407 4282
68b0f9d2d5f29c540f7873141cb09026c492d0ef 68b0f9d2d5f29c540f7873141cb09026c492d0ef 1380215023 4279

In [10]:
# Create a table for pull request commits. Use a set for the repeated
# membership tests; scanning the list each time would be quadratic.
merge_parent_set = set(merge_commit_parent_ids)
pr_commits = [
    [
        commit['commit'],
        commit['committer']['date'],
        len(commit['changes']), # let's sneak in another basic churn metric
    ]
    for commit in log
    # Do we have a merge commit for this commit?
    if commit['commit'] in merge_parent_set
]
commit_ids = [prc[0] for prc in pr_commits]
pr_df = pd.DataFrame(pr_commits, columns=['commit', 'commit_timestamp', 'churn'], index=commit_ids)
pr_df.head()


Out[10]:
commit commit_timestamp churn
796a90bdeef377041c353019fdae19aaa72a76d3 796a90bdeef377041c353019fdae19aaa72a76d3 1380325245 1
e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 1380162595 1
68b0f9d2d5f29c540f7873141cb09026c492d0ef 68b0f9d2d5f29c540f7873141cb09026c492d0ef 1380158115 3
07b4962bd787068896f0e2b5dfb977441c7b391a 07b4962bd787068896f0e2b5dfb977441c7b391a 1380153904 1
d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa 1380099263 1

Now that we have these two data frames, we can merge them together.


In [11]:
both_df = pr_df.join(merge_df)
both_df.head()


Out[11]:
commit commit_timestamp churn merge_commit merge_timestamp pr
796a90bdeef377041c353019fdae19aaa72a76d3 796a90bdeef377041c353019fdae19aaa72a76d3 1380325245 1 796a90bdeef377041c353019fdae19aaa72a76d3 1380325308 4294
e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 1380162595 1 e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 1380271407 4282
68b0f9d2d5f29c540f7873141cb09026c492d0ef 68b0f9d2d5f29c540f7873141cb09026c492d0ef 1380158115 3 68b0f9d2d5f29c540f7873141cb09026c492d0ef 1380215023 4279
07b4962bd787068896f0e2b5dfb977441c7b391a 07b4962bd787068896f0e2b5dfb977441c7b391a 1380153904 1 07b4962bd787068896f0e2b5dfb977441c7b391a 1380301596 4278
d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa 1380099263 1 d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa 1380149814 4131

We can take the delta by subtracting the commit timestamp from the merge timestamp. This is the difference, in seconds, between the commit-under-review's timestamp and the moment it was merged. Some basic arithmetic puts the difference on other scales; the table below shows the results.


In [12]:
rcParams['figure.figsize'] = (5, 10)
both_df['delta'] = both_df['merge_timestamp'] - both_df['commit_timestamp']
delta = both_df['delta']
both_df['delta_weeks'] = delta/60.0/60.0/24.0/7.0
both_df['delta_days'] = delta/60.0/60.0/24.0
both_df['delta_hours'] = delta/60.0/60.0
both_df['delta_mins'] = delta/60.0
both_df['delta_secs'] = delta
both_df.head()


Out[12]:
commit commit_timestamp churn merge_commit merge_timestamp pr delta delta_weeks delta_days delta_hours delta_mins delta_secs
796a90bdeef377041c353019fdae19aaa72a76d3 796a90bdeef377041c353019fdae19aaa72a76d3 1380325245 1 796a90bdeef377041c353019fdae19aaa72a76d3 1380325308 4294 63 0.000104 0.000729 0.017500 1.050000 63
e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 1380162595 1 e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77 1380271407 4282 108812 0.179914 1.259398 30.225556 1813.533333 108812
68b0f9d2d5f29c540f7873141cb09026c492d0ef 68b0f9d2d5f29c540f7873141cb09026c492d0ef 1380158115 3 68b0f9d2d5f29c540f7873141cb09026c492d0ef 1380215023 4279 56908 0.094094 0.658657 15.807778 948.466667 56908
07b4962bd787068896f0e2b5dfb977441c7b391a 07b4962bd787068896f0e2b5dfb977441c7b391a 1380153904 1 07b4962bd787068896f0e2b5dfb977441c7b391a 1380301596 4278 147692 0.244200 1.709398 41.025556 2461.533333 147692
d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa 1380099263 1 d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa 1380149814 4131 50551 0.083583 0.585081 14.041944 842.516667 50551

The first thing that stands out: as of this writing, commit 796a90bdeef3770 has a delta of about one minute. Fernando reported 8 minutes, which seemed much more plausible and made me wonder whether some people have out-of-date clocks. Update: Fernando says that the review time really was 1 minute; the 8 minutes was from bug to pull request to merged commit.
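
If a clock were badly skewed, we might even see a merge timestamped before the commit it merges. A quick count of such "time travel" cases:

# Merges that appear to happen before the commit they merge;
# anything above zero points to a skewed clock somewhere.
print (both_df['delta'] < 0).sum()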

Anyways, let's make a box plot. Box plots are fun.


In [13]:
rcParams['figure.figsize'] = (15, 3)
both_df[['delta_days']].boxplot(vert=False)


Out[13]:
{'boxes': [<matplotlib.lines.Line2D at 0x9192fd0>],
 'caps': [<matplotlib.lines.Line2D at 0x91925d0>,
  <matplotlib.lines.Line2D at 0x9192ad0>],
 'fliers': [<matplotlib.lines.Line2D at 0x9196a10>,
  <matplotlib.lines.Line2D at 0x9196ed0>],
 'medians': [<matplotlib.lines.Line2D at 0x9196510>],
 'whiskers': [<matplotlib.lines.Line2D at 0x918ddd0>,
  <matplotlib.lines.Line2D at 0x9192050>]}

Based on this plot, we see that the IPython folks are pretty fast when it comes to reviewing and merging pull requests (the ones that get merged, anyways). There are some outliers, though: some take up to half a year before getting merged.

We can filter out the deltas larger than a month and ask: if a review takes less than a month, how long is it likely to take?


In [14]:
rcParams['figure.figsize'] = (15, 3)
both_df[both_df['delta_days'] < 30][['delta_days']].boxplot(vert=False)


Out[14]:
{'boxes': [<matplotlib.lines.Line2D at 0x91e1710>],
 'caps': [<matplotlib.lines.Line2D at 0x91d8cd0>,
  <matplotlib.lines.Line2D at 0x91e1210>],
 'fliers': [<matplotlib.lines.Line2D at 0x91e4150>,
  <matplotlib.lines.Line2D at 0x91e4610>],
 'medians': [<matplotlib.lines.Line2D at 0x91e1c10>],
 'whiskers': [<matplotlib.lines.Line2D at 0x91d8510>,
  <matplotlib.lines.Line2D at 0x91d8750>]}

Based on this, it looks like most reviews happen within the first five days. The median is (I'm eyeballing this) around 1 day. Not bad at all.
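
Rather than eyeball it, we can ask pandas for the exact numbers on the same under-a-month subset:

# Summary statistics (in days) for reviews that finished within a month.
print both_df[both_df['delta_days'] < 30]['delta_days'].describe()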

If we look closely at the pull requests outside this range, we might see any number of things: pull requests that are large and hard to understand or pull requests that are blocked by some other task. I don't think pandas would help us much: we'd have to use our brains and look for patterns.

OK, one more question. Can we see any relationship between the review time and the number of files modified by the last pull request commit? This is a sad way to do it — since I'm not taking into account all commits on a pull request, I'm missing a lot of useful information — but it might be fun.


In [15]:
# The data is skewed, so I'll use the Spearman correlation
both_df[['churn', 'delta']].corr(method='spearman')


Out[15]:
churn delta
churn 1.000000 0.048028
delta 0.048028 1.000000

There's almost no correlation, and the correlation that is there is probably just noise. Should we conclude that the number of files that you change in a pull request doesn't affect how long it will take to be reviewed? I don't think so, for two reasons:

  1. I'm completely ignoring pull requests that get ignored or closed. My professor has studied how long it takes patches to get accepted on large open-source projects, and he found that on certain projects, such as Linux, large patches were often ignored or rejected: nobody had time to understand and review them. This might be the case for IPython. (Does anybody from the IPython project think this is true?)
  2. I'm only processing the tip of the pull request branch. A pull request can have many commits, and I should incorporate those into my churn metric. (A rough sketch of this follows the list.)
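
Here is a rough sketch of that better churn metric. The commits belonging to a pull request are the ones reachable from the PR tip (parents[1] of the merge) but not from the mainline parent (parents[0], which the merge_commits table above doesn't currently keep). The names ancestors and branch_churn are mine, for illustration:

# Index commits by id so we can walk parent links.
commit_by_id = dict((c['commit'], c) for c in log)

def ancestors(tip):
    '''All commit ids reachable from tip via parent links.'''
    seen, frontier = set(), [tip]
    while frontier:
        cid = frontier.pop()
        if cid in seen or cid not in commit_by_id:
            continue
        seen.add(cid)
        frontier.extend(commit_by_id[cid]['parents'])
    return seen

def branch_churn(pr_tip, mainline_tip):
    '''Total files changed across commits that are only on the PR branch.'''
    branch_only = ancestors(pr_tip) - ancestors(mainline_tip)
    return sum(len(commit_by_id[cid]['changes']) for cid in branch_only)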

I didn't really make any claims here, but let's pretend this is a real paper.

Threats to Validity

  1. It is 1am on a Friday night.
  2. The churn metric only represents the tip (ha) of the pull request iceberg.
  3. This analysis doesn't look at how review time changed over time. Can we get a less skewed picture of pull request length by looking only at the period in which the project was funded? (A first step is sketched after this list.)
  4. This analysis doesn't look at actual review time — the pull request could have been made much later than when the pull request commit was made.
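
Point 3 would be easy to start on with the frame we already have. For instance, a sketch of median review time by year (this adds a throwaway year column):

# Median review delta, in days, per calendar year of the merge.
both_df['year'] = both_df['merge_timestamp'].map(lambda ts: arrow.get(ts).year)
print both_df.groupby('year')['delta_days'].median()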

Conclusions

  1. The IPython folks are very prompt. Send them a decent pull request and they'll probably merge it within a day.
  2. The IPython Notebook is pretty cool.

Updates

A previous version of this notebook was incorrectly adding timezone differences to timestamps. After some experimentation, I've discovered that the timestamps provided by git2json don't need to be adjusted for timezone unless you want to display them in a human-readable format. Fixing this bug solved the "time travel" issue.
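
In other words, git stores absolute Unix timestamps, and the timezone field matters only when rendering wall-clock time. As a sketch, applying the stored offset by hand to the commit from In [3]:

import datetime

c = log[0]['committer']
offset = c['timezone']  # e.g. u'-0700'
sign = -1 if offset.startswith('-') else 1
shift = datetime.timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
utc = datetime.datetime.utcfromtimestamp(c['date'])
print utc                 # absolute time (UTC): safe for arithmetic
print utc + sign * shift  # the committer's wall-clock time: display only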