This notebook builds a graph of the collaboration among the contributors to a Git repository. Nodes are authors. An edge connects two authors when a commit by one is the parent of a commit by the other, and its weight counts how many such parent/child pairs exist.


In [1]:
%matplotlib inline
from git_data import GitRepo
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd


Couldn't import dot_parser, loading of dot files will not be possible.

In [2]:
# get_repo did not work here, so the parent directory (the bigbang repository itself) is used as the data source.
repo = GitRepo.GitRepo("..")
full_info = repo.commit_data

Nodes will be Author objects, each of which holds a list of Commit objects.


In [3]:
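class Commit:
    """A single commit: its message, hash, and parents."""

    def __init__(self, message, hexsha, parents):
        self.message = message
        self.hexsha = hexsha
        self.parents = parents

    def __repr__(self):
        # Show only the first four words of the commit message.
        return ' '.join(self.message.split(' ')[:4])


class Author:
    """A contributor and the commits attributed to them."""

    def __init__(self, name, commits):
        self.name = name
        self.commits = commits
        # Count the commits actually passed in rather than assuming exactly one.
        self.number_of_commits = len(commits)

    def add_commit(self, commit):
        self.commits.append(commit)
        self.number_of_commits += 1

    def __repr__(self):
        return self.name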

We build a list of authors, keeping a separate list of committer names so that each author is added only once. When a commit belongs to an author who is already stored, it is appended to that author's list of commits.


In [4]:
def get_authors():
    authors = []
    names = []  # committer names already seen, in the same order as authors

    for _, row in full_info.iterrows():
        name = row["Committer Name"]
        commit = Commit(row["Commit Message"], row["HEXSHA"], row["Parent Commit"])

        if name not in names:
            # First commit by this committer: create a new Author.
            authors.append(Author(name, [commit]))
            names.append(name)
        else:
            # Known committer: names and authors share indices.
            authors[names.index(name)].add_commit(commit)

    return authors
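
The lookup above searches the list of names for every row. As a minimal sketch of the same grouping, a dictionary keyed by committer name removes the need for the separate names list. The function name `group_commits_by_committer`, the tuple representation of a commit, and the sample rows are hypothetical. The column names are the ones used above, and the rows are assumed to be mappings, such as the records returned by `full_info.to_dict("records")`.

```python
def group_commits_by_committer(rows):
    """Map each committer name to the list of that committer's commits.

    `rows` is assumed to be an iterable of mappings with the keys
    "Committer Name", "HEXSHA", "Parent Commit" and "Commit Message".
    Each commit is stored as a (message, hexsha, parents) tuple.
    """
    commits_by_name = {}
    for row in rows:
        commit = (row["Commit Message"], row["HEXSHA"], row["Parent Commit"])
        # setdefault creates the list the first time a name is seen.
        commits_by_name.setdefault(row["Committer Name"], []).append(commit)
    return commits_by_name


# Hypothetical sample rows, not taken from a real repository.
sample_rows = [
    {"Committer Name": "alice", "HEXSHA": "a1",
     "Parent Commit": [], "Commit Message": "initial commit"},
    {"Committer Name": "bob", "HEXSHA": "b1",
     "Parent Commit": ["a1"], "Commit Message": "add parser"},
    {"Committer Name": "alice", "HEXSHA": "a2",
     "Parent Commit": ["b1"], "Commit Message": "fix parser bug"},
]

grouped = group_commits_by_committer(sample_rows)
for name, commits in grouped.items():
    print(name, len(commits))
```

Each lookup is then a single dictionary access, which matters mainly for repositories with many committers.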

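We build the graph by adding an edge whenever one author's commit is the parent of another author's commit. The first such pair creates the edge with weight 1, and every further pair between the same two authors increases that weight by 1. Because the loop also compares each author with themselves, an author whose commit is the parent of one of their own commits gets a self-loop.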


In [5]:
def make_graph(nodes):
    G = nx.Graph()

    # Compare every commit with every other commit, across all pairs of authors.
    for author in nodes:
        for commit in author.commits:
            for other in nodes:
                for other_commit in other.commits:
                    # True when commit is a parent of other_commit.
                    if commit.hexsha in other_commit.parents:
                        if G.has_edge(author, other):
                            G[author][other]['weight'] += 1
                        else:
                            G.add_edge(author, other, weight=1)

    return G
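
The four nested loops above compare every commit with every other commit, so the running time grows with the square of the number of commits. A possible alternative, sketched here with the standard library only, first indexes each commit hash by its author and then follows the parent links directly. The function name `count_parent_child_pairs`, the `skip_self` flag, and the sample data are hypothetical, and the sketch assumes that each commit's parents are given as a list of hashes. Unlike the cell above, it skips pairs within a single author by default.

```python
from collections import Counter


def count_parent_child_pairs(commits_by_author, skip_self=True):
    """Count parent/child commit pairs between authors.

    `commits_by_author` is assumed to map an author name to a list of
    (hexsha, parents) tuples, where `parents` is a list of parent hashes.
    Returns a Counter keyed by frozenset({parent_author, child_author}).
    """
    # Index every commit hash by the author who made it.
    author_of = {}
    for author, commits in commits_by_author.items():
        for hexsha, _parents in commits:
            author_of[hexsha] = author

    weights = Counter()
    for child_author, commits in commits_by_author.items():
        for _hexsha, parents in commits:
            for parent in parents:
                parent_author = author_of.get(parent)
                if parent_author is None:
                    continue  # the parent commit is not in the data
                if skip_self and parent_author == child_author:
                    continue  # ignore pairs within one author
                weights[frozenset((parent_author, child_author))] += 1
    return weights


# Hypothetical history, not taken from a real repository.
history = {
    "alice": [("a1", []), ("a2", ["b1"])],
    "bob": [("b1", ["a1"]), ("b2", ["a2", "b1"])],
    "carol": [("c1", ["b2"])],
}

weights = count_parent_child_pairs(history)
for pair, weight in weights.items():
    print(sorted(pair), weight)
```

With self-pairs skipped, every key holds exactly two names, so the counts could be loaded into networkx with `nx.Graph().add_weighted_edges_from((*pair, w) for pair, w in weights.items())`.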

In [6]:
nodes = get_authors()
G = make_graph(nodes)

pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=False)
nx.draw_networkx_labels(G, pos, font_size=8);
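
The drawing above does not show the edge weights. One possible way to make them visible, sketched here, is to scale each edge's line width by its weight. The helper `scale_widths` and its `min_width` and `max_width` defaults are hypothetical choices for illustration, and the sample weights are made up.

```python
def scale_widths(weights, min_width=1.0, max_width=5.0):
    """Linearly rescale edge weights to line widths in [min_width, max_width]."""
    weights = list(weights)
    if not weights:
        return []
    low, high = min(weights), max(weights)
    if high == low:
        # All edges are equally heavy: use the midpoint width for each.
        return [(min_width + max_width) / 2] * len(weights)
    span = max_width - min_width
    return [min_width + span * (w - low) / (high - low) for w in weights]


# Hypothetical weights for three edges.
widths = scale_widths([2, 4, 10])
print(widths)
```

In this notebook the weights could be collected with `[G[u][v]['weight'] for u, v in G.edges()]` and the result passed as the `width` argument of `nx.draw_networkx_edges(G, pos, width=widths)`.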