```
In [1]:
```%matplotlib inline
from bigbang.archive import Archive
from bigbang import repo_loader
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

One of BigBang's newest features is the ability to analyze git data for each project. For now, we mostly look at commits over time. We can also group individual committers into cohorts and visualize each cohort's activity.

First, make sure that you have collected both git and mail data. For now, we are looking at scipy, but you can analyze any git repository you like by loading its data. Below, we load the mail and git data into data tables.
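
As a rough illustration of what "commits over time" means in practice, the sketch below counts commits per day and per week from a small hand-made table. The `Time` and `Committer Name` column names mirror the ones used on `repo.commit_data` later in this notebook; the sample rows and the `toy_` names are invented for illustration, and nothing here calls BigBang itself.

```python
import pandas as pd

# Hypothetical stand-in for repo.commit_data: one row per commit.
toy_commits = pd.DataFrame({
    "Committer Name": ["ann", "bob", "ann", "cat", "bob"],
    "Time": pd.to_datetime([
        "2014-01-06 09:00", "2014-01-06 15:30", "2014-01-07 11:00",
        "2014-01-14 10:00", "2014-01-15 08:45",
    ]),
})

# Index by commit time, then count the rows that fall in each bin.
by_time = toy_commits.set_index("Time")["Committer Name"]
toy_per_day = by_time.resample("D").count()   # one value per calendar day
toy_per_week = by_time.resample("W").count()  # one value per week (ending Sunday)

print(toy_per_week)
```

Days with no commits appear as zeros in `toy_per_day`, so the daily series has no gaps between the first and last commit.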

```
In [2]:
```url = "http://mail.scipy.org/pipermail/scipy-dev/"
arx = Archive(url, archive_dir="../archives")
repo = repo_loader.get_repo("bigbang")
full_info = repo.commit_data
# Count emails per timestamp, then aggregate to daily and weekly totals.
act = arx.data.groupby("Date").size()
act = act.resample("D").sum()
act = act[act.index.year <= 2014]
act_week = act.resample("W").sum()

```
```

```
In [3]:
```print(full_info["Parent Commit"])

```
```

```
In [4]:
```fig = plt.figure(figsize=(10, 7.5))
commits_per_day = repo.commits_per_day()
commits_per_week = repo.commits_per_week()
commits_per_day.plot()
fig = plt.figure(figsize=(10, 7.5))
commits_per_week.plot()

```
Out[4]:
```

```
In [5]:
```fig = plt.figure(figsize=(10, 7.5))
simp = 5  # width of the moving-average window, in weeks
convolution_array = [1.0 / simp for n in range(simp)]
# Smooth both weekly series with the same moving average.
c_array = np.convolve(commits_per_week, convolution_array, "same")
e_array = np.convolve(act_week, convolution_array, "same")
plt.plot(act_week.index, e_array, label="emails per week")
plt.plot(commits_per_week.index, c_array, label="commits per week")
plt.legend()
fig.axes[0].xaxis_date()

```
```
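
The smoothing in the cell above is a moving average: convolving a series with `simp` equal weights of `1/simp` replaces each point with the mean of its neighbourhood. A minimal sketch on made-up numbers follows; the `weekly_counts` values are illustrative, not BigBang output.

```python
import numpy as np

window = 5
# Equal weights that sum to 1, so the result is a plain average.
kernel = np.full(window, 1.0 / window)

# Invented weekly counts, for illustration only.
weekly_counts = np.array([0, 5, 10, 5, 0, 5, 10], dtype=float)

# mode="same" keeps the output the same length as the input by
# treating values beyond either end of the series as zero.
smoothed = np.convolve(weekly_counts, kernel, mode="same")

print(smoothed)
```

Because `mode="same"` pads with zeros, the first and last few smoothed values are pulled toward zero; the interior values are true five-point averages.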

```
In [6]:
```plt.figure(figsize=(10, 7.5))
df = repo.by_committer()
# Keep only the last 20 rows so the bar chart stays readable.
if len(df) > 20:
    df = df[-20:]
df.plot(kind="bar")

```
Out[6]:
```

```
In [7]:
```n = 5
def first_commit_fn(df):
    if len(df) < 1:
        return None
    else:
        return df

dataFrame = full_info
# Bucket each commit into a 100-day period (ordinal day number // 100).
period = dataFrame["Time"].map(lambda x: x.toordinal() // 100)
commits_by_time = dataFrame.groupby(["Committer Name", period], sort=True).size()
time = dataFrame.groupby(period).size().sort_values()
# Earliest commit per committer, oldest committers first.
first_commits = dataFrame.groupby("Committer Name")[["Time"]].min().sort_values("Time")
commits_by_time = commits_by_time.reindex(index=time.index.values, level=1, fill_value=0)
# Split the committers into n cohorts by the date of their first commit.
chunks = np.array_split(np.arange(len(first_commits)), n)
cohorts = [first_commits.iloc[chunk] for chunk in chunks]
convolution_array = [0.1] * 10  # 10-period moving average
# Total commits per period for each cohort; periods with no commits count as 0.
cohort_activity = [
    commits_by_time.loc[cohort.index.values]
    .groupby(level=1).sum()
    .reindex(index=time.index.values, fill_value=0)
    for cohort in cohorts
]
for i in range(len(cohort_activity)):
    cohort_activity[i] = np.convolve(cohort_activity[i], convolution_array)
to_graph = pd.DataFrame(cohort_activity).transpose()
to_graph.plot(kind="bar", stacked=True, linewidth=0)

```
Out[7]:
```
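
The cohort cell above orders committers by the date of their first commit and then splits that ordering into `n` groups of nearly equal size. A minimal sketch of just that step on invented data follows; the names and dates are hypothetical, and only the `Committer Name` and `Time` column names are taken from `full_info`.

```python
import numpy as np
import pandas as pd

# Hypothetical commit log: one row per commit.
toy_log = pd.DataFrame({
    "Committer Name": ["ann", "bob", "ann", "cat", "dan", "bob", "eve"],
    "Time": pd.to_datetime([
        "2013-01-10", "2013-03-02", "2013-04-15", "2013-06-20",
        "2013-09-01", "2013-10-11", "2014-02-05",
    ]),
})

# Earliest commit per committer, oldest first.
toy_first = toy_log.groupby("Committer Name")["Time"].min().sort_values()

# Split positions 0..len-1 into n_cohorts nearly equal chunks,
# then map each chunk of positions back to committer names.
n_cohorts = 2
chunks = np.array_split(np.arange(len(toy_first)), n_cohorts)
toy_cohorts = [list(toy_first.index[chunk]) for chunk in chunks]

print(toy_cohorts)
```

When the number of committers is not divisible by the number of cohorts, `np.array_split` makes the earlier chunks one element larger, so the oldest cohorts are never the smaller ones.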

```
In [8]:
```byCommitter = repo.by_committer()
totalCohortCommits = []
for cohort in cohorts:
    # Total commits made by the members of this cohort.
    cohortPeople = byCommitter.reindex(cohort.index)
    totalCohortCommits.append(cohortPeople.sum())
commitsPerCohort = pd.DataFrame(totalCohortCommits)
commitsPerCohort.transpose().plot(kind="bar")

```
Out[8]:
```
