Linux Git Repository Analysis

Data Import

Importing the Git log data of the Linux repository


In [6]:
import pandas as pd

git_log = pd.read_csv("../dataset/git_demo_timestamp_linux.gz")
git_log.head()


Out[6]:
             timestamp          author
0  2017-12-31 14:47:43  Linus Torvalds
1  2017-12-31 13:13:56  Linus Torvalds
2  2017-12-31 13:03:05  Linus Torvalds
3  2017-12-31 12:30:34  Linus Torvalds
4  2017-12-31 12:29:02  Linus Torvalds

Top 10 Committers

List of the ten authors with the most commits


In [7]:
top10 = git_log.author.value_counts().head(10)
top10


Out[7]:
Linus Torvalds           24259
David S. Miller           9563
Mark Brown                6917
Takashi Iwai              6293
Al Viro                   6064
H Hartley Sweeten         5942
Ingo Molnar               5462
Mauro Carvalho Chehab     5384
Arnd Bergmann             5305
Greg Kroah-Hartman        4687
Name: author, dtype: int64
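
The absolute counts above can also be read as shares of all commits. The following is a minimal sketch of that idea using value_counts(normalize=True) on a small made-up sample; the `sample` frame and its author names are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature log; only the "author" column mirrors the real git_log.
sample = pd.DataFrame(
    {"author": ["alice", "alice", "alice", "alice", "bob", "bob", "carol", "dave"]}
)

# normalize=True returns relative frequencies instead of absolute counts.
shares = sample["author"].value_counts(normalize=True).head(2)
print(shares)
```

Applied to the real data, the same call on git_log.author would show what fraction of all commits each top committer accounts for.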

Visualizing the top 10 committers


In [12]:
%matplotlib inline
top10.plot.pie(title="Top 10 committers", label="", figsize=(7, 7))


Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x21281c339b0>

Analysis of Preferred Commit Times

Converting the timestamp column to a datetime type


In [9]:
git_log.timestamp = pd.to_datetime(git_log.timestamp)
git_log.head()


Out[9]:
             timestamp          author
0  2017-12-31 14:47:43  Linus Torvalds
1  2017-12-31 13:13:56  Linus Torvalds
2  2017-12-31 13:03:05  Linus Torvalds
3  2017-12-31 12:30:34  Linus Torvalds
4  2017-12-31 12:29:02  Linus Torvalds
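
The printed table looks the same before and after the conversion, so the effect is easier to see by checking the column type. The sketch below is one possible way to do that on a tiny stand-alone series; the `raw` values are illustrative (the second one is deliberately malformed), and using errors="coerce" is an assumption about how bad rows might be handled, not something the notebook itself does.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Illustrative input: one valid timestamp string and one deliberately broken value.
raw = pd.Series(["2017-12-31 14:47:43", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

print(is_datetime64_any_dtype(raw))     # strings are not a datetime dtype
print(is_datetime64_any_dtype(parsed))  # the converted series is
print(parsed.dt.hour)                   # the .dt accessor only works after conversion
```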

Plotting the number of commits per hour of the day


In [10]:
# Sort by hour so the bars run from 0 to 23 regardless of the pandas version.
git_log.timestamp.dt.hour.value_counts().sort_index().plot.bar()


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x21281b06748>
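
A bar chart by hour is easiest to read when every hour from 0 to 23 appears on the axis, including hours without any commit. The following sketch shows one way to get there with sort_index and reindex; the `timestamps` series is a small stand-in (the last value is made up) rather than the real git_log column.

```python
import pandas as pd

# Stand-in timestamps; the last entry is invented to cover a late-night hour.
timestamps = pd.to_datetime(pd.Series([
    "2017-12-31 14:47:43",
    "2017-12-31 13:13:56",
    "2017-12-31 13:03:05",
    "2017-12-30 23:30:34",
]))

per_hour = (
    timestamps.dt.hour
    .value_counts()                     # commits per hour that actually occurs
    .sort_index()                       # order by hour instead of by count
    .reindex(range(24), fill_value=0)   # add the missing hours with a count of 0
)
print(per_hour)
```

Calling per_hour.plot.bar() afterwards would then draw 24 bars in chronological order.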

End of demo