Identifying Knowledge Islands

Loading and preparing the Git log data


In [1]:
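import pandas as pd

# One row per source line: file path, last author, and time of the last change
log = pd.read_csv("../../../software-data/projects/linux/linux_blame_log.csv.gz")
# Convert the timestamp strings into proper datetime values
log['timestamp'] = pd.to_datetime(log['timestamp'])
log.head()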


Out[1]:
                          path           author            timestamp  line
0  drivers/scsi/bfa/bfad_drv.h  Anil Gurumurthy  2015-11-26 08:54:45     1
1  drivers/scsi/bfa/bfad_drv.h  Anil Gurumurthy  2015-11-26 08:54:45     2
2  drivers/scsi/bfa/bfad_drv.h  Anil Gurumurthy  2015-11-26 08:54:45     3
3  drivers/scsi/bfa/bfad_drv.h       Jing Huang  2009-09-24 00:46:15     4
4  drivers/scsi/bfa/bfad_drv.h  Anil Gurumurthy  2015-11-26 08:54:45     5

Grouping by file and author with earliest timestamp and line count


In [2]:
# Per file and author: earliest timestamp and number of lines attributed
knowledge = log.groupby(
    ['path', 'author']).agg(
        {'timestamp':'min', 'line':'count'}
    )
knowledge.head()


Out[2]:
                                                      timestamp  line
path                   author
arch/arc/kernel/time.c Anna-Maria Gleixner  2016-07-13 17:17:07    13
                       Daniel Lezcano       2016-06-15 12:50:12    31
                       Noam Camus           2016-01-01 10:18:49    18
                       Vineet Gupta         2013-01-18 09:42:18   243
                       Viresh Kumar         2015-07-16 11:26:14     6
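
To make the aggregation step easier to follow, here is a minimal sketch on a made-up blame log. The file names, authors, and the variable names `toy_log` and `toy_knowledge` are hypothetical; the sketch also uses pandas named aggregation instead of the dictionary form from the cell above, purely so that the result columns get descriptive names.

```python
import pandas as pd

# Hypothetical toy blame log: one row per source line, as in the real data.
toy_log = pd.DataFrame({
    "path": ["a.c", "a.c", "a.c", "b.c"],
    "author": ["Alice", "Alice", "Bob", "Bob"],
    "timestamp": pd.to_datetime(
        ["2016-03-01", "2015-01-01", "2014-06-01", "2017-02-01"]
    ),
    "line": [1, 2, 3, 1],
})

# Per (file, author): earliest timestamp and number of attributed lines.
toy_knowledge = toy_log.groupby(["path", "author"]).agg(
    first_timestamp=("timestamp", "min"),
    line_count=("line", "count"),
)
print(toy_knowledge)
```

Note that `count` counts rows per group, so the `line` column only serves as a non-null column to count; its numeric values (the line numbers) do not matter.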

Calculating knowledge shares


In [3]:
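# Total number of lines per file, repeated on every author row of that file
knowledge['all'] = knowledge.groupby('path')['line'].transform('sum')
# Share of the file's lines attributed to each author
knowledge['knowing'] = knowledge['line'] / knowledge['all']
knowledge.head()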


Out[3]:
                                                      timestamp  line  all   knowing
path                   author
arch/arc/kernel/time.c Anna-Maria Gleixner  2016-07-13 17:17:07    13  311  0.041801
                       Daniel Lezcano       2016-06-15 12:50:12    31  311  0.099678
                       Noam Camus           2016-01-01 10:18:49    18  311  0.057878
                       Vineet Gupta         2013-01-18 09:42:18   243  311  0.781350
                       Viresh Kumar         2015-07-16 11:26:14     6  311  0.019293
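
The share calculation can be checked on a small, made-up example. The data and the names `toy` and `share_sums` below are hypothetical; the sketch only illustrates that `transform('sum')` broadcasts the per-file total back to every author row, so the resulting shares add up to 1 for each file.

```python
import pandas as pd

# Hypothetical line counts per (file, author), shaped like `knowledge`.
toy = pd.DataFrame(
    {"line": [3, 1, 1, 1]},
    index=pd.MultiIndex.from_tuples(
        [("a.c", "Alice"), ("a.c", "Bob"), ("b.c", "Bob"), ("b.c", "Carol")],
        names=["path", "author"],
    ),
)

# Per-file total, aligned with the original rows.
toy["all"] = toy.groupby("path")["line"].transform("sum")
# Each author's share of the file.
toy["knowing"] = toy["line"] / toy["all"]

# Sanity check: the shares of each file should sum to 1.
share_sums = toy.groupby("path")["knowing"].sum()
print(toy)
print(share_sums)
```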

Identifying the maximum knowledge per file


In [4]:
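# Highest knowledge share per file, aligned with the rows of `knowledge`;
# the string 'max' avoids passing the Python builtin to transform
max_knowledge_per_file = knowledge.groupby(['path'])['knowing'].transform('max')
# Keep only the author(s) holding that maximum share in each file
knowledge_carriers = knowledge[knowledge['knowing'] == max_knowledge_per_file]
# Move 'author' from the index into a regular column
knowledge_carriers = knowledge_carriers.reset_index(level=1)
knowledge_carriers.head()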


Out[4]:
                                                   author            timestamp  line  all   knowing
path
arch/arc/kernel/time.c                       Vineet Gupta  2013-01-18 09:42:18   243  311  0.781350
arch/arm/common/timer-sp.c                    Rob Herring  2011-12-12 21:29:08   111  169  0.656805
arch/arm/include/asm/hardware/arm_timer.h    Russell King  2010-01-16 15:07:08    24   29  0.827586
arch/arm/kernel/perf_event.c                   Jamie Iles  2010-02-02 19:25:44   176  523  0.336520
arch/arm/mach-at91/at91rm9200_time.c       David Brownell  2007-07-31 00:41:26    81   95  0.852632
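
One detail of the equality filter is worth illustrating: if two authors hold exactly the same maximum share of a file, both rows survive. The sketch below uses made-up data and hypothetical names (`toy`, `with_ties`, `one_per_file`) to contrast this with an `idxmax`-based selection, which keeps only the first maximum per file. It is shown as an alternative, not as what the notebook does.

```python
import pandas as pd

# Hypothetical shares; a.c is split evenly between two authors.
toy = pd.DataFrame(
    {"knowing": [0.5, 0.5, 0.8, 0.2]},
    index=pd.MultiIndex.from_tuples(
        [("a.c", "Alice"), ("a.c", "Bob"), ("b.c", "Carol"), ("b.c", "Dave")],
        names=["path", "author"],
    ),
)

# Equality filter as in the cell above: tied authors are all kept.
max_per_file = toy.groupby("path")["knowing"].transform("max")
with_ties = toy[toy["knowing"] == max_per_file]

# Alternative: idxmax returns one index label per file (the first maximum).
first_max_labels = toy.groupby("path")["knowing"].idxmax().tolist()
one_per_file = toy.loc[first_max_labels]

print(with_ties)
print(one_per_file)
```

Which variant is preferable depends on whether a file with several equally strong authors should appear once or several times in the visualization.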

Export to the D3 visualization "Zoomable Circle Packing"


In [5]:
from ausi import d3
d3.create_json_for_zoomable_circle_packing(
    knowledge_carriers.reset_index(),
    'author',
    'author',
    'path',
    '/',
    'all',
    'knowing',
    'linux_circle_packing'
)


JSON file produced in 'C:\dev\repos\software-analytics\demos\20181213_EuregJUG_Aachen\linux_circle_packing.json'
HTML file produced in 'C:\dev\repos\software-analytics\demos\20181213_EuregJUG_Aachen\linux_circle_packing.html'
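
The internals of `d3.create_json_for_zoomable_circle_packing` are not shown in this notebook. As a rough illustration only, the following sketch turns flat file paths into the kind of nested name/children tree that hierarchical D3 layouts commonly consume. The function `build_hierarchy`, the key names `author` and `size`, and the root label are assumptions for this example and may differ from what `ausi` actually writes.

```python
import json

def build_hierarchy(rows, separator="/"):
    """Nest flat (path, author, size) rows into a name/children tree.

    Hypothetical helper for illustration; not part of the ausi library.
    """
    root = {"name": "root", "children": []}
    for path, author, size in rows:
        node = root
        for part in path.split(separator):
            children = node.setdefault("children", [])
            # Reuse an existing child with this name, or create a new one.
            match = next((c for c in children if c["name"] == part), None)
            if match is None:
                match = {"name": part}
                children.append(match)
            node = match
        # Leaf node: the file itself, annotated with its main author and size.
        node["author"] = author
        node["size"] = size
    return root

# Two rows taken from Out[4]: path, author, and total line count ('all').
rows = [
    ("arch/arc/kernel/time.c", "Vineet Gupta", 311),
    ("arch/arm/common/timer-sp.c", "Rob Herring", 169),
]
tree = build_hierarchy(rows)
print(json.dumps(tree, indent=2))
```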