Modularization Check

Problem statement

Question

"How well does the domain-oriented module structure match the actual development activity?"

Idea

Heuristic: "Are changes within a component made together, i.e. in the same commits?"

  • Changes => commits from version control
  • Components => part of the file path

Data import

Import the Git log


In [14]:
from ozapfdis import git

# load the Git history and keep only the commit hash and the changed file path
git_log = git.log_numstat("../../../dropover/")[['sha', 'file']]
git_log.head()


Out[14]:
sha file
1 8c686954 backend/pom-2016-07-16_04-40-56-752.xml
4 97c6ef96 backend/src/test/java/at/dropover/scheduling/i...
6 3f7cf92c backend/src/main/webapp/app/widgets/gallery/js...
7 3f7cf92c backend/src/main/webapp/app/widgets/gallery/vi...
9 ec85fe73 backend/src/main/java/at/dropover/files/intera...
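For readers who do not have ozapfdis at hand, a similar (sha, file) table can be built by parsing raw `git log --numstat` output. The following is only a minimal sketch and not how `git.log_numstat` works internally. The function name `parse_numstat`, the `--` marker in front of each commit hash, and the sample text are assumptions made for illustration.

```python
import pandas as pd

# Assumed invocation (hypothetical marker "--" in front of the short hash):
#   git log --numstat --pretty=format:"--%h"
def parse_numstat(raw_log):
    """Turn raw `git log --numstat` text into a (sha, file) DataFrame."""
    rows = []
    sha = None
    for line in raw_log.splitlines():
        if line.startswith("--"):
            # commit line: remember the hash for the file lines that follow
            sha = line[2:].strip()
        elif line.strip():
            # numstat line: <added>\t<deleted>\t<path> ("-" for binary files)
            parts = line.split("\t")
            if len(parts) == 3 and sha is not None:
                rows.append((sha, parts[2]))
    return pd.DataFrame(rows, columns=["sha", "file"])

# made-up sample output, not taken from the dropover repository
sample = (
    "--aaaa1111\n"
    "3\t1\tbackend/src/main/java/A.java\n"
    "10\t0\tbackend/src/test/java/ATest.java\n"
    "\n"
    "--bbbb2222\n"
    "-\t-\tbackend/logo.png\n"
)
toy_log = parse_numstat(sample)
```

Each row of `toy_log` is one file touched by one commit, which is the shape the rest of the analysis relies on.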

Consider only the Java production source code


In [16]:
# keep only production code under src/main/java, without package-info.java files
prod_code = git_log[git_log.file.str.contains("src/main/java")]
prod_code = prod_code[~prod_code.file.str.endswith("package-info.java")].copy()
prod_code.head()


Out[16]:
sha file
9 ec85fe73 backend/src/main/java/at/dropover/files/intera...
5053 bfea33b8 backend/src/main/java/at/dropover/scheduling/i...
5066 ab9ad48e backend/src/main/java/at/dropover/scheduling/i...
5070 0732e9cb backend/src/main/java/at/dropover/files/intera...
5078 ba1fd215 backend/src/main/java/at/dropover/framework/co...
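Note that `str.contains` treats its pattern as a regular expression by default. For the plain path fragment used here that makes no difference, but a literal match can be requested explicitly. The sketch below shows the same filter with named boolean masks on a small made-up table; the file names and the variable names are assumptions, not data from the repository.

```python
import pandas as pd

# hypothetical toy data with one production file, one test file,
# one package-info.java and one non-Java file
toy = pd.DataFrame({
    "sha": ["a1", "a2", "a3", "a4"],
    "file": [
        "backend/src/main/java/at/dropover/files/Foo.java",
        "backend/src/test/java/at/dropover/files/FooTest.java",
        "backend/src/main/java/at/dropover/files/package-info.java",
        "backend/pom.xml",
    ],
})

# regex=False matches the text literally instead of as a regular expression
is_main_java = toy["file"].str.contains("src/main/java", regex=False)
is_package_info = toy["file"].str.endswith("package-info.java")

# .copy() makes the result an independent frame, so adding columns later is safe
toy_prod = toy[is_main_java & ~is_package_info].copy()
```

Only the first row survives: the test class, the `package-info.java` file and `pom.xml` are all filtered out.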

Analysis

Set a marker for each commit that touched a file


In [5]:
# each row is one file touched by one commit; mark it with a 1
prod_code['hit'] = 1
prod_code.head()


Out[5]:
sha file hit
9 ec85fe73 backend/src/main/java/at/dropover/files/intera... 1
5053 bfea33b8 backend/src/main/java/at/dropover/scheduling/i... 1
5066 ab9ad48e backend/src/main/java/at/dropover/scheduling/i... 1
5070 0732e9cb backend/src/main/java/at/dropover/files/intera... 1
5078 ba1fd215 backend/src/main/java/at/dropover/framework/co... 1

Pivot the table


In [6]:
# one row per file, one column per commit; 1 where the commit touched the file
commit_matrix = prod_code.reset_index().pivot_table(
    index='file',
    columns='sha',
    values='hit',
    fill_value=0)
commit_matrix.iloc[0:5,50:55]


Out[6]:
sha 3597d8a2 3b70ea7e 3d3be4ca 3e4ae692 429b3b32
file
backend/src/main/java/at/dropover/comment/boundary/AddCommentRequestModel.java 0 0 0 0 0
backend/src/main/java/at/dropover/comment/boundary/ChangeCommentRequestModel.java 0 0 0 1 0
backend/src/main/java/at/dropover/comment/boundary/CommentData.java 0 0 0 1 0
backend/src/main/java/at/dropover/comment/boundary/GetCommentRequestModel.java 0 0 0 0 0
backend/src/main/java/at/dropover/comment/boundary/GetCommentResponseModel.java 0 0 0 0 0
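To see what the pivot does, it helps to run it on a tiny table. `pivot_table` aggregates with the mean by default; because `hit` is always 1, every (file, commit) pair that occurs ends up as 1 and every missing pair is filled with 0. The sketch below uses made-up commits and file names and compares the result with `pd.crosstab`, which counts occurrences directly.

```python
import pandas as pd

# hypothetical toy log: three commits touching three files
toy = pd.DataFrame({
    "sha":  ["c1", "c1", "c2", "c3", "c3"],
    "file": ["A.java", "B.java", "B.java", "A.java", "C.java"],
})
toy["hit"] = 1

# same call shape as in the notebook: files as rows, commits as columns
toy_matrix = toy.pivot_table(
    index="file",
    columns="sha",
    values="hit",
    fill_value=0)

# alternative without the helper column: count (file, sha) pairs
toy_counts = pd.crosstab(toy["file"], toy["sha"])
```

Each row of `toy_matrix` is the commit vector of one file, for example `A.java` was touched by `c1` and `c3` but not by `c2`.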

Compute the distance between the vectors


In [7]:
from sklearn.metrics.pairwise import cosine_distances

# pairwise cosine distance between the rows (files) of the commit matrix
dissimilarity_matrix = cosine_distances(commit_matrix)
dissimilarity_matrix[:5,:5]


Out[7]:
array([[0.        , 0.29289322, 0.5       , 0.18350342, 0.29289322],
       [0.29289322, 0.        , 0.29289322, 0.1339746 , 0.5       ],
       [0.5       , 0.29289322, 0.        , 0.59175171, 0.29289322],
       [0.18350342, 0.1339746 , 0.59175171, 0.        , 0.42264973],
       [0.29289322, 0.5       , 0.29289322, 0.42264973, 0.        ]])
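The cosine distance of two commit vectors a and b is 1 - (a · b) / (|a| |b|): it is 0 when two files were always changed in exactly the same commits and 1 when they never share a commit. The sketch below computes it by hand with NumPy for three made-up files. The vectors are assumptions for illustration, not rows of the real `commit_matrix`, but they show how a value such as 0.29289322 (which equals 1 - 1/sqrt(2)) can arise.

```python
import numpy as np

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# hypothetical commit vectors over four commits
file_a = [1, 1, 0, 0]   # changed in commits 1 and 2
file_b = [1, 0, 0, 0]   # changed in commit 1 only
file_c = [0, 0, 1, 1]   # changed in commits 3 and 4

d_ab = cosine_distance(file_a, file_b)   # 1 - 1/sqrt(2), roughly 0.2929
d_ac = cosine_distance(file_a, file_c)   # no shared commit, so 1.0
d_aa = cosine_distance(file_a, file_a)   # identical vectors, so (numerically) 0
```

A file without any commit would be a zero vector and lead to a division by zero in this hand-written version; in the notebook every file in the matrix comes from the log and therefore has at least one commit.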

(Display the result more nicely)


In [8]:
import pandas as pd
dissimilarity_df = pd.DataFrame(
    dissimilarity_matrix,
    index=commit_matrix.index,
    columns=commit_matrix.index)
dissimilarity_df.iloc[:5,:2]


Out[8]:
file backend/src/main/java/at/dropover/comment/boundary/AddCommentRequestModel.java backend/src/main/java/at/dropover/comment/boundary/ChangeCommentRequestModel.java
file
backend/src/main/java/at/dropover/comment/boundary/AddCommentRequestModel.java 0.000000 0.292893
backend/src/main/java/at/dropover/comment/boundary/ChangeCommentRequestModel.java 0.292893 0.000000
backend/src/main/java/at/dropover/comment/boundary/CommentData.java 0.500000 0.292893
backend/src/main/java/at/dropover/comment/boundary/GetCommentRequestModel.java 0.183503 0.133975
backend/src/main/java/at/dropover/comment/boundary/GetCommentResponseModel.java 0.292893 0.500000

Visualization

Dimensionality reduction


In [9]:
from sklearn.manifold import MDS

# uses a fixed seed for random_state for reproducibility
model = MDS(dissimilarity='precomputed', random_state=0)
dissimilarity_2d = model.fit_transform(dissimilarity_df)
dissimilarity_2d[:5]


Out[9]:
array([[-0.5259277 ,  0.45070158],
       [-0.56826041,  0.21528001],
       [-0.52746829,  0.34756761],
       [-0.55856713,  0.26202797],
       [-0.4036568 ,  0.49803657]])
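Multidimensional scaling looks for 2D coordinates whose Euclidean distances approximate the given dissimilarities as well as possible. scikit-learn's `MDS` does this iteratively with the SMACOF algorithm. To illustrate the idea only, the sketch below implements classical (Torgerson) MDS, a different, closed-form variant, with plain NumPy. The function name `classical_mds` and the toy points are assumptions and do not reproduce the coordinates shown above.

```python
import numpy as np

def classical_mds(distances, n_components=2):
    """Classical (Torgerson) MDS on a square matrix of pairwise distances."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # double-center the squared distances to obtain a Gram (inner product) matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # the largest eigenvalues and their eigenvectors span the embedding
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigenvalues[order], 0, None))
    return eigenvectors[:, order] * scale

# hypothetical toy input: four corners of a 3 x 4 rectangle
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
toy_distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

embedding = classical_mds(toy_distances)
embedded_distances = np.linalg.norm(
    embedding[:, None, :] - embedding[None, :, :], axis=-1)
```

Because the toy distances really come from points in a plane, the embedding reproduces them exactly, up to rotation and reflection. Cosine distances between commit vectors generally cannot be represented in 2D without some distortion, so positions in the resulting plot should be read as an approximation.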

(Display the result more nicely)


In [10]:
dissimilarity_2d_df = pd.DataFrame(
    dissimilarity_2d,
    index=commit_matrix.index,
    columns=["x", "y"])
dissimilarity_2d_df.head()


Out[10]:
x y
file
backend/src/main/java/at/dropover/comment/boundary/AddCommentRequestModel.java -0.525928 0.450702
backend/src/main/java/at/dropover/comment/boundary/ChangeCommentRequestModel.java -0.568260 0.215280
backend/src/main/java/at/dropover/comment/boundary/CommentData.java -0.527468 0.347568
backend/src/main/java/at/dropover/comment/boundary/GetCommentRequestModel.java -0.558567 0.262028
backend/src/main/java/at/dropover/comment/boundary/GetCommentResponseModel.java -0.403657 0.498037

Extract the modules


In [11]:
# path segment 6 is the package directly below at/dropover; use it as the module name
dissimilarity_2d_df['module'] = dissimilarity_2d_df.index.str.split("/").str[6].values
dissimilarity_2d_df.head()


Out[11]:
x y module
file
backend/src/main/java/at/dropover/comment/boundary/AddCommentRequestModel.java -0.525928 0.450702 comment
backend/src/main/java/at/dropover/comment/boundary/ChangeCommentRequestModel.java -0.568260 0.215280 comment
backend/src/main/java/at/dropover/comment/boundary/CommentData.java -0.527468 0.347568 comment
backend/src/main/java/at/dropover/comment/boundary/GetCommentRequestModel.java -0.558567 0.262028 comment
backend/src/main/java/at/dropover/comment/boundary/GetCommentResponseModel.java -0.403657 0.498037 comment
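Picking the module by position works because every path shown here starts with `backend/src/main/java/at/dropover/`. If paths could start at a different depth, anchoring on the package root is a more robust option. The sketch below is one possible variant; the regular expression and the sample paths (including the class names `Foo` and `Bar`) are assumptions derived from the paths visible in this notebook.

```python
import pandas as pd

# hypothetical sample paths; the last one lacks the leading "backend/" directory
paths = pd.Series([
    "backend/src/main/java/at/dropover/comment/boundary/CommentData.java",
    "backend/src/main/java/at/dropover/files/interactor/Foo.java",
    "src/main/java/at/dropover/scheduling/interactor/Bar.java",
])

# capture the first directory below the package root at/dropover
modules = paths.str.extract(r"src/main/java/at/dropover/([^/]+)/", expand=False)

# positional variant from the notebook, for comparison
by_index = paths.str.split("/").str[6]
```

For the first two paths both variants agree. For the third, the positional variant returns `interactor`, while the pattern-based one still yields `scheduling`.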

Create an interactive chart


In [12]:
from ausi import pygal
xy = pygal.create_xy_chart(dissimilarity_2d_df,"module")
xy.render_in_browser()



End