Technical Debt Evaluation As Code

Abstract

As the processes that support software development evolve, they almost always reach a point where replacing manual activities with automation is recognized as a best practice, and often as a mandatory step toward more flexible and cost-effective development models. A few examples illustrate this:

  • Model-driven development, automated code generation, and metaprogramming help with boilerplate or repetitive code, let developers work with source code at higher levels of abstraction, and reduce programming effort.
  • Environment setup automation, build automation, and continuous integration reduce the time developers spend on the standard build-test-integrate cycle and ensure that these tasks are performed correctly and uniformly across the development team.
  • Infrastructure as code, deployment automation, and scripting of repetitive operations ensure stable, verifiable, and scalable operations processes.

The last example is especially illustrative. The rise of DevOps not only gave infrastructure engineers programming-language and API-based automation tools, but also allowed software engineers to enter the area of operations and implement operational tasks with familiar programming languages, techniques, and patterns, as if they were software products. This familiarity lowered the barriers between the two disciplines and brought considerable business value as development teams turned into DevOps teams.

We would like to determine which parts of the system to refactor in order to get the most refactoring value for the refactoring cost. The steps are:

  1. Compute metric values for the parts of the system considered for refactoring by aggregating the metrics of the elements included in these parts.
  2. Derive refactoring values and costs for these parts.
  3. Compare them to find the optimal refactoring targets.
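
The three steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not define how value and cost are derived, so the sketch assumes, hypothetically, that value is proportional to the complexity removed and cost is proportional to the size of the code touched. The names `refactoring_value`, `refactoring_cost`, and `rank_options` are made up for this example, and the numbers are illustrative part-level totals.

```python
import pandas as pd

# Step 1 result: metrics already aggregated per part (illustrative numbers).
part_totals = pd.DataFrame(
    {"Size": [45, 50, 65], "Complexity": [10, 9, 12]},
    index=pd.Index(["A", "B", "C"], name="Part"),
)

def refactoring_value(metrics: pd.DataFrame) -> pd.Series:
    # ASSUMPTION: value grows with the complexity that refactoring removes.
    return metrics["Complexity"].astype(float)

def refactoring_cost(metrics: pd.DataFrame) -> pd.Series:
    # ASSUMPTION: cost grows with the amount of code that must be touched.
    return metrics["Size"].astype(float)

def rank_options(metrics: pd.DataFrame) -> pd.DataFrame:
    # Steps 2 and 3: derive value and cost, then sort by value per unit of cost.
    ranked = metrics.assign(
        Value=refactoring_value(metrics),
        Cost=refactoring_cost(metrics),
    )
    ranked["ValuePerCost"] = ranked["Value"] / ranked["Cost"]
    return ranked.sort_values("ValuePerCost", ascending=False)

ranking = rank_options(part_totals)
print(ranking)
```

Any other value or cost model can be swapped in by replacing the two small functions; the ranking step stays the same.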

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [8]:
# Scratch DataFrame mixing several dtypes; scalar values are broadcast to all four rows.
sample = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo',
})
sample


Out[8]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

First, let's define the input data: which parts the system is split into, which elements belong to each part, and the size and complexity values of these elements.


In [12]:
element_metrics = pd.DataFrame({
        "Part": pd.Categorical(["A", "A", "B", "C", "C"]),
        "Element": ["A::El1", "A::El2", "B::El3", "A::El2", "B::El3"],
        "Size": [30, 15, 50, 15, 50],
        "Complexity": [7, 3, 9, 3, 9]
    }, columns=["Part", "Element", "Size", "Complexity"])

element_metrics


Out[12]:
  Part Element  Size  Complexity
0    A  A::El1    30           7
1    A  A::El2    15           3
2    B  B::El3    50           9
3    C  A::El2    15           3
4    C  B::El3    50           9

In [17]:
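# Aggregate element metrics per part; select the numeric columns explicitly so
# that the string "Element" column is not concatenated by sum().
part_metrics = element_metrics.groupby("Part", observed=True)[["Size", "Complexity"]].sum()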
part_metrics


Out[17]:
      Size  Complexity
Part
A       45          10
B       50           9
C       65          12

In [22]:
# A list of labels keeps the result a one-row DataFrame rather than a Series.
part_metrics.loc[["A"]]


Out[22]:
      Size  Complexity
Part
A       45          10

In [42]:
# Each option is a group of parts to refactor together; split each string into part names.
refactoring_options = [list(o) for o in ["A", "ABC", "BC", "C"]]
refactoring_options


Out[42]:
[['A'], ['A', 'B', 'C'], ['B', 'C'], ['C']]

In [48]:
# Total metrics for the third option (parts B and C).
part_metrics.loc[refactoring_options[2]].sum()


Out[48]:
Size          115
Complexity     21
dtype: int64
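
The same aggregation can be applied to every refactoring option at once. The sketch below is one possible way to do it, not necessarily how the notebook continues. It rebuilds the input data so that it runs on its own, and the helper `option_totals` with its `dedupe` flag is hypothetical. In the sample data, part C lists elements that also appear under parts A and B, so the sketch also shows a variant that counts each element only once, on the assumption that a shared element is refactored a single time.

```python
import pandas as pd

element_metrics = pd.DataFrame({
    "Part": ["A", "A", "B", "C", "C"],
    "Element": ["A::El1", "A::El2", "B::El3", "A::El2", "B::El3"],
    "Size": [30, 15, 50, 15, 50],
    "Complexity": [7, 3, 9, 3, 9],
})
refactoring_options = [list(o) for o in ["A", "ABC", "BC", "C"]]

def option_totals(option, dedupe=False):
    # Select the elements of every part named in the option.
    rows = element_metrics[element_metrics["Part"].isin(option)]
    if dedupe:
        # ASSUMPTION: an element listed under several parts is refactored once.
        rows = rows.drop_duplicates(subset="Element")
    return rows[["Size", "Complexity"]].sum()

# One row per option; the row label is the concatenated part names.
option_metrics = pd.DataFrame(
    {"".join(o): option_totals(o) for o in refactoring_options}
).T
option_metrics_unique = pd.DataFrame(
    {"".join(o): option_totals(o, dedupe=True) for o in refactoring_options}
).T

print(option_metrics)
print(option_metrics_unique)
```

Without de-duplication, the "BC" row reproduces the totals shown above (Size 115, Complexity 21). Which variant is appropriate depends on whether overlapping parts really share the same code.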
