Technical Debt Evaluation As Code

Abstract

As the processes supporting software development evolve, they nearly always reach a point where replacing manual activities with automation is recognized as a best practice, and frequently as a mandatory step that enables more flexible and cost-optimized development models. A few examples illustrate this:

Model-driven development, automated code generation, and metaprogramming techniques help write boilerplate or repetitive code, let developers work with source code at higher levels of abstraction, and cut programming effort.
Environment setup and build automation and continuous integration reduce the time developers spend on the routine tasks of the build-test-integrate cycle and ensure that these tasks are performed correctly and uniformly across the development team.
Infrastructure as code, deployment automation, and scripting of repetitive operations ensure stable, verifiable, and scalable operations processes.

The last example is especially illustrative. DevOps not only gave infrastructure engineers programming languages and API-based automation tools but also enabled software engineers and developers to enter the field of operations, implementing it with familiar programming languages, techniques, and patterns as if it were a software product. This familiarity lowered the barriers between the two disciplines, and development teams that turned into DevOps teams gained considerable business value.

Applying the same approach to technical debt, we want to determine which parts of the system to refactor so that the refactoring value is as high as possible relative to the refactoring cost. The steps are:

Compute the metric values of each part of the system considered for refactoring by aggregating the metrics of the elements included in that part.
Deduce the refactoring value and cost for these parts.
Compare them to find the optimal refactoring targets.
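The three steps can be sketched end to end in plain Python. The element data mirrors the element metrics table defined in the notebook. The value and cost model (value = total complexity, cost = total size) is a hypothetical placeholder, since the steps do not say how values and costs are deduced; `option_metrics`, `value_and_cost`, and `best_option` are illustrative names.

```python
# Element metrics: element -> (size, complexity)
ELEMENTS = {
    "A::El1": (30, 7),
    "A::El2": (15, 3),
    "B::El3": (50, 9),
}
# Elements contained in each part (part C overlaps with parts A and B)
PARTS = {
    "A": {"A::El1", "A::El2"},
    "B": {"B::El3"},
    "C": {"A::El2", "B::El3"},
}
OPTIONS = [("A",), ("A", "B", "C"), ("B", "C"), ("C",)]


def option_metrics(option):
    """Step 1: aggregate size and complexity over the elements of an option.

    Elements are collected into a set first, so an element shared by
    several parts is counted only once.
    """
    elements = set().union(*(PARTS[p] for p in option))
    size = sum(ELEMENTS[e][0] for e in elements)
    complexity = sum(ELEMENTS[e][1] for e in elements)
    return size, complexity


def value_and_cost(size, complexity):
    """Step 2 (hypothetical model): value grows with the complexity that
    refactoring removes, cost grows with the amount of code touched."""
    return complexity, size


def best_option(options):
    """Step 3: pick the option with the highest value-to-cost ratio."""
    def ratio(option):
        value, cost = value_and_cost(*option_metrics(option))
        return value / cost
    return max(options, key=ratio)


for option in OPTIONS:
    size, complexity = option_metrics(option)
    print("".join(option), size, complexity)
print("best:", "".join(best_option(OPTIONS)))
```

With this placeholder model, option A wins with a ratio of 10/45 (about 0.22); a different value or cost model can change the ranking.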



In [2]:

    
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt




First, let's define the input data: which parts the system is split into, which elements each part contains, and the size and complexity values of these elements.



In [12]:

    
element_metrics = pd.DataFrame({
        "Part": pd.Categorical(["A", "A", "B", "C", "C"]),
        "Element": ["A::El1", "A::El2", "B::El3", "A::El2", "B::El3"],
        "Size": [30, 15, 50, 15, 50],
        "Complexity": [7, 3, 9, 3, 9]
    }, columns=["Part", "Element", "Size", "Complexity"])

element_metrics
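Note that the parts overlap: part C lists `A::El2` and `B::El3`, which also belong to parts A and B. A quick way to spot such shared elements is to count, for each element, how many distinct parts contain it. This minimal sketch rebuilds the same table (with plain strings instead of a categorical `Part` column) so it runs on its own; `part_count` and `shared` are illustrative names.

```python
import pandas as pd

# Same element metrics as in the notebook cell above
element_metrics = pd.DataFrame({
    "Part": ["A", "A", "B", "C", "C"],
    "Element": ["A::El1", "A::El2", "B::El3", "A::El2", "B::El3"],
    "Size": [30, 15, 50, 15, 50],
    "Complexity": [7, 3, 9, 3, 9],
})

# Number of distinct parts each element belongs to
part_count = element_metrics.groupby("Element")["Part"].nunique()

# Elements that belong to more than one part
shared = part_count[part_count > 1]
print(shared)
```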



In [17]:

    
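# Sum only the metric columns: recent pandas would otherwise concatenate the Element strings
part_metrics = element_metrics.groupby("Part", observed=True)[["Size", "Complexity"]].sum()
part_metrics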



In [22]:

    
part_metrics.loc["A"]



In [42]:

    
refactoring_options = [list(o) for o in ["A", "ABC", "BC", "C"]]
refactoring_options









    Out[42]:





[['A'], ['A', 'B', 'C'], ['B', 'C'], ['C']]
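The options above are picked by hand. If every combination of parts should be evaluated, the standard library's `itertools.combinations` can enumerate them. This is a sketch under the assumption that exhaustive enumeration is wanted; with n parts it yields 2^n - 1 options, so it only suits a small number of parts.

```python
from itertools import combinations

# Part names; in the notebook they could be taken from part_metrics.index
parts = ["A", "B", "C"]

# Every non-empty subset of parts, shortest subsets first
all_options = [
    list(combo)
    for r in range(1, len(parts) + 1)
    for combo in combinations(parts, r)
]
print(all_options)
# prints [['A'], ['B'], ['C'], ['A', 'B'], ['A', 'C'], ['B', 'C'], ['A', 'B', 'C']]
```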



In [48]:

    
part_metrics.loc[refactoring_options[2]].sum()









    Out[48]:





Size          115
Complexity     21
dtype: int64
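Summing part-level totals counts an element once for every part that contains it: `B::El3` belongs to both B and C, so the Size 115 and Complexity 21 reported for option ['B', 'C'] include its metrics twice. If an element only needs to be refactored once, an option can instead be aggregated over its distinct elements. This is a minimal sketch assuming that an element has the same metrics in every part that lists it; `option_totals` and `option_table` are hypothetical names.

```python
import pandas as pd

element_metrics = pd.DataFrame({
    "Part": ["A", "A", "B", "C", "C"],
    "Element": ["A::El1", "A::El2", "B::El3", "A::El2", "B::El3"],
    "Size": [30, 15, 50, 15, 50],
    "Complexity": [7, 3, 9, 3, 9],
})
refactoring_options = [list(o) for o in ["A", "ABC", "BC", "C"]]


def option_totals(option):
    """Sum metrics over the distinct elements of the selected parts."""
    rows = element_metrics[element_metrics["Part"].isin(option)]
    # Keep one row per element so shared elements are not counted twice
    unique_rows = rows.drop_duplicates(subset="Element")
    return unique_rows[["Size", "Complexity"]].sum()


# One row per option, one column per metric
option_table = pd.DataFrame(
    [option_totals(o) for o in refactoring_options],
    index=["".join(o) for o in refactoring_options],
)
print(option_table)
```

With distinct elements, option BC comes to Size 65 and Complexity 12, the same as option C alone, because part B adds no element that C does not already contain.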


