Refactoring

A pattern for evolution of code

Scott Hendrickson
2016 April 29

What is refactoring?

Code refactoring is the process of restructuring existing computer code—changing the factoring—without changing its external behavior. - wikipedia

Maybe an illustration about two work streams.

  1. "Normal" coding work. You are trying to add new functionality to your code. E.g.
  2. Changing the code, structure, architecture, etc. (but not the functionality).
  3. Know when and how to move between the two activities

More wikipedia,

Refactoring improves nonfunctional attributes of the software. Advantages include improved code readability and reduced complexity; these can improve source-code maintainability and create a more expressive internal architecture or object model to improve extensibility. Typically, refactoring applies a series of standardised basic micro-refactorings, each of which is (usually) a tiny change in a computer program's source code that either preserves the behaviour of the software, or at least does not modify its conformance to functional requirements.

Why worry about it?

Refactoring enables graceful code evolution -- You don't have the know all the answers when you start, you can discover them as you go. Refactoring is a key part of the how.

... code refactoring may also resolve hidden, dormant, or undiscovered computer bugs or vulnerabilities in the system by simplifying the underlying logic and eliminating unnecessary levels of complexity.

What motivates refactoring?

Sometimes we have a distinct feel that something isn't quite right. Or, we hit a wall where someone asks us to add some functionality to the code and we realize that code choices we made earlier make adding the new piece very hard, risky or clunky.

More formally, smart people have identified some common "this doesn't feel right" moments and cateloged them.

Bad Smells

Refactoring is usually motivated by noticing a code smell.[2] For example the method at hand may be very long, or it may be a near duplicate of another nearby method. Once recognized, such problems can be addressed by refactoring the source code, or transforming it into a new form that behaves the same as before but that no longer "smells". 

These bad smells can be at code, architecture, data structure, etc. levels. Learning to identify many types of bad smells and then having ready tools to address them is part of what we mean by "professional developer skills."

For a long routine, one or more smaller subroutines can be extracted; or for duplicate routines, the duplication can be removed and replaced with one shared function. Failure to perform refactoring can result in accumulating technical debt; on the other hand, refactoring is one of the primary means of repaying technical debt.[3]

When you don't refactor regularly...

...you accumlulate technical debt.

Can we say any generally useful things about the risks of not paying down technical debt?

There are two general categories of benefits to the activity of refactoring:

  1. Maintainability. It is easier to fix bugs because the source code is easy to read and the intent of its author is easy to grasp.[4] This might be achieved by reducing large monolithic routines into a set of individually concise, well-named, single-purpose methods. It might be achieved by moving a method to a more appropriate class, or by removing misleading comments.
  2. Extensibility. It is easier to extend the capabilities of the application if it uses recognizable design patterns, and it provides some flexibility where none before may have existed.[1] - wikipedia

What is a design pattern?

A design pattern is the re-usable form of a solution to a design problem. The idea was introduced by the architect Christopher Alexander[1] and has been adapted for various other disciplines, most notably computer science.[2] - wikipedia

Can you give an example, maybe from Alexander's work?

... how the components of the pattern relate to each other to give the solution.[3] Christopher Alexander describes common design problems as arising from "conflicting forces" — such as the conflict between wanting a room to be sunny and wanting it not to overheat on summer afternoons. A pattern would not tell the designer how many windows to put in the room; instead, it would propose a set of values to guide the designer toward a decision that is best for their particular application. Alexander, for example, suggests that enough windows should be included to direct light all around the room. He considers this a good solution because he believes it increases the enjoyment of the room by its occupants. Other authors might come to different conclusions, if they place higher value on heating costs, or material costs. These values, used by the pattern's author to determine which solution is "best", must also be documented within the pattern. - wikipedia

Important attributes of a Pattern

The elements of this language are entities called patterns. Each pattern describes a problem that occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice. — Christopher Alexander[1] - wikipedia

So, summarizing attributes:

  1. The Context section sets the stage where the pattern takes place.
  2. The Problem section explains what the actual problem is.
  3. The Forces section describes why the problem is difficult to solve.
  4. The Solution section explains the solution in detail.
  5. The Consequences section demonstrates what happens when you apply the solution.

(These steps follow http://www.europlop.net/sites/default/files/files/0_How%20to%20write%20a%20pattern-2011-11-30_linked.pdf )

  1. People living, working, playing near water. Desire to be near water.
  2. Desire for access water by many people at once --> reduced access to water
  3. Identify the forces
    • Desire - private land on water
    • Water adjacent land = common access land
    • Roads parallel to water aren't the kind of access we mean
  4. Form of solution and considerations
    • Common land along water
    • Approach roads perpendicular
  5. Preserving some common areas near water provide broad access. Preserving common waters-edge space my building roads perpendicular to water maximized common areas.

WRT bad smells, give a more code-y example


In [1]:
import string, math
a = { x:[] for x in string.punctuation }
i = 0
for x in a:
    a[x].append(math.sin(i))
    i += 1
print(a)


{'!': [0.0], '#': [0.8414709848078965], '"': [0.9092974268256817], '%': [0.1411200080598672], '$': [-0.7568024953079282], "'": [-0.9589242746631385], '&': [-0.27941549819892586], ')': [0.6569865987187891], '(': [0.9893582466233818], '+': [0.4121184852417566], '*': [-0.5440211108893699], '-': [-0.9999902065507035], ',': [-0.5365729180004349], '/': [0.4201670368266409], '.': [0.9906073556948704], ';': [0.6502878401571169], ':': [-0.2879033166650653], '=': [-0.9613974918795568], '<': [-0.750987246771676], '?': [0.14987720966295234], '>': [0.9129452507276277], '@': [0.836655638536056], '[': [-0.008851309290403876], ']': [-0.8462204041751706], '\\': [-0.9055783620066239], '_': [-0.13235175009777303], '^': [0.7625584504796028], '`': [0.956375928404503], '{': [0.27090578830786904], '}': [-0.6636338842129675], '|': [-0.9880316240928618], '~': [-0.404037645323065]}

This activity has a name, we call it enumeration...


In [2]:
import string
a = { x:[] for x in string.punctuation }
for i, x in enumerate(a):
    a[x].append(math.sin(i))
print(a)


{'!': [0.0], '#': [0.8414709848078965], '"': [0.9092974268256817], '%': [0.1411200080598672], '$': [-0.7568024953079282], "'": [-0.9589242746631385], '&': [-0.27941549819892586], ')': [0.6569865987187891], '(': [0.9893582466233818], '+': [0.4121184852417566], '*': [-0.5440211108893699], '-': [-0.9999902065507035], ',': [-0.5365729180004349], '/': [0.4201670368266409], '.': [0.9906073556948704], ';': [0.6502878401571169], ':': [-0.2879033166650653], '=': [-0.9613974918795568], '<': [-0.750987246771676], '?': [0.14987720966295234], '>': [0.9129452507276277], '@': [0.836655638536056], '[': [-0.008851309290403876], ']': [-0.8462204041751706], '\\': [-0.9055783620066239], '_': [-0.13235175009777303], '^': [0.7625584504796028], '`': [0.956375928404503], '{': [0.27090578830786904], '}': [-0.6636338842129675], '|': [-0.9880316240928618], '~': [-0.404037645323065]}

Are smells at code level the only type?

No. An older example from phone switching. When a phone switch is saturated, there are no available outgoing lines to connect. Then, there is a queue of calls waiting to connect. After waiting a given amount of time, waiting users give up (or get a busy signal) and drop their request to connect. Often the try their call again.

Forces:

  • The likelihood of dropping goes up quickly with time
  • Dropping and calling again uses switch resoruces without successfully connecting calls
  • Dropping calls decreases customer satisfaction
  • When there is a backlog of calls, adding new calls at the end of the line means that, even when there is a new appropriate outgoing line avaialble, all of the other (potentially dropped) call have to be tried first.

Pattern of solution:

  • When stale work multiplies failures, do fresh work first.

Consequences:

  • This is the seemingly unfair plan: last-come, first served. But can show this minimizes overall dissappointment for phone switching.

And you have seen a similar pattern play out with someone standing at a help desk while someone else calls in... Take the fresh work first because you already captured the attention of the person standing there.

Also, fresh work first is the principle behind "secure your own mask before helping someone else" on the airplane.

Coder's Habit Pattern

I want my code to be really fast so I am optimizing every method as i go.

This creates unreadable code, typically adds a lot of complexity and wastes a lot of time optimizing code that isn't the bottleneck of performance.

The smell is premature optimization.

The pattern of behavior is "delay optimization" until you can profile the running code to find the bottle neck.

Consequences are you will add complexity, make all of your code less readable and use all of your attention on parts of the code that are not the bottleneck for performance.

Note: Pattern Language

An organized collection of design patterns that relate to a particular field is called a pattern language. This language gives a common terminology for discussing the situations designers are faced with. - wikipedia

The original for Software was, "Design Patterns", also known referred to as "GOF" for the Gang of Four, referring to the authors.

What are the pre-conditions for refactoring?

Before refactoring a section of code, a solid set of automatic unit tests is needed. The tests are used to demonstrate that the behavior of the module is correct before the refactoring. - wikipedia

Without unit tests, refactoring is slow and risky.

If it inadvertently turns out that a test fails, then it's generally best to fix the test first, because otherwise it is hard to distinguish between failures introduced by refactoring and failures that were already there. After the refactoring, the tests are run again to verify the refactoring didn't break the tests. - wikipedia

What is a unit test?

Unit testing is a software testing method by which individual units of source code, sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures, are tested to determine whether they are fit for use.[1] 

Intuitively, one can view a unit as the smallest testable part of an application. In procedural programming, a unit could be an entire module, but it is more commonly an individual function or procedure. In object-oriented programming, a unit is often an entire interface, such as a class, but could be an individual method.[2]

Ideally, each test case is independent from the others. Substitutes such as method stubs, mock objects,[5] fakes, and test harnesses can be used to assist testing a module in isolation. Unit tests are typically written and run by software developers to ensure that code meets its design and behaves as intended. - wikipedia

How to identify Bad Smells?

https://sourcemaking.com/refactoring/smells

What to do about them? Is there a catelog of refactoring?

http://refactoring.com/catalog/

Some of this seems pretty standard for Python

https://github.com/python-rope/rope

And a vi plugin...


In [ ]: