Refactoring legacy code
Lately my co-worker asked me to pair with him and make a code review with him of one of his projects. After having spent some hours on refactoring this legacy application together with my co-worker I want to share some important lessons learned. I call this application a legacy application because it has not been developed from grounds up by using TDD. Only at a very late stage of the project did my co-worker start to apply TDD.
I am a merciless person when called to do a code review. I can really be a pain in the ass with my questions of why this and why that. I love to point out violations of principles and best practices. But open minded people and myself always have a lot of fun when doing such things. And we both learn a LOT.
Proper Naming is crucial
The application I’m talking of has some GIS elements in it. One of the tasks in this application was to calculate the distance between two geographical coordinates. Let’s assume that a geographical coordinate on our earth is represented by the following value object
Where the longitude must be in the interval -180° to +180° and the latitude must be in the interval -90° to +90°.
Now the application contained a service called GeoCalcService that was used to calculate the distance between two geographical coordinates. The code is given below
Have a look at the name of the method! The method is called Distance. The problem with this name is that even my co-worker who has developed this code several weeks or months ago didn’t remember what exactly the method is returning. Is it returning the distance in kilometers or nautical miles or even in another unit of measurement? We had to consult the code to find out what really happens in this method.
How much better would it have been to just call the above method GetDistanceInNauticalMiles. No discussion or head scratching would have been necessary! It’s a small change but with a huge benefit. The readability of the code increases dramatically.
Put the logic where it belongs
The logic to calculate the distance between two geographical coordinates actually does not belong into a service. The GeoCoordinate object should contain the logic to calculate its distance from another point. So let’s move the logic from the service to the value object. And remember to properly name the method. Now our coordinate object is not only a data container but also has behavior.
Remember to put the logic where it makes most sense. Many developers keep their entities and value objects anemic. They are mere data containers and have no or very limited behavior. This is sub-optimal at best!
Make Value Objects Immutable
Another problem is that the GeoCoordinate class has writable properties. But a geographical coordinate in the context of our application is a value object. A value object should always be immutable. What does that mean for our code? We have to make all properties read only.
But wait a moment! This is not enough. Although no client can change the state of a GeoCoordinate the object itself would still be able to modify its own state internally. But this is also forbidden. So let’s refine the code as follows
Note that we have changed the auto properties to read-only properties with a read-only backing field.
Once we did this refactoring we had a huge problem. When compiling the application we accumulated nearly 100 compilation errors. These were all places all over the code base where properties of a geographical coordinate object where changed.
So we bit into the sour apple and went through the code base to fix all errors caused by the last refactoring. Although this was a boring exercise we managed to clean up the code significantly. During this exercise we added a clone function to the GeoCoordinate object
Wrap infrastructure that cannot be mocked
Unfortunately the .NET framework has not been developed with TDD in mind. Thus we have loads of infrastructure classes that cannot be (easily) mocked. One such example is the BinaryReader class. BinaryReader as an example does not implement an interface. In such a situation it’s often the best solution to wrap the corresponding class with another class that implements an interface. The interface contains exactly the methods needed from the wrapped class. In our case we defined the interface like this
and the wrapper class
The above class is really nothing spectacular. As it’s name says, it’s a pure wrapper class which just delegates calls to the inner reader. But it’s purpose is to decouple the client code from the infrastructure and make (unit-) testing easy.
Now any client code that directly used the BinaryReader class should now be changed to use a reader which implements the IDataReader interface.
A typical sample for such client code would be a mapper class that maps from one data structure to another. Instead of having a Map method that accepts a BinaryReader as a parameter we change the method such as that it accepts a reader of type IDataReader as parameter, i.e.
Now the method Map of the mapper class can easily be tested by mocking the reader parameter.
Apply the Single Responsibility Principle (SRP)
In the application we detected some code similar to this one
This code loads from a binary encoded file identified by its file name a bunch of geographical paths. The code is working correct up so far but it has two problems: first it is not covered by a unit test and second it violates the single responsibility principle (SRP). One responsibility is opening a file and accessing it with a binary reader. A second responsibility is the mapping of binary data to a GeoCoordinate object. So let’s tear this code apart and first implement a GeoCoordinateMapper class.
Of course the mapper class is developed by using TDD.
Now we modify the GeoPathHandler such as that it uses our new mapper class.
Note that the mapper class is provided to the handler via constructor injection.
Having separated concerns into distinct classes each class contains less code and can be understood and tested more easily. Each concern can be optimized individually without causing side effects to the other concerns.
It is worth mentioning that the total number of lines of code might even increase when applying the SRP. That’s not a problem since on the other hand each piece of code (or each class) get slimmer and more manageable.
Implement an Anti-Corruption Layer
Whenever you deal with an external system which is not under your control then protect your domain by putting an anti-corruption layer between your domain and the external system.
In our case the external system was the data provider. But the data was in a really exotic format. Now instead of importing this data and leave it in its exotic format and subsequently deal with it in the domain it would have been much much better to implement such an anti-corruption layer which in our case would have transformed the data into the format we need in the domain. A whole lot of confusion would have been avoided. Questions like “is this element in the format of the external system or does it have our format?” would have been completely avoided.
Don’t Betray Yourself
How many times did I hear the following sentence from my co-worker, from other developers and even out of my own mouth: “Yes I know it’s quick and dirty code but I cannot do it better now since I have no time or I am under pressure. It works and that’s important. I’ll cleanup the code later.“. Another statement I hear all the time is: “Yes I know there are no tests. But I did not have enough time to develop tests. I promise I’ll do it later on when I have less pressure.“
Every body who makes those statements is betraying himself! You’ll never come back and cleanup your code! You will always be under pressure. Messy code is an invitation to make it even worse when continuing to modify this code.
Missing tests will hurt you when you have to refactor your code. And you WILL have to refactor your code. The requirements change or you have to tune the code or you have to fix some bugs. What you have possibly gained by not writing a unit test you will loose ten times afterwards when coming back to this code to refactor it.
Refactoring legacy code can be an amusing task. I rather short pair-programming session has lead us through many interesting aspects of software development. We found many violations of principles and best practices. We found code that has no corresponding (unit-) tests. We found messy code and a lot of (member-) names that have absolute no meaning.