Refactoring legacy code


Lately my co-worker asked me to pair with him and make a code review with him of one of his projects. After having spent some hours on refactoring this legacy application together with my co-worker I want to share some important lessons learned. I call this application a legacy application because it has not been developed from grounds up by using TDD. Only at a very late stage of the project did my co-worker start to apply TDD.

I am a merciless person when called to do a code review. I can really be a pain in the ass with my questions of why this and why that. I love to point out violations of principles and best practices. But open minded people and myself always have a lot of fun when doing such things. And we both learn a LOT.

Proper Naming is crucial

The application I’m talking of has some GIS elements in it. One of the tasks in this application was to calculate the distance between two geographical coordinates. Let’s assume that a geographical coordinate on our earth is represented by the following value object

public class GeoCoordinate
{
    public double Longitude { get; set; }
    public double Latitude { get; set; }
 
    public GeoCoordinate(double longitude, double latitude)
    {
        Longitude = longitude;
        Latitude = latitude;
    }
}

Where the longitude must be in the interval -180° to +180° and the latitude must be in the interval -90° to +90°.

Now the application contained a service called GeoCalcService that was used to calculate the distance between two geographical coordinates. The code is given below

public class GeoCalcService : IGeoCalcService
{
    public double Distance(GeoCoordinate positionFrom, GeoCoordinate positionTo)
    {
        var R = Constants.cEarthRadiusKm;
        var dLat = (positionTo.Latitude - positionFrom.Latitude) * Constants.cDeg2Rad;
        var dLon = (positionTo.Longitude - positionFrom.Longitude) * Constants.cDeg2Rad;
        var a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                Math.Cos(positionFrom.Latitude * Constants.cDeg2Rad) *
                Math.Cos(positionTo.Latitude * Constants.cDeg2Rad) *
                Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        var c = 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
        var km = R * c;
 
        return km * Constants.cKm2Nm;
    }
}

Have a look at the name of the method! The method is called Distance. The problem with this name is that even my co-worker who has developed this code several weeks or months ago didn’t remember what exactly the method is returning. Is it returning the distance in kilometers or nautical miles or even in another unit of measurement? We had to consult the code to find out what really happens in this method.

How much better would it have been to just call the above method GetDistanceInNauticalMiles. No discussion or head scratching would have been necessary! It’s a small change but with a huge benefit. The readability of the code increases dramatically.

Put the logic where it belongs

The logic to calculate the distance between two geographical coordinates actually does not belong into a service. The GeoCoordinate object should contain the logic to calculate its distance from another point. So let’s move the logic from the service to the value object. And remember to properly name the method. Now our coordinate object is not only a data container but also has behavior.

public class GeoCoordinate
{
    // other code...
 
    public double GetDistanceInKilometersFrom(GeoCoordinate referencePoint)
    {
        var R = Constants.cEarthRadiusKm;
        var dLat = (referencePoint.Latitude - Latitude)*Constants.cDeg2Rad;
        var dLon = (referencePoint.Longitude - Longitude)*Constants.cDeg2Rad;
        var a = Math.Sin(dLat/2)*Math.Sin(dLat/2) +
                Math.Cos(Latitude*Constants.cDeg2Rad)*
                Math.Cos(referencePoint.Latitude*Constants.cDeg2Rad)*
                Math.Sin(dLon/2)*Math.Sin(dLon/2);
        var c = 2*Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
        return R*c;
    }
 
    public double GetDistanceInNauticalMilesFrom(GeoCoordinate referencePoint)
    {
        return GetDistanceInKilometersFrom(referencePoint)*Constants.cKm2Nm;
    }
}

Remember to put the logic where it makes most sense. Many developers keep their entities and value objects anemic. They are mere data containers and have no or very limited behavior. This is sub-optimal at best!

Make Value Objects Immutable

Another problem is that the GeoCoordinate class has writable properties. But a geographical coordinate in the context of our application is a value object. A value object should always be immutable. What does that mean for our code? We have to make all properties read only.

public class GeoCoordinate
{
    public double Longitude { get; private set; }
    public double Latitude { get; private set; }
 
    // other code...
}

But wait a moment! This is not enough. Although no client can change the state of a GeoCoordinate the object itself would still be able to modify its own state internally. But this is also forbidden. So let’s refine the code as follows

public class GeoCoordinate
{
    private readonly double longitude;
    private readonly double latitude;
 
    public double Longitude { get { return longitude; } }
 
    public double Latitude { get { return latitude; } }
 
    public GeoCoordinate(double longitude, double latitude)
    {
        this.longitude = longitude;
        this.latitude = latitude;
    }
}

Note that we have changed the auto properties to read-only properties with a read-only backing field.

Once we did this refactoring we had a huge problem. When compiling the application we accumulated nearly 100 compilation errors. These were all places all over the code base where properties of a geographical coordinate object where changed.

So we bit into the sour apple and went through the code base to fix all errors caused by the last refactoring. Although this was a boring exercise we managed to clean up the code significantly. During this exercise we added a clone function to the GeoCoordinate object

public class GeoCoordinate
{
    // other code 
 
    public GeoCoordinate Clone()
    {
        return new GeoCoordinate(Longitude, Latitude);
    }
}

x

Wrap infrastructure that cannot be mocked

Unfortunately the .NET framework has not been developed with TDD in mind. Thus we have loads of infrastructure classes that cannot be (easily) mocked. One such example is the BinaryReader class. BinaryReader as an example does not implement an interface. In such a situation it’s often the best solution to wrap the corresponding class with another class that implements an interface. The interface contains exactly the methods needed from the wrapped class. In our case we defined the interface like this

public interface IDataReader
{
    string ReadString();
    decimal ReadDecimal();
    double ReadDouble();
    int ReadInt32();
    byte ReadByte();
    long Position { get; }
    long Count { get; }
}

and the wrapper class

public class WrappedBinaryReader : IDataReader
{
    private readonly BinaryReader reader;
 
    public WrappedBinaryReader(BinaryReader reader)
    {
        this.reader = reader;
    }
 
    public string ReadString()
    {
        return reader.ReadString();
    }
 
    public decimal ReadDecimal()
    {
        return reader.ReadDecimal();
    }
 
    public double ReadDouble()
    {
        return reader.ReadDouble();
    }
 
    public int ReadInt32()
    {
        return reader.ReadInt32();
    }
 
    public byte ReadByte()
    {
        return reader.ReadByte();
    }
 
    public long Position
    {
        get { return reader.BaseStream.Position; }
    }
 
    public long Length
    {
        get { return reader.BaseStream.Length; }
    }
}

The above class is really nothing spectacular. As it’s name says, it’s a pure wrapper class which just delegates calls to the inner reader. But it’s purpose is to decouple the client code from the infrastructure and make (unit-) testing easy.

Now any client code that directly used the BinaryReader class should now be changed to use a reader which implements the IDataReader interface.

A typical sample for such client code would be a mapper class that maps from one data structure to another. Instead of having a Map method that accepts a BinaryReader as a parameter we change the method such as that it accepts a reader of type IDataReader as parameter, i.e.

public class GeoCoordinateMapper
{
    public GeoCoordinate Map(IDataReader reader)
    {
        // code omitted for brevity
    }
}

Now the method Map of the mapper class can easily be tested by mocking the reader parameter.

Apply the Single Responsibility Principle (SRP)

In the application we detected some code similar to this one

public class GeoPathHandler
{
    public List<GeoCoordinate> ReadGeoPaths(string filename)
    {
        var list = new List<GeoCoordinate>();
        using (var fs = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            using (var br = new BinaryReader(fs, Encoding.Unicode))
            {
                while (br.BaseStream.Position < br.BaseStream.Length)
                {
                    var count = br.ReadInt32();
                    for (var i = 0; i < count; i++)
                    {
                        var longitude = br.ReadDouble();
                        var latitude = br.ReadDouble();
                        var coordinate = new GeoCoordinate(longitude, latitude);
                        list.Add(coordinate);
                    }
 
                    // some more code (omitted for brevity)
                }
            }
        }
        return list;
    }
}

This code loads from a binary encoded file identified by its file name a bunch of geographical paths. The code is working  correct up so far but it has two problems: first it is not covered by a unit test and second it violates the single responsibility principle (SRP). One responsibility is opening a file and accessing it with a binary reader. A second responsibility is the mapping of binary data to a GeoCoordinate object. So let’s tear this code apart and first implement a GeoCoordinateMapper class.

public class GeoCoordinateMapper : IGeoCoordinateMapper
{
    public GeoCoordinate Map(IDataReader reader)
    {
        var longitude = reader.ReadDouble();
        var latitude = reader.ReadDouble();
        var coordinate = new GeoCoordinate(longitude, latitude);
        return coordinate;
    }
}

Of course the mapper class is developed by using TDD.

Now we modify the GeoPathHandler such as that it uses our new mapper class.

public class GeoPathHandler
{
    private readonly IGeoCoordinateMapper geoCoordinateMapper;
 
    public GeoPathHandler(IGeoCoordinateMapper geoCoordinateMapper)
    {
        this.geoCoordinateMapper = geoCoordinateMapper;
    }
 
    public List<GeoCoordinate> ReadGeoPaths(string filename)
    {
        var list = new List<GeoCoordinate>();
        using (var fs = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            using (var binaryReader = new BinaryReader(fs, Encoding.Unicode))
            {
                var reader = new WrappedBinaryReader(binaryReader);
                while (reader.Position < reader.Length)
                {
                    var count = reader.ReadInt32();
                    for (var i = 0; i < count; i++)
                    {
                        var coordinate = geoCoordinateMapper.Map(reader);
                        list.Add(coordinate);
                    }
 
                    // some more code (omitted for brevity)
                }
            }
        }
        return list;
    }
}

Note that the mapper class is provided to the handler via constructor injection.

Having separated concerns into distinct classes each class contains less code and can be understood and tested more easily. Each concern can be optimized individually without causing side effects to the other concerns.

It is worth mentioning that the total number of lines of code might even increase when applying the SRP. That’s not a problem since on the other hand each piece of code (or each class) get slimmer and more manageable.

Implement an Anti-Corruption Layer

Whenever you deal with an external system which is not under your control then protect your domain by putting an anti-corruption layer between your domain and the external system.

In our case the external system was the data provider. But the data was in a really exotic format. Now instead of importing this data and leave it in its exotic format and subsequently deal with it in the domain it would have been much much better to implement such an anti-corruption layer which in our case would have transformed the data into the format we need in the domain. A whole lot of confusion would have been avoided. Questions like “is this element in the format of the external system or does it have our format?” would have been completely avoided.

Don’t Betray Yourself

How many times did I hear the following sentence from my co-worker, from other developers and even out of my own mouth: “Yes I know it’s quick and dirty code but I cannot do it better now since I have no time or I am under pressure. It works and that’s important. I’ll cleanup the code later.“. Another statement I hear all the time is: “Yes I know there are no tests. But I did not have enough time to develop tests. I promise I’ll do it later on when I have less pressure.

Every body who makes those statements is betraying himself! You’ll never come back and cleanup your code! You will always be under pressure. Messy code is an invitation to make it even worse when continuing to modify this code.

Missing tests will hurt you when you have to refactor your code. And you WILL have to refactor your code. The requirements change or you have to tune the code or you have to fix some bugs. What you have possibly gained by not writing a unit test you will loose ten times afterwards when coming back to this code to refactor it.

Summary

Refactoring legacy code can be an amusing task. I rather short pair-programming session has lead us through many interesting aspects of software development. We found many violations of principles and best practices. We found code that has no corresponding (unit-) tests. We found messy code and a lot of (member-) names that have absolute no meaning.

The open closed principle