A Refactoring: Explicit Modeling And Reducing Duplication

A coworker showed me a method that had a series of guard clauses at the top and a series of sequential steps that had to be processed after that. All of the guard clauses and sequential steps were executed the same way – call a method, get the result, check the result status, return if errors have occurred.

Here’s a mock-up of what the original code he showed me looked like:

   1: public Result DoStuff(Something something, Whatever whatever, WhoCares whocares, DoesntMatter doesntMatter)

   2: {

   3:     Result result;

   4:     

   5:     if ((result = GuardClause1(something) != null)

   6:         return result;

   7:         

   8:     if ((result = GuardClause2(whatever) != null)

   9:         return result;

  10:     

  11:     if ((result = Step1(whoCares) != null)

  12:         return result;

  13:  

  14:     result = Step2(doesntMatter);

  15:     

  16:     return result;

  17: }

He had me looking at the code so that I could make suggestions on how to improve. He already recognized the code as being less-than-ideal, but wanted to see what my reactions and suggestions were. I had several reactions when I saw it:

  • Implicit modeling of error status with nulls is a bad idea
  • Too much duplication of code
  • Too many return statements. Try to get this down to 1 return (though I generally make exceptions for guard clauses)
  • Overall, difficult to read, understand and really know what is happening

I’ve covered most of these items before, and I don’t want to repeat the same information. However, I will link back to the places where I’ve talked about them. What I really want to do is walk through the process of refactoring the code from the original example above, into an example that takes advantage of .Net’s delegates to provide code that is much more declarative in nature, and much easier to understand and work with.

 

Implicit vs Explicit Modeling

I’ve already talked about explicitly modeling in your code, so I won’t repeat all of that, here. You can go read those previous posts for my general opinion on the subject (which is to say, you should explicitly model your state instead of using nulls). In this particular instance, I suggested using a .Status on the Result class to indicate whether or not the step completed successfully or not.

   1: public Result DoStuff(Something something, Whatever whatever, WhoCares whocares, DoesntMatter doesntMatter)

   2: {

   3:     Result result;

   4:     

   5:     result = GuardClause1(something);

   6:     if (result.Status == ResultStatus.Error)

   7:         return result;

   8:         

   9:     result = GuardClause2(whatever);

  10:     if (result.Status == ResultStatus.Error)

  11:         return result;

  12:     

  13:     result = Step1(whoCares);

  14:     if (result.Status == ResultStatus.Error)

  15:         return result;

  16:         

  17:     result = Step2(doesntMatter);

  18:     

  19:     return result;

  20: }

This is a pretty simple change, but it has the adverse affect of creating more code duplication: the if statement checking the status is now the same after each call.

 

Reducing Duplication

I’ve also talked about the use of Delegates to help reduce code duplication before. In this particular case, original code used the same if-then statement, with the exception of the value that was assigned to ‘result’. The first refactoring shows us that the if-then statement actually is the same one, over and over again. The only thing that we need to vary is the method that is executed to obtain the result. With .Net’s delegates – in particular, the Func<T> delegate – we can easily do that. However, there is a small gotcha to this. If we extracted the code into a method that takes a single Func<Result> as the parameter, then we don’t gain anything. We won’t actually reduce the duplication. In fact, this would only make the code worse:

   1: public Result DoStuff(Something something, Whatever whatever, WhoCares whocares, DoesntMatter doesntMatter)

   2: {

   3:     Result result;

   4:     

   5:     result = ProcessStep(() => GuardClause1(something));

   6:     if (result.Status == ResultStatus.Error)

   7:         return result;

   8:         

   9:     // ... other steps here

  10:         

  11:     return result;

  12: }

  13:  

  14: public Result ProcessStep(Func<Result> step)

  15: {

  16:     return step();    

  17: }

Since the DoStuff method has to return the result if there was an error, we still need the if-then statement checking the status. This only served to move the execution of the step into the ProcessStep method, without providing any reduction in the code duplication. To correct this, we can send in a params array of Func<Step> and loop through them, executing and verifying the result, one step at a time.

   1: Result ProcessSteps(param func<Result>[] steps)

   2: {

   3:     Result result = null;

   4:     foreach(step in steps)

   5:     {

   6:         result = step();

   7:         if (result.Status == ResultStatus.Error)

   8:             return result;

   9:     }    

  10:     return result;

  11: }

In this code, we loop through the steps and execute each of them. If any of them encounters an error, we immediately return the result. If none of the steps that we executed errors, we return the final result received from the final step. With this method in place, we can now compose our original method with the guard clauses and the individual steps, like this:

   1: public Result DoStuff(Something something, Whatever whatever, WhoCares whocares, DoesntMatter doesntMatter)

   2: {

   3:     return ProcessSteps(

   4:         () => GuardClause1(something),

   5:         () => GuardClause2(whatever),

   6:         () => Step1(whoCares),

   7:         () => Step2(doesntMatter)

   8:     );

   9: }

 

Too Many Return Statements

I’m not sure that I’ve ever talked about this particular standard. I strongly believe in limiting the number of exit points (return statements or otherwise) that a method can have. The closer you can get to 1 exit point, the better. The reason for this is to ensure that the code is easier to work with and easier to understand. Having a series of return statements like the original example shows creates a number of potential problems – well, ok… this example is actually pretty easy to understand since the code is so clean. However, I have had to work with methods that container multiple nested if-then statements and loops, with many different exit points in the mix. It becomes very difficult to work with and to know exactly what will be returned, when. On top of that, it’s a very procedural way of working.

At this point the code in the Reducing Duplication section is fairly well organized with only 2 exit points. I would not be opposed to this code if it comes down to the choice of readable and understandable code vs. a single exit point. To reduce the exit points in this example, we can take advantage of the break keyword. This allows us to exit the loop that we are in, immediately. The final version of the ProcessSteps method looks like this:

   1: Result ProcessSteps(param func<Result>[] steps)

   2: {

   3:     Result result = null;

   4:     foreach(step in steps)

   5:     {

   6:         result = step();

   7:         if (result.Status == ResultStatus.Error)

   8:             break;

   9:     }    

  10:     return result;

  11: }

  12:  

 

Readability and Understandability

I said that my original reaction to the code included concerns about being able to read and understand the code. It was difficult to see everything that was going on and it was hard to understand why the null checks were in place without talking to the developer that wrote the code. Throughout the process of refactoring this code, though, we continuously made steps to improve the readability and understandability. The end result created a code base that lets use easily extend the functionality by adding additional steps in the DoStuff method, while leaving the process of executing and verifying the results to the ProcessSteps method. Now if we need to change either of those concerns, we can do so without having the repeat the change and without affecting the other concern. We can compose the process with our individual steps and leave the infrastructure of executing them as an implementation detail that can change independently (to a certain extent, of course).

 

One Example, One Lesson…

Remember that this only one example of one specific scenario where these specific refactorings are the right ones to use. Every situation must be evaluated for it’s own needs. There’s not doubt that the usefulness of these specific refactorings will vary with every situation that you run into. However, understanding how to move code from a procedural, difficult to understand and work into a state where it is easy to understand and easy to see what is happening, is very valuable. Hopefully you will be able to take a few points away from this example and be able to apply some new ideas to your specific contexts.

If you only learn one thing from this post, though, it should be that refactoring code to a clean structure is a step by step process. If I had shown the unit tests around the DoStuff method, then I would have been able to prove that the functionality of this method did not deteriorate at any point during our changes. We methodically changed one element of the internal code structure, one after another, until we ended with a solution that still passed all tests and provided a much better code base to work with.


Post Footer automatically generated by Add Post Footer Plugin for wordpress.

About Derick Bailey

Derick Bailey is an entrepreneur, problem solver (and creator? :P ), software developer, screecaster, writer, blogger, speaker and technology leader in central Texas (north of Austin). He runs SignalLeaf.com - the amazingly awesome podcast audio hosting service that everyone should be using, and WatchMeCode.net where he throws down the JavaScript gauntlets to get you up to speed. He has been a professional software developer since the late 90's, and has been writing code since the late 80's. Find me on twitter: @derickbailey, @mutedsolutions, @backbonejsclass Find me on the web: SignalLeaf, WatchMeCode, Kendo UI blog, MarionetteJS, My Github profile, On Google+.
This entry was posted in .NET, C#, Principles and Patterns, Refactoring. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • Michael Chandler

    Did you try using the maybe monad?

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    michael – i haven’t even heard of that, let alone tried it. :) is there a Maybe built into .NET 4? we’re still on .NET 3.5, but there seem to be plenty of articles on building a Maybe implementation.

    from what i’ve ready in the last 10 minutes, it looks like Maybe is used often as a null-object pattern and/or a way to ensure that you never have null references in your code. I can certainly see the advantage of that, but in this case I don’t think a null object pattern would model the result the way i want to see it.

    the core of the logic here, is “if the step fails, return the result immediately, without processing any additional steps”. by using a Maybe we would have to change our if statement in ProcessSteps to “if (result.HasValue)” which does not tell me anything other than the object having a value or not. By modeling the result with a status and checking to see if that status equals a well known Error type (an enum in this case), I know what the value means instead of only knowing whether or not there is a value and then having to implicitly understand what the presence of a value indicates.

    … or did i completely misunderstand your suggestion? Feel free to share some code on how you would recommend using this (post the code on Pastie.org or another site like that, and link to it here in the comments). I’d love to understand what you’re trying to say and see when / where this would be useful.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    hmmm…. ok – i’m pretty sure i did completely miss what you were trying to suggest. i’m going to have to read up on monads in c# and see if this would be beneficial here… cause i think it might be fun, at least. :)

  • http://simpleprogrammer.com John Sonmez

    Excellent post!

    As soon as I saw the first part of the code, I was pretty sure you were going to go the Func<> way!

    The only thing I disagree with is the one return refactor. I am fan of early return. Although, I do agree with you that many returns is a problem.

    Nice refactor though, I see this kind of code many times and it is not immediately obvious how to refactor it.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @john – thanks! :)

    i was 50/50 split on whether or not to do the one return statement change. in this simple case, it’s not really a big deal either way. i decided to go with it just to make the point, but i wouldn’t care either way if the code were really this small.

    i agree on early returns, when they are from guard clauses. otherwise, i try to keep the code small enough that an early return is not really needed. :)

    @michael – i’ve been reading more about monads and Maybe and i’m intrigued…

    i can see some potential value here, though i’m not sure an “Empty” value is what i’m looking for. i think the idea of a monad for piping these calls together is a potentially good idea, though. i’m going to have to continue doing some reading and experimenting to really figure out this monad stuff in c#, and see if i can come up with a monad version of the solution.

    thanks for the suggestion! you’ve given me something to obsess over and something new to learn, for a while :)

  • David Clarke

    To me a guard clause is the Release version equivalent of an Assert. A guard clause says if one of my inputs is not valid then I cannot continue and should either return an error or throw an exception. Maybe I’m misinterpreting your example code but to me your method lacks cohesion and instead of refactoring to have actual guard clauses and higher cohesion with (apart from the guard clause returns) a single return you have instead used artifice via a func to give the appearance of a single return. Just saying.

  • http://murrayon.net/ Mike Murray

    My first thought when looking at the code was also to perhaps utilize the Maybe Monad. I’ve been looking into that recently, and blogged about it: http://murrayon.net/2010/09/maybe-from-murray-monads.html

    I will have a follow up blog post by the end of the week with a link to a different Monad library that I put together instead of the one I introduced in the above linked to blog post.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @Michael and @Mike Murray

    I’ve been reading up on monads and have put together some examples to try and learn about them, specifically with Maybe – your article was VERY helpful, Mike Murray…thank you for the link!

    i can certainly see how the Maybe monad would be an option for the original code… however, I don’t think it would be the best option. i think a monad, in general, though, could create a potentially more elegant solution – just not the Maybe monad.

    Assuming my understanding of monads is correct (see my monads post that i just put up), I’ll be posting what I think is a better monadic solution that still accounts for everything I want from explicit modelling, etc, in this post. :)

  • http://www.augi.cz Augi

    The first code example can be easily cleaned using null-coalescing operator:

    return GuardClause1(something) ?? GuardClause2(whatever) ?? Step1(whoCares) ?? Step2(doesntMatter);

    The solution you proposed in “Reducing Duplication” paragraph is actually well known as Template Method pattern: http://en.wikipedia.org/wiki/Template_method_pattern