Getting value out of your unit tests

19 December, 2008. It was a Friday.

Unit tests, with TDD in particular, are the most efficient way I’ve found of creating behavior for my application. Lasting value, beyond just the safety net of “if I change something, will something break”, requires extra discipline and a more refined way of going about TDD. One of the most frustrating aspects of TDD done wrong is dozens of unit tests that provide no value other than that they might fail.

Back when Scott and others were refining TDD into Context/Specification BDD-style, I remember him posing the question, “if we actually treat tests as documentation, what is the result?” For a lot of unit tests, that documentation is a muddled, miserable mess. Nothing can be more frustrating than a failing unit test that provides zero insight into why the test exists in the first place.

My first year or so of doing TDD produced exactly that: a mess. There was no insight into the what or why of my system’s behavior, merely a retrospective look at what each individual class did. But by adding a few rules, as well as personal guidelines, I’ve noticed my tests have started to provide what I really wanted – a description of the behavior of the system. These rules gave my tests the value they were missing; without it, the tests would have been dead weight, code I avoided after I wrote it.

Rule #1 – Test names should describe the what and the why, from the user’s perspective

One way to do this easily is the Context/Observation (specification) naming style. I’m not the greatest at explaining the BDD style exactly, so I’d rather defer to JP Boodhoo, Scott Bellware’s article, Aaron Jensen and Raymond Lewallen on this one. But the general idea is that an outside developer should be able to read the test/class name and clearly understand what the intended observable behavior is for a given context. Here’s a bad, bad example:

[Test]
public void ValidationShouldWorkCorrectly()

Validation should work correctly. Hmm. “Correctly” is in the eye of the beholder. “Correctly” depends on the circumstances. “Correctly” depends on your definition of “correct”. The kicker is, if this test fails in the future, how will I know whether it is because the test is wrong or the code is wrong? I won’t. And at that point, I have to make a judgment call on whether the test still holds water or not. Sometimes I’ll just make the test pass, sometimes I’ll remove the test, and sometimes I’ll waste a lot of time trying to figure out why the heck the test was written in the first place.

That’s perhaps the most frustrating aspect. A unit test that was supposed to provide value when it failed, instead only caused confusion and consternation.
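For contrast, here is a minimal sketch of the same test renamed in a Context/Specification style. It assumes NUnit, and the User class with its single “blank username is invalid” rule is hypothetical, made up only to give the names something to describe:

```csharp
using NUnit.Framework;

// Hypothetical domain class, for illustration only. The rule it enforces
// (a blank username is invalid) is an assumption, not a real system's rule.
public class User
{
    public string Username { get; set; }

    public bool IsValid()
    {
        return !string.IsNullOrWhiteSpace(Username);
    }
}

// The fixture name states the context...
[TestFixture]
public class When_validating_a_user_with_a_blank_username
{
    // ...and the test name states the observation.
    [Test]
    public void Should_be_invalid()
    {
        var user = new User { Username = "   " };

        Assert.That(user.IsValid(), Is.False);
    }
}
```

If this one fails, the fixture and test names together tell me which behavior is in question, so I at least know what to argue about.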

Rule #2 – Tests are code too, give them some love

Whether you think so or not, you will have to maintain your tests. You don’t have to model your tests like you would production code, but duplication matters in your tests too. Do yourself a favor, right now, and go order the xUnit Test Patterns book. TDD is a craft, and the xUnit book is a How-To. Veteran or neophyte, you’ll learn a new angle that you didn’t know before. One thing it showed me is that tests are code, which I have to maintain, and showing them love pays off almost as well as refactoring production code does.

Refactoring tests does many things for you:

Eliminates test smells such as brittleness, obscurity or erraticism, and more
Provides insight into common contexts and behavioral connections (if several tests have the same setup, then their assertions/observations are related somehow)
Reduces nausea

I hate, hate long, complex tests. If a test has 30 lines of setup, please put that behind a creation method. A long test just irritates and leaves the developer cross-eyed. If I don’t have long methods in production code, why would I allow this in our test code? A test has three parts: Setup, Execute, Verify (and sometimes Teardown). Following the Context/Specification test pattern already groups these three parts for you, so you can’t get yourself in trouble.
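A minimal sketch of a creation method, again assuming NUnit. The User class, its properties, its validation rules and the CreateValidUser name are all hypothetical; the point is only that the setup noise moves out of the test body:

```csharp
using NUnit.Framework;

// Hypothetical domain class; the properties and rules are assumptions
// made up for this illustration.
public class User
{
    public string Username { get; set; }
    public string Email { get; set; }
    public string Password { get; set; }

    public bool IsValid()
    {
        return !string.IsNullOrWhiteSpace(Username)
            && !string.IsNullOrWhiteSpace(Email)
            && Email.Contains("@")
            && !string.IsNullOrEmpty(Password)
            && Password.Length >= 8;
    }
}

[TestFixture]
public class When_validating_a_user_without_an_email
{
    // Creation method: everything needed for a valid user lives here, once.
    public static User CreateValidUser()
    {
        return new User
        {
            Username = "jsmith",
            Email = "jsmith@example.com",
            Password = "correct horse"
        };
    }

    [Test]
    public void Should_be_invalid()
    {
        // Setup: only the detail that matters to this context is visible.
        var user = CreateValidUser();
        user.Email = null;

        // Execute
        var isValid = user.IsValid();

        // Verify
        Assert.That(isValid, Is.False);
    }
}
```

With the defaults hidden behind the creation method, the one line that differs from a valid user is the line that explains the context.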

Bottom line, tests are code, love them, and they’ll love you back.

Rule #3 – Don’t settle on one fixture pattern/organizational style

This assumption killed me for quite a long time. For some reason or another, I always assumed that I should have a test fixture pattern of One Class, One Fixture. One problem, of course, is that this makes understanding the behavior of the system as a whole, and of its parts, quite difficult. In a system with hundreds of classes it isn’t easy to see how the pieces fit, and finding usages and following control flow just doesn’t cut it. But aren’t our tests supposed to be documentation? Aren’t they supposed to describe how the system works?

If you’re blindly following one fixture per class, that ain’t happening.

One fixture per class leads to some pretty crappy test fixtures. I’ve seen (and created) test fixtures literally thousands of lines long. That’s right, thousands. We would never, ever, ever tolerate that in production code, why is our test code exempt? Simply because we assumed that there is one pattern to rule them all. Well, I don’t construct every object with a no-arg constructor, why should I tie both hands behind my back with One fixture pattern?

Or better yet, why just one organizational pattern? Do unit tests have to match the existing code base file for file, class for class, just with “UnitTests” somewhere in the namespace? That creates absolute insanity.

Sometimes classes have a lot of behavior that belongs in one place. Big fixtures can indicate I need to refactor…unless it’s my fixtures that are the problem. Repositories that provide complex searching are going to have a lot of tests. But do yourself a favor, look at alternative grouping, such as *gasp* common contexts. The really cool thing about exploring alternate organization is that it fits perfectly with Rule #2. Explore other organizational styles and fixture patterns. Try organizing by user experience contexts, not strictly by classes.

When you start seeing behavior in your system not through your classes’ eyes, but from the expected user experience, you’re on the road to truly valuable, descriptive tests.

Rule #4 – One Setup, Execute and Verify per Test

Another side-effect of blindly using one fixture per class is tests that either:

Have a lot of asserts
Have a lot of Setup/Execute/Verify in one test

Tests should have one reason to fail. Asserting twenty different pieces isn’t one reason to fail. If we’re following Rule #1, that test name is going to get very, very long if we try to describe all of the observations we’re making. Have a lot of asserts? Pick a different fixture pattern. Test fixture per feature and test fixture per context (Testcase Class per Feature and Testcase Class per Fixture in xUnit Test Patterns terms) are great for breaking out separate assertions into individual tests. The better I can describe the observations (from the user experience side), the more the code being created will match what is actually needed.

The multiple execute/verify in a single test is also indicative of assuming one fixture per class. Here’s one example:

[Test]
public void ValidationShouldWorkCorrectly()
{
    var user = new User();

    user.Username = "   ";
    user.IsValid().ShouldBeFalse();

    user.Username = "34df";
    user.IsValid().ShouldBeFalse();

    user.Username = "oijwoiejf^&*";
    user.IsValid().ShouldBeFalse();
}

Blech. Not only does the test name suck (which, with this many asserts, did we expect anything different?), but I have zero insight into the different contexts and valid observations going on here. What if the second assertion fails a month from now? How exactly am I to know if the test is wrong? Or the code is wrong? Another headache.

Whatever test fixture pattern you go with, you have to stick with one Setup, Execute and Verify per test. If your current fixture pattern doesn’t allow you to adhere to this rule…change patterns.
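As a rough sketch, the ValidationShouldWorkCorrectly test could be split into one context per fixture like this. It assumes NUnit, and the User class is hypothetical: the three rules in IsValid are my guesses at why each username was rejected, since the original test never says.

```csharp
using System.Linq;
using NUnit.Framework;

// Hypothetical User class. The three rules below are guesses at why each
// username in the original test was rejected; the real rules are not given.
public class User
{
    public string Username { get; set; }

    public bool IsValid()
    {
        // Assumed rule: a username must not be blank.
        if (string.IsNullOrWhiteSpace(Username)) return false;

        // Assumed rule: a username must start with a letter.
        if (!char.IsLetter(Username[0])) return false;

        // Assumed rule: a username may contain only letters and digits.
        return Username.All(char.IsLetterOrDigit);
    }
}

[TestFixture]
public class When_validating_a_user_with_a_blank_username
{
    [Test]
    public void Should_be_invalid()
    {
        var user = new User { Username = "   " };

        Assert.That(user.IsValid(), Is.False);
    }
}

[TestFixture]
public class When_validating_a_user_whose_username_starts_with_a_digit
{
    [Test]
    public void Should_be_invalid()
    {
        var user = new User { Username = "34df" };

        Assert.That(user.IsValid(), Is.False);
    }
}

[TestFixture]
public class When_validating_a_user_whose_username_contains_symbols
{
    [Test]
    public void Should_be_invalid()
    {
        var user = new User { Username = "oijwoiejf^&*" };

        Assert.That(user.IsValid(), Is.False);
    }
}
```

Each test now has one setup, one execute, one verify and one reason to fail, and the failing fixture name says which context broke.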

Keepin’ it clean

Tests can be documentation, but only if you try. If you’re in the “write and forget” category, your tests will become dead weight, some even carrying negative value. Thousand-line test fixtures, hundred-line tests, incoherent and illegible tests, bad test names: all of these both contribute waste and detract from the maintainability of your system. So what, a test failed. Congratulations. But can you understand why it failed, what behavior is being specified, and why it was important under what circumstances? If not, what value are your tests giving you, except for a future headache?
