Quality and code coverage


It’s an age-old question: should our team’s goal be 100% coverage?  A valid question, but one I’ve never much cared about in practice.  The idea is that the team, all practicing TDD, should dutifully measure and add unit tests until they reach the assumed pinnacle of unit testing: 100% coverage.

The general motivation behind 100% coverage is the belief that 100% coverage equals zero bugs.  But what exactly is 100% coverage?  Coverage of what?  Wikipedia lists several kinds of coverage measures:

  • Function coverage
  • Statement coverage
  • Condition (Branch) coverage
  • Path coverage
  • Entry/exit coverage

One of the most popular code coverage tools in .NET is NCover, which supports:

  • Function coverage (implicitly)
  • Statement coverage
  • Branch coverage (Enterprise edition only)

NCover is a powerful tool, but it still doesn’t support every type of coverage.  Attaining 100% coverage in NCover can still leave paths untested, which means there can still be bugs in our code.  If 100% coverage is the goal, stopping at NCover’s measurement leads to a false sense of security: the assumption that your codebase is bug-free.  It isn’t.
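
To make that concrete, here is a minimal sketch (hypothetical names, NUnit-style tests) of two tests that achieve 100% statement and branch coverage of a method while never exercising one of its paths:

```csharp
using NUnit.Framework;

public static class Pricing
{
    public static decimal DiscountRate(bool isMember, bool hasCoupon)
    {
        decimal rate = 0m;
        if (isMember)
            rate += 0.10m;   // branch 1
        if (hasCoupon)
            rate += 0.15m;   // branch 2
        return rate;
    }
}

[TestFixture]
public class PricingTests
{
    [Test]
    public void Member_without_coupon_gets_ten_percent()
    {
        // Takes branch 1, skips branch 2
        Assert.AreEqual(0.10m, Pricing.DiscountRate(true, false));
    }

    [Test]
    public void Coupon_without_membership_gets_fifteen_percent()
    {
        // Skips branch 1, takes branch 2
        Assert.AreEqual(0.15m, Pricing.DiscountRate(false, true));
    }
}
```

Between them, these two tests execute every statement and both outcomes of each branch, so a statement- or branch-based tool would report 100%.  Yet the member-with-coupon path, exactly where a bug in how the discounts combine would live, never runs.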

Not only is it not bug-free, but code coverage says nothing about defects.  Defects surface when the users of the software tell you it’s not working the way they would like, and the root cause is a gap in the story analysis, missing acceptance criteria, or even a new story altogether.  No unit test could have caught those, because no one thought to write it.  100% coverage doesn’t mean we’re done, not by a long shot.

### Only the interesting parts

On recent projects where we measured coverage several months in, we regularly saw numbers around 90%.  This was on a team doing 100% TDD.  So what happened to the other 10%?  If we’re doing TDD all the time, why isn’t every statement covered?  Every change was introduced through TDD, yet we still had gaps.

So are we doing TDD wrong?  Looking at our tests, it certainly doesn’t seem so.  Every test we introduced covered behavior we considered interesting.  If behavior isn’t interesting, we don’t care about it.  Things like parameter checks, properties, Law of Demeter violations, and other brain-dead code are not covered.  Why?  The behavior just isn’t that interesting.  If tests are a description of the behavior of the system, why fill that description with all the boring, trivial parts?  The effort required to cover triviality is just too high compared to other ways we can add value.
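
As a hypothetical sketch of the kind of trivial code in question:

```csharp
using System;

public class Customer
{
    // A trivial auto-property: no behavior worth describing in a test
    public string Name { get; set; }

    public void Rename(string newName)
    {
        // A brain-dead parameter check
        if (newName == null)
            throw new ArgumentNullException("newName");

        Name = newName;
    }
}
```

A test asserting that `Rename` throws on null would pass and would raise the coverage number, but it would describe nothing about what the system does for its users.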

### Diminishing returns

Missing from the 100% coverage conversation is the effort required to get there.  Getting the next 5% can take as much effort as the previous 10% did; the next 2% as much as the previous 15%; and so on.  The closer we try to get to 100%, the more effort each additional point demands: the law of diminishing returns.  At some point, you have to ask yourself, is there any value in this effort?  Often, bending code to reach 100% actually decreases design quality, as you’re now twisting the original design solely for coverage concerns, not for usability, readability, or anything else.
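
As a hypothetical sketch of the kind of low-value work this chase produces, consider a defensive branch no production caller can reach, and the NUnit-style test written solely to cover it:

```csharp
using System;
using NUnit.Framework;

public static class IdFormatter
{
    public static string FormatId(int id)
    {
        // Defensive branch: upstream validation already guarantees id >= 0,
        // so no production path ever reaches this throw
        if (id < 0)
            throw new ArgumentOutOfRangeException("id");

        return "ID-" + id.ToString("D6");
    }
}

[TestFixture]
public class IdFormatterTests
{
    [Test]
    public void Throws_on_negative_id()
    {
        // Exists only to push the coverage number up; it describes
        // no behavior anyone asked for
        Assert.Throws<ArgumentOutOfRangeException>(() => IdFormatter.FormatId(-1));
    }
}
```

Worse options exist: widening access modifiers or adding seams solely so a test can reach a branch, which bends the design to serve the tool rather than the users.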

Which is why, when the question of 100% coverage comes up, I’m very skeptical of it as a goal or a bar to set.  Coverage is an interesting data point, as are other measures such as static analysis.  But in the end, it’s only a measure, an indication.  It’s still up to the team to decide whether addressing the missing areas is worth the effort, with the full knowledge that they’re limited to what the tool measures.

Personally, I’d much rather spend that effort elsewhere, where I’ll get a much better return.
