Red For The Right Reason: Fail By Assertion, Not By Anything Else

Thomas Weller commented on my Red/Green/Refactor For The Right Reasons post and asked me to explain why I don’t think a throwing a NotImplementedException is a good reason for a test to be red. It’s a good question, one that I questioned for a long time, and something that is worth talking about on it’s own rather than just replying in the original comment stream. To get things started, here is Thomas’ comment:

Hi Derick,

"A “Throw NotImplementedException();” is never the right reason" for a test to be red !? I’m not sure I’m buying that. I don’t exactly understand the problem with that approach.

Having a test being red because of a NotImplementedException tells me that my test set up is right and the test fails only because the method under test is not yet implemented – and that’s exactly what I want. The effect is the same that you get when hard-coding a return value, only that you don’t depend on a specific value and you get the code for free with R#. -  Or, in your words: If the test fails because of a NotImplementedException, it is red for the right reasons (at least in my eyes…).

I use this approach regularly in TDD and I made very good experiences with it – especially because with a NotImplementedException a test will NEVER become green for the wrong reasons, and therefore there will be no danger that it is overlooked in the (sometimes hectic) production process.

This is no religion, of course. But I don’t really get your arguments at this point (while I’m totally d’accord with the rest of your post). Maybe you could clarify a bit more on that…

Regards

Thomas

 

The Underlying Principle: Fail By Assertion, Not By Anything Else

The underlying principle behind why a NotImplementedException is never the right reason to fail is that it does not let an assertion fail the test. A test must be able to fail because an assertion caused it to fail, and for no other reason. We must prove that the test fails because the assertion did not find the expected values or interactions, or we do not know that the test can fail for the right reasons. Any other reason for failure is an abnormal condition that tells us we have something wrong with our code.

 

Test For Your Needs, Not The Framework’s Functionality

Throwing a NotImplementedException as the red part of your red/green/refactor is doing nothing more than proving that the .NET framework can throw exceptions and that your testing framework can fail a test when an exception is thrown.

   1: [Test]

   2: public void Test_Foo()

   3: {

   4:   var result = Foo();

   5:   Assert.AreEqual(60, result);

   6: }

   7:  

   8: public int Foo()

   9: {

  10:   throw new NotImplementedException();

  11: }

This test doesn’t fail for the right reason because it does let the assertion fail the test. It does not show that an abnormal condition exists between our expectations and the implementation other than to say that the method has not yet been implemented. It is only proving that an exception will fail the test, which is not something we need test. The unit tests that the testing framework was written with will prove that a test can fail when an exception is caught.

 

A Simple State Based Test

In this test, we are asking an object to do some work which results in it’s own state being modified.

   1: [Test]

   2: public void A_State_Test()

   3: {

   4:   var myObj = new MyObject();

   5:   myObj.DoSomething();

   6:   Assert.AreEqual(60, myObj.SomeState);

   7: }

If the DoSomething method throws a NotImplementedException, we know that we have not yet implemented it but we do not know that the test can fail because the assertion failed. If we don’t let the assert fail because the value of myObj.SomeState is not what we expected, then we can have no confidence that this test will fail for the right reasons.

 

A Simple Interaction Test

In this test, we are asking an object to do some work and then expecting that it will interact with another object via an interface. The assertion is made against the interaction by checking to see if the correct method was called on the interface.

   1: [Test]

   2: public void An_Interaction_Test()

   3: {

   4:   var anotherObject = MockRepository.CreateMock<IAmThatThing>();

   5:   var myObject = new MyObject(anotherObject);

   6:   myObj.DoSomething();

   7:   anotherObject.AssertWasCalled(ao => ao.ThatThing());

   8: }

If the DoSomething method throws a NotImplementedException here, the method called ThatThing() on the IAmThatThing interface may not be called, but we don’t actually know that it was not called. We are not proving that the method is not called and that the assertion can fail the test. Rather, we are only proving that an exception can fail the test.

 

Complex Test Scenarios

f the object being tested has a private method and let put a NotImplementedException into that private method, we have not proven that the method being called is not yet implemented. In fact, we have no certainty of which method in the object is not implemented until we look at the stack trace that is produced by the exception being thrown. If, on the other hand, the private method is expected to return a specific value under the specific circumstances that were set up by the public method being called from the test then we can return a value that is known to be incorrect and the assertion will fail for the right reason.

If we are dealing with a more complex set of dependencies and interactions then the NotImplementedException becomes more and more of an obstruction to the tests for every assertion that we have. Having an exception will fail the test when we have 5 asserts against interactions means that we have 5 tests that are failing for an invalid reason.

 

False Positives, False Negatives

A false positive exists when a test passes for the wrong reasons, and a false negative exists when a test fails for the wrong reasons. Thomas said this about false positives in his comment:

because with a NotImplementedException a test will NEVER become green for the wrong reasons, and therefore there will be no danger that it is overlooked in the (sometimes hectic) production process.

He is right about that, no doubt. A NotImplementedException certainly will prevent false positives while it is in place, but there are two problems that I see with this statement:

  1. You are introducing false negatives, which are just as dangerous as false positives. Since the test is failing because of an exception and not because an assertion found an abnormality, the test failure is a false negative
  2. You have not prevented false positives, only postponed them. Once the NotImplementedException is removed, the code is open to the possibility of false positives

We must be diligent about preventing false positives and false negatives, both, not just false positives.

 

When To Test For Exceptions

There are times when it’s important to do both positive and negative testing around exceptions… when your code throws them. This example shows that we expect a process to not throw any exceptions under one condition, and we expect it to throw an exception under another condition.

   1: [Test]

   2: public void Dont_Throw_Exceptions()

   3: {

   4:   var myObject = new MyObject();

   5:   Exception caughtException = null;

   6:   try

   7:   {

   8:     myObject.DoSomething(withThisCondition);

   9:   }

  10:   catch(Exception ex)

  11:   {

  12:     caughtException = ex;

  13:   }

  14:   Assert.IsNull(caughtException);

  15: }

  16:  

  17: [Test]

  18: public void Throw_Exceptions()

  19: {

  20:   var myObject = new MyObject();

  21:   Exception caughtException = null;

  22:   try

  23:   {

  24:     myObject.DoSomething(basedOnAnotherCondition);

  25:   }

  26:   catch(Exception ex)

  27:   {

  28:     caughtException = ex;

  29:   }

  30:   Assert.IsNotNull(caughtException);

  31: }

These tests are valid because they will fail is the assertion fails (and yes I know about the various attributes that can be put on the tests to expect/not expect an exception. This is an example of the principle, not the implementation detail)

 

When To Use NotImplementedException

I’m not saying that you shouldn’t use not implemented exceptions at all, or that I never use them. I actually do use them on occasion, but for a very specific purpose: to remind myself that I have not yet implemented the method in question. I find it very convenient to do this when I’m about to leave my workstation or change my current train of thought to something else for a while. Having the NotImplementedException in place will be a big red flag to tell me that I was not yet done when I was interrupted and went on to do something else. When I do this, though, I am specifically stating that the method is not yet implemented and that I need to re-engage myself in the context of the code and the tests to figure out where I left off and continue. It’s not a sign that my test fails for the right reason, it’s a sign that I left in the middle of something that needs to be completed.

 

Thanks Thomas!

I’d like to say thanks for questioning this. In the process of of thinking about this and writing it up, I had to dig deep into the principles and beliefs that I hold for test-first development. My first draft of this response contained a bunch of extraneous nonsense that had nothing to do with the real issue, then around 75% of the way through this that I had the “aha!” moment and wrote down the underlying principle of fail by assertion, not anything else. From that revelation, I was able to cut the nonsense and unrelated content and solidify my own understanding of why I don’t think a NotImplementedException is red for the right reason.

Thanks, Thomas. Without your questions and comments I may not have had this little revelation.


Post Footer automatically generated by Add Post Footer Plugin for wordpress.

About Derick Bailey

Derick Bailey is an entrepreneur, problem solver (and creator? :P ), software developer, screecaster, writer, blogger, speaker and technology leader in central Texas (north of Austin). He runs SignalLeaf.com - the amazingly awesome podcast audio hosting service that everyone should be using, and WatchMeCode.net where he throws down the JavaScript gauntlets to get you up to speed. He has been a professional software developer since the late 90's, and has been writing code since the late 80's. Find me on twitter: @derickbailey, @mutedsolutions, @backbonejsclass Find me on the web: SignalLeaf, WatchMeCode, Kendo UI blog, MarionetteJS, My Github profile, On Google+.
This entry was posted in .NET, C#, Principles and Patterns, Test Automation, Unit Testing. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • David Fauber

    “If we don’t let the assert fail because the value of myObj.SomeState is not what we expected, then we can have no confidence that this test will fail for the right reasons. ”

    I’m sure I’m missing something, but It seems like any alternate solution would run the risk of introducing a false positive. Good post regardless, definitely food for thought.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    David – ya, that sounds right. good clarification.

  • http://www.blogcoward.com jdn

    For reminders of unimplemented tests, I tend to use Assert.Inconclusive() here.

  • http://www.thecodingmonkey.net Nick

    What would you do in the circumstance where the method you are testing – DoSomething() – returns a value? You show the example of where it occurs with the NotImplementedException, but not really how to fix the problem.

    If your answer is to change it to use an internal state based answer, then I disagree. Changing an API just to support not throwing an exception seems like a bad code smell to me.

    And in the case of a function that returns a primitive like an int, 0 or -1 might be perfectly legal return values, in which case there is not same “error” value to return.

    In those circumstances, I would still suggest throwing a NotImplementedException.

    In the end, it really boils down to all the purposes that one might use unit tests for. Some people use them partially as a “checklist” of getting everything done. In that case, a red light, whether from a failed assertion or a thrown exception still serve the purpose of alerting the developer to needed work.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @Nick,

    if the method returns a primitive value like an int, then have it return a value that is not expected so that the assertion can fail. in the “simple state based test” example, i would have the DoSomething() method return something other than 60… i’d probably have it return 0, personally, but anything other than 60 would cause the assertion to fail.

    throwing a not implemented exception doesn’t let me know that the test can fail for the right reason.

    “Some people use them partially as a “checklist” of getting everything done. In that case, a red light, whether from a failed assertion or a thrown exception still serve the purpose of alerting the developer to needed work.”

    if you’re doing a checklist (which i do a lot), you should create that check list with failing tests, not failing code. stubbing out a method and putting in a not implemented exception so that you can have a checklist is jumping too far ahead and making assumptions about what the design of the code will look like when you get to that point. instead, you should stub the tests, not the implementation. creating empty tests that have an assert.fail or assert.inconclusive… you could even put a not implemented exception in the test that has not yet been implemented… just don’t stub out the class under test with a not implemented exception

  • Jeremy Gray

    @Nick (and others) – I have lost count of the number of times I have had to fail people’s code during a review session because I could go into the code, change it to do something entirely fails the expectations of one or more tests, and still have the test(s) pass. “But the test passed for me and it still passes now!?!?!” they respond, with a quizzical look.

    It can take a while to explain to them what has happened and why it is so important, and a bit longer still for the light bulb to come on, but once they do grok it the quality of their tests, and in turn their code, skyrockets, and here’s why:

    If a test doesn’t first fail because of one of its assertions then you have no proof that it later passes because of its assertions. Such tests are a net detriment to your project because they provide false security.

    Throwing an exception such as a NotImplementedException is a handy reminder but can’t be used as the first “red” because while it causes the test to fail it does so for a reason other than its assertions. When you remove the throw and the test appears to go green you are left with a test that could have gone green for any of a zillion reasons, some number of which will have nothing to do with fulfilling the requirements of the test. Each test must first fail due to one of its assertions failing, not because of some other throw.

    On an important related note, this is actually quite closely related to one of the main dangers of doing TAD instead of TDD. When most people implement tests after development (we all do from time to time and, since they can add value, I’m not suggesting we stop) they write the test, it goes immediately green, and they call it a day. But why did it go green? Are we sure it is because our arrange, act, assert steps really did the right thing? If you want to prove that a TAD test is accurate, the next thing you should do is go into the code and break the one tiny bit of logic that is critical to fulfilling that test, make sure the test fails, and then restore the code to make the test go green again.

    I like to think of it this way:

    TDD: Red, Green, Refactor
    TAD: Green, Red, Green

    As an aside, both the TDD and TAD scenarios above also provide a good reminder why we should strive for one logical assertion per test, in that it becomes more difficult to perform the kinds of verifications I describe when there are multiple logical assertions (effectively multiple tests) going on within a single test.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @Jeremy – good comments and adds a lot to the points I was making. thanks!

    All,

    Scott Bellware sent me a bit of info that relates to this subject and pointed me to one of his old blog posts (from 2005). it’s well worth reading.

    —–

    “failing by assertion” is somewhat imprecise and a bit misleading. A test is sensible when the test and the code reflects the intention or expectations of the design, not of the assertion. Calls
    to a test library’s assertions (or other mechanisms) are a side effect of expression of intention. I wrote about this same issue a while ago: http://codebetter.com/blogs/scott.bellware/archive/2005/11/22/134954.aspx

    —–

    i had focused in on the mechanical aspect or implementation detail of that intention and expectation, and I see his point about the underlying intention and expectation being the real reasons for failure. having a test fail by the assertion for the right reasons while not expressing the intent of the code under test is going to cause problems when that test is read later and people are trying to understand the intention of that code. it will hinder change and progress through obscurity in the test itself.

  • Jeremy Gray

    I should have mentioned this explicitly in my comment but it is worth noting that in every one of the instances I mentioned above either the test or both the test and the system under test were broken. In other words, in every case at least the test was broken. As such, when you mention “for the right reasons” you are dead on.

    The extra steps folks like you and I are suggesting aren’t going to provide 100% certainty but they can definitely help reduce the frequency of things failing (or passing) for the wrong reasons, and that has proven valuable enough to me to make it worthwhile.

  • Dathan Bennett

    Having the test fail only by assertion serves the purpose of feedback testing the test itself. “Does the test work? Let’s feed it some bogus data and verify it fails. Now feed it the right data and verify that it passes.” I completely agree that tests need to be validated in such a fashion (as you yourself said, I’m not smart enough to get it right the first time – I have the test to prove that i wrote the code correctly, but I need to test the test before I believe it’s right as well).

    However, I don’t believe that it’s always evil to have a test fail because of a NotImplementedException or some such. So long as, before you implement the method, you verify that the test fails for the right reason. I wonder why anyone would do this (it’s even easier to, say, “return null” than “throw new NotImplementedException()”), but it’s no great evil, as long as you have the discipline to verify your test’s assertion behavior (“fails for the right reasons”) before proceeding blithely along making the test pass.

    If you’re following TDD rigorously, though, I see “throw new NotImplementedException()” as a red flag. It’s easy to spot with the eye, which is how I used it before starting TDD (“Work here next!”). Doing TDD, though, you should never need it – you have the failing test to tell you where to work next. I would look at that exception throw and think that the developer who put it there isn’t quite mentally in the groove of TDD, and is putting flags in the code to know where to work next rather than depending on the tests. (Of course, with R# and the VS “Implement Interface” macros, it may just be an indication that the developer is using the provided tooling. Which is, of course, fine.)

  • Andrew Wilson

    Jeremy’s comments above made something click for me regarding this article (keeping in mind I’m new to TDD).
    TAD or any automated testing has a green-red-green pattern to it where the functionality works and the red phase is there to test the test. Makes sense.
    The bit that is not entirely obvious to a newbie is that the red in TDD’s red-green-refactor needs to do the same thing. Because the tests are being used to write the functionality it is a given that the test will be red due to the fact that it is unimplemented.
    It might be more accurate to say TDD is more like red-implement-red-green-refactor.

  • http://www.thomas-weller.de Thomas Weller

    Hi Derick.

    First, sorry for my late response. I’m VERY busy at the moment (that blogging/commenting thing doesn’t pay the bills, right…).

    Second, thank you for your exhaustive reply. Obviously, you really invested some deep thougths in my question, and your reply definitely is worth reading and valuable food for thought – I really appreciate that. Like you, I love these kinds of discussions, where you are asked to clarify your thoughts and to sharpen your arguments…

    Concerning the issue itself, I 100% agree with your arguments. But when I thought it over, I found that I don’t really care about the correctness of a test outcome during the ‘red’ phase of the red-green-refactor cycle. At this stage, my main concern is that the test fails at the intended location (so I really do read the stack trace to find it out). But later on, when everything is in place, a test must only be red because of a missed assertion and here everything your post states applies. So one could say that ‘the right reason for red’ is different for me depending on the working phase it occurs: Am I _in_ the cycle, or is the test already in place (and executed as part of the system’s regular, repeated, and automated integration and verification process).

    This inspired me to write a blog post myself, which demonstrates my way of doing TDD in some detail. It started as a response to your post, but it slightly mutated to something different and now is more loosely related to your post, I’m afraid (while still relating to it…). I’m currently working at it, not sure if I can finish it this weekend…

    Regards
    Thomas

  • http://MurrayOn.NET/ Mike Murray

    It appears this post was a “work it through out loud until you come to the right AHA moment.” That would likely explain why it seemed too verbose and overly thorough in explaining what could probably be summed up in a fairly concise response.

    Derick did finally hit the nail on the head…it’s about false negatives. The red part of Red/Green/Refactor is to ensure expected failure. Red is for failing the assert for the right reason, or in other words the code not correctly or fully implemented (notice I didn’t say not implemented). Green is for then making a code implementation change that passes that same assert. The Red and Green steps are very focused on validating expectations and that your implementation is inline with those expectations.

    There is nothing wrong with NotImplementedException, but it can’t be your only Red assert. The true Red assert needs to come first before you move onto Green, and this is because (as Derick explained) the NotImplementedException doesn’t allow your assert to run (during runtime that codepath is short-circuted with the throw). Also having the NotImplementedException may have discouraged an assert in the test at all because you thought you already got your Red step.

    I guess I saw what Derick was saying, but felt it could be summarized in a different way and perhaps some other reader might benefit from an alternate explanation.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @Mike Murray – ya, your observations of how i wrote this post are spot on. i think the article i linked to in the comments, from scott bellware, does a much better job of stating the things i was trying to say.

  • Someonepostedthat

    Not sure it can add to the discution, because quite late, but usually, there is a better way than the throw to remind a not implemented method.

    #warning

    It will put a warning on your compiler, so if you aim for a clean code, you will never miss them on compilation.