Mocks, Stubs and Unreadable Tests: Clearly I’m Doing This Wrong

I tweeted this a few minutes ago:

Screen shot 2011-01-13 at 9.18.38 PM.png

This is in reference to a horrible test that I wrote today. It’s got 2 assertions and more than 20 lines of context to set up the mocks that I needed, to isolate things and prove what I wanted to prove.

As a reference point, here’s the implementation the class under test:

In the same thread of thinking that Jimmy has been posting about recently (which I’m in 100% agreement with), I want to get away from writing ugly tests like this. I’m clearly breaking all the principles that Jimmy is talking about – I’m exposing a lot of detail of the implementation of the class under test, I’m creating a very brittle test that will break with any small change to the class under test, and I’m making it very painful for me to change this part of the system because the tests are getting in my way.

Here’s the problem I have with a lot of the talk concerning bad tests… it’s wonderful talk, but it’s nothing more. I’m reading Jimmy’s posts and I’m thinking that he’s on to something, but I’m always left with a giant question at the end: how do I _not_ do what you’re talking about?

What are the alternatives to this? How can I improve my test, in this scenario? Am I missing something simple like an object mother or a context class? Should I be letting the actual classes run in this case, which would require a fairly significant amount of setup data including a number of .haml template files that need to be read / processed?

I’m looking for real-world examples of how to solve real-world testing problems. It doesn’t really matter what language the tests are written in: Ruby, C#, Java, Python… I’ll even take C++ if your solution is understandable and illustrates the principles of good test design.

Post links to your solutions to this type of problem in the comments.

About Derick Bailey

Derick Bailey is an entrepreneur, problem solver (and creator? :P ), software developer, screecaster, writer, blogger, speaker and technology leader in central Texas (north of Austin). He runs - the amazingly awesome podcast audio hosting service that everyone should be using, and where he throws down the JavaScript gauntlets to get you up to speed. He has been a professional software developer since the late 90's, and has been writing code since the late 80's. Find me on twitter: @derickbailey, @mutedsolutions, @backbonejsclass Find me on the web: SignalLeaf, WatchMeCode, Kendo UI blog, MarionetteJS, My Github profile, On Google+.
This entry was posted in AntiPatterns, Principles and Patterns, Quality, RSpec, Test Automation, Testing, Unit Testing. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • Lee Brandt

    I tend to use ugly mock set up as an indicator that my design may be flawed. I am not saying your design here is flawed, but that ugly set up with implementation details leaking through might be a smell.
    That said, I have had this problem before. Need to go read Jimmy’s post. I starred it and haven’t gotten back to it yet.

  • Steve Merrick

    I haven’t got an answer, but thank you for posing the question! I look forward to reading other people’s proposed solutions. There have to be some, yes?

  • John – good post, and I am more and more agreeing with what you are saying.

    Unfortunately, you’ve only said the same things that Jimmy has said in his recent posts without really providing any real world examples. My question is not “what is the principle?”, but “what is the solution, the implementation?”

    I need concrete examples that the world can see and touch and learn from, as examples of how it’s supposed to be done.

    While I am spending more and more time doing exactly what you’re talking about – a very large percentage of the tests that I write in my current app are done w/ Cucumber and are interacting directly with the high level system – when it comes down to isolating complexities and reducing the amount of noise in unit tests like I’ve shown above, I am still not seeing any answers from anyone… just more rhetoric about “it’s a sign of a bad design”, or “write higher level tests”, or “don’t do it that way”.

    I’m still waiting for someone to clearly state what we should be doing to correct unit tests like I’ve shown, by showing a proper suite of unit tests.

    “SHOW ME THE MONEY!” – Jerry Maguire.

  • Roco

    Some of your rhetoric flies in the face of what many very intelligent software developers have been teaching us for years. I’m not saying you’re wrong, but you better have something solid to back up what you are saying, consider the incredible amount of literature and code around the supports the contrary.

    Some of your comments around not needed the “TDD training” wheels seems to completely miss the point of TDD. But, I’ll wait for you to provide some examples. ;)

  • Derek,

    I think mostly the guidance you are seeking would look something like this: “Use judgment”

    When you figure out how to teach folks to have better judgment, let me know — I’ll sign a few people up for your course!

    I take most of the recent talk about this topic as a bit of a “Wait a minute” against somewhat dogmatic TDD allegiance. “Test one class at a time” “Inject all dependencies” stuff like that, really those are guidelines. Most of the time, doing work within certain bounds, that stuff works out great. But enough of the time it doesn’t. Those are times when, I believe, the true quality of your developers will show because they cannot follow “the standard way” and have to make their own judgment.

    I think that’s a really long way to say “I don’t think you’re going to be shown any money…” :)

  • [I just posted this comment on John Sonmez's blog, and feel it's worth reposting here.]

    I feel it’s a mistake to treat this [unit tests or blackbox tests] as an either-or choice. Both integration tests and unit tests have their place. (Replace “integration” with “acceptance” or “specification” if you prefer, or say they all have their place.)

    Integration tests show that the system works. This has a lot of value. They’re slow, though, so you won’t run them very often. Each test covers a lot of modules, so failures may require debugging.

    Unit tests assist with design and “dogfooding,” for those of us who are not at the Ri (master) stage. They test portions of the system in isolation and run quickly. Developers can run them after every change to see what broke (temporal granularity). Each test covers a small portion of the codebase, preferably with minimal overlap, so that a test failure pinpoints what broke (spatial granularity). Well-written unit tests improve design and maintainability.

  • Josh

    I’ve been reading about unit testing for the last year and trying to get my head around it all, but haven’t really practiced it (pushing to make it happen at work), so forgive me if this is somewhat naive, as I haven’t personally experienced your pain. I’ve been following what Jimmy has been talking about as it’s of interest to someone like me learning.

    If you take the BDD approach, the tests should be about designing the behaviour of the system. To this extent it’s not about implementation. If you then go and make changes in your code that cause a test to break, well, isn’t that one of the reasons for doing T/BDD? You have now changed your systems behaviour.

    If this happened due to a refactoring, then you have performed it incorrectly or done the wrong refactoring, as a refactoring should not change the behaviour of the code.

    If this happened due to a design change, then you now disobeying the T/BDD rule of writing the test first. If the design needs to change, shouldn’t you go back to the test and redesign the test first, see it fail, then redesign the code?

    As for having to create a lot of mocks, the issue I think is more with the difficulty of getting the class under test. As mentioned earlier, this could be a design smell. Does your class have too many dependencies? Is it just a skeleton with little of it’s own behaviour which delegates to others? Could that behaviour be moved to one of the delegates? etc etc etc

    I’m not quite sure what the real issue is here. You’re making some mocks (3, not a lot, a few lines of code. Write it, move on), and making assertions on them about the behaviour of the subject under test. If you break this you’re breaking the designed behaviour. You can change other implementation details without breaking the test, so long as the object’s behaviour doesn’t change. It’s not brittle, it’s what BDD is about.

  • finally some examples. I’m also agreeing it’s about time people started showing the money. Hi John and Jimmy ;P

  • Derick,

    Thanks for posting a concrete example. It’s really hard to discuss this issue in generalizations.

    I think these tests are difficult to write because your underlying class appears to violate SRP.

    As far as I can tell, it has two responsibilities: 1) determining which answers pertain to the flu and 2) rendering a report for a collection of answers.

    If you were to break apart these two class, perhaps using a module for responsibility (1), then you would be able to swap out that behavior under test in a much cleaner way.

    Great post!

  • One additional thing:

    Unless you are omitting some code, despite all of that setup, you aren’t testing that asthma, diabetes, etc. are the questions you are interested in.

    In my opinion, the easiest way to remedy that is to follow SRP and have a new suite of tests for the component that handles responsibility (1).

  • I’ve been reflecting on the conversation, but I haven’t found the arguments against the use of test doubles compelling thus far. While I appreciate the desire to minimize coupling between specifications and implementation, the advocacy of avoiding test doubles except at system boundaries seems too high a cost to me.

    That said, I don’t feel the example you’ve provided is a great candidate for test doubles aside from the file system interaction. In your particular example, while I’m not that familiar with Ruby and only have the provided information to go on concerning the domain, it seems you have some unnecessary stubs in this case. Test doubles are useful for isolating the SUT, but it doesn’t look like the types you are stubbing out have much behavior to be isolated from. Your stubs seem to consist of a collection and a few value objects with non-volatile behavior. You also have some setup duplication, and not the good kind.

    If I were to implement this in C#, my implementation might look something like the following:

    [Subject(typeof (FluSectionResultGenerator))]
    public class when_applicant_provides_information_indicating_flu_susceptibility : given_a_generator_context
    Because of = () => Generator.Run(new[] {new Question(“Do you have issues?”) {Answer = true},});

    It should_reflect_susceptibility_to_the_flu = () => FileSystemMock.Verify(x => x.Read( “a/b/c/susceptable.txt”));

    [Subject(typeof (FluSectionResultGenerator))]
    public class when_applicant_provides_no_information_indicating_flu_susceptibility : given_a_generator_context
    Because of = () => Generator.Run(new[] {new Question(“Do you have issues?”) {Answer = false},});

    It should_not_reflect_susceptibility_to_the_flu = () => FileSystemMock.Verify(x => x.Read(“a/b/c/non_susceptable.txt”));

    public abstract class given_a_generator_context
    protected static Mock FileSystemMock;
    protected static FluSectionResultGenerator Generator;

    Establish context = () =>
    FileSystemMock = new Mock();
    Generator = new FluSectionResultGenerator(FileSystemMock.Object);

    (In case there’s a formatting issue, see:

    I’ve taken some liberties with the problem domain interpretation, but the first difference of note is that I’ve used a real array, a real Question, and a real Answer in the execution of my SUT. I did this not because I don’t believe in substituting my dependencies, but because Question and Answer are values objects with no real behavior of note and I don’t consider System.Array to be a volatile construct. Perhaps there is more to be concerned with in a dynamic language, but how that might influence the decision to use test doubles or real objects is a separate issue. The second difference is that I’ve introduced a reusable base context for creating the generator. If I had needed to configure the behavior of my test doubles then I would have kept that in the derived context even if it meant duplication (though I might use an intent-revealing helper if the setup were complex). I did duplicate the dummy question with unused text, but I did so because it aids in understanding the context of the test. Perhaps of less interest, my spec names are less implementation-specific (though perhaps “scores” is actually part of your ubiquitous language).

    I think a better candidate for this discussion would be the specification of some component that relied upon dependencies with substantial behavior of their own, only one or two of which are pertinent to the specification. From here we could consider what each approach looks like, how each approach might affect true outside-in design, and what the consequences of introducing breaking changes into each of the dependencies might be. Your Score and Answer types may indeed have some behavior not reflected in your example, but I couldn’t divine any.

    On this subject, I’d recommend reading the book: Growing Object Oriented Software Guided By Tests. While a bit Java and JMock centric, it has some good guidance around mocking best practices.

  • Bill Chrisie

    Derick, it appears to me that you can simplify your tests by simplifying your class(es). Your implementation of FluSectionResultGenerator violates the Single Responsibility Principle. I don’t read Ruby very well, but it appears to do three things:

    1. Calculate a score for a question
    2. Check a list of scores for susceptibility
    3. Format a URL

    You can argue that 1 and 2 are part of the same responsibilty, but that would still be two responsibilities.

    If you created a class called FluSusceptibilityCalculator and gave it responsibility for 1 and 2, then FluSectionResultGenerator only has to format the URL, and that will be easy to test.

    Make sure that FluSusceptibilityCalculator only does one thing and the test(s) for it should be simpler, also.

    That is the approach that I would take. It is the only solution that I have found that produces the desired result. I have also found that my biggest challenge is frequently how to come up with meaningful names for the myriad classes that I create.

  • It’s an old post, but we tackled the fixture problem in C# with method chaining… something we called fluent fixtures. It was pluggable so you could have database backed models or not.

  • I agree with Chris Geihsler above – you probably have an SRP violation that you need to untangle. I’m not fluent in Ruby (yet!) so I can’t offer an intelligent rewrite of your code but you might take a look at a couple of blog posts I wrote a while back illustrating a similar problem in C#:

  • > Should I be letting the actual classes run in this case, which
    > would require a fairly significant amount of setup data
    > including a number of .haml template files that need to be
    > read / processed?

    If lots of setup makes your tests less desirable, and if you’re trying to make your tests better, then changing one form of verbose setup for another doesn’t serve the goal.

    > What are the alternatives to this? How can I improve my
    > test, in this scenario?

    A few years ago, .NET’ers were just getting started with the excitement for auto-mockers. I asked the unwelcomed question: Couldn’t auto-mockers be a signal that there’s a missing abstraction? ie: If we have to coordinate so many objects to get something done, are we missing a coordinator object?

    It’s not a definitive indicator of a missing abstraction, but if it is a missing abstraction, then the solution is made of design rather than made of tools.

    The patterns to contemplate when you’re on the verge of wishing for an auto-mocker are “Role Interface”, and “Coordinator”.

    If this is a case of a missing abstraction, then the model is also not as truthful, clear, and self-documentary of its (business) behavior as it could be. You also might find that the specs (except for the setups, of course) reflect more of the business language than the implementation details.

    A solution should be given some real consideration when it:
    - Creates clear documentation of a defacto, actual system responsibility
    - Displaces a reliance on more tools to simple design
    - Is based on pattern that are already recognizable is an alternative that deserves exploration

    It’s easy to miss this possible solution because it doesn’t get emphasized by either .NET or Ruby frameworks and culture. One of the tradeoffs of using frameworks is that their principal design primitives (models, views, and controllers, for example) might eclipse other valuable patterns.

  • jdn

    I remember Scott writing on the board during one session at Alt.NET Open Spaces 2007 something like “Are AutoMocking Containers a Design Smell.”

    I agreed then, and agree more now. If you have to do a lot of setup, that’s a sign of a problem.

    “SHOW ME THE MONEY” to me, in this case, means perhaps one should take a step back and take stock of the situation and perhaps move in a different direction.

  • Good discussion here. The batch of related posts on this topic prompted me to type up some of my thoughts on the subject. It describes one approach to (attempting to) avoid the ‘complex setup’ code smell of which you are getting a whiff and desiring to avoid. It’s what I’ve found that works best for me so far.

  • I highly recommend to read the GOOS book – More than half of it is an almost real world example of how mocks should really be used.