Kanban in Software Development. Part 3: Andon and Jidoka – Handling Bugs and Emergency Fixes in Kanban

Let’s assume that we are doing the appropriate amount of testing during our development process. If we include TDD, test automation, test engineers and customer acceptance testing, we should find the majority of the bugs in our system before they are released. However, not every bug will be found. There will be some situation that no one thought about before. There will be some special circumstance on someone’s computer that hasn’t been accounted for. There will be some client data that does not fit the expected variance, despite the data being valid. The point is, there will be something that breaks after we deliver the software. Worse yet, it doesn’t even take delivery to find bugs. What happens when the software gets to customer acceptance and the customer says that something is wrong or broken? We simply must account for the inevitable bug fixes and emergency patches in our system.

Our current kanban board, with all its pipelines and queues, has no way of handling problems yet. But before we get into any changes to the board, we have to define and create the culture of quality that we need in our team.

A Culture Of Quality

There are many different aspects of quality and many different ways to view and interpret quality. The individual specifics, as always, come down to your team, the project and the needs of both. Even with these variances, though, there are a few key concepts that should be found in any culture of quality, including Whole Team / Collective Ownership and Zero Defects.

Collective Ownership

I’ve talked about this concept before (see the link above) and would recommend reading those prior posts. One thing I would like to add, though, is that collective ownership can only succeed when the team members take their individual egos out of the equation. We must be able to accept criticism as a way to improve ourselves and our team. When we let go of our egos, we can be honest with ourselves and others, enabling everyone to own every part of the system.

Zero Defects

This is a subject that I have not talked about before, but it is prevalent throughout the continuous improvement philosophies that I subscribe to. Wikipedia lists four principles of Zero Defects methodologies, all of which are paramount to our culture of quality.

  1. Quality is conformance to requirements
  2. Defect prevention is preferable to quality inspection and correction
  3. Zero Defects is the quality standard
  4. Quality is measured in monetary terms – the Price of Nonconformance

Don’t be confused by "requirements" in item number one, though. In the world of software development, the word "requirements" has many meanings, not simply the feature list or functional descriptions set forth by our customers. Our teams and processes have certain requirements that are layered on top of, and built into, each of the business requirements for the software. These often include the use of source control systems, test driven development, principles like S.O.L.I.D., etc.

Quality Checks in Kanban

Collective ownership and zero defects are only the tip of the iceberg. There is so much more to a culture of quality and so many different aspects of quality to account for. However we define quality, though, our ability to create a culture of quality is necessary for our kanban system to work. Without it, we lose the continuous improvement and elimination of waste that defines Lean. With it, though, we can introduce some specific tools to our process and improve our kanban system by having it handle the errors that are found in our systems.

Andon

From Wikipedia:

"a system to notify management, maintenance, and other workers of a quality or process problem"

When an issue is found in our system, we need to notify the team immediately so that the problem can be taken care of. The idea in a manufacturing line is to allow a worker to stop the line when a problem is found. In our grocery store example, andon could be invoked by a shelf stocker noticing a loose nut or bolt on the shelf and then notifying a maintenance person. In software development, andon can take many different forms. We report our issues list during our daily standup. We create issue tickets in our issue management system. If we’re lucky and work in a company that supports the idea of a team room, we may just need to look up or turn around and let the team know that we have problems. We could even place an actual red flag on the desk of our developers, testers, customer representatives, etc., and have them raise that flag when they see a problem. Any of these ideas, plus many more, can be our andon system.

Andon itself is not intended to be an all encompassing electronic system with metrics and reports and yadda yadda yadda. It needs to be simple. It needs to be easily employed by anyone on the team. And it needs to mean something to the team. If a team member throws their andon card out on the table, the culture of that team needs to be ingrained with the knowledge that work may stop until the problem is addressed.

There’s so little to andon, yet so much more than what I’ve described. However andon is enabled, it is critical to our zero defect policy in software development.
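To make the "simple, usable by anyone, and meaningful to the team" idea concrete, here is a minimal sketch in Python. Everything in it (the `Andon` class, `pull`, `resolve`, `can_start_new_work`) is a hypothetical model invented for illustration, not a real tool. It also assumes the strictest possible team rule, where any open signal blocks new work; the text above only says that work may stop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Andon:
    """Hypothetical team-wide stop signal: just a list of open problems."""

    open_problems: List[str] = field(default_factory=list)

    def pull(self, raised_by: str, problem: str) -> None:
        # Anyone on the team can raise the signal; there is no approval step.
        self.open_problems.append(f"{raised_by}: {problem}")

    def resolve(self, problem_index: int = 0) -> str:
        # Remove a problem once the team has addressed it.
        return self.open_problems.pop(problem_index)

    @property
    def line_stopped(self) -> bool:
        # The "line" stays stopped while any problem is still open.
        return bool(self.open_problems)


def can_start_new_work(andon: Andon) -> bool:
    """Assumed team rule: no new work is pulled while a signal is open."""
    return not andon.line_stopped
```

The point of keeping it this small is the same as the point of the red flag on the desk: there are no metrics or reports here, only "is something wrong right now, and who said so?"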

Jidoka

AKA "Autonomation", AKA "Automation with a human touch", AKA "intelligent automation", AKA …

From Wikipedia:

"Autonomation prevents the production of defective products, eliminates overproduction and focuses attention on understanding the problem and ensuring that it never recurs."

I’ve talked about Jidoka in previous posts and would recommend reading them at this point.

I don’t have much else to add, other than to say that Jidoka and Andon go hand in hand. Automated build servers (such as CCNet, TeamCity and many others) can often combine Andon and Jidoka for us by giving us instant visual feedback when something is broken. I’ve also recently set up BigVisibleCruise in my team area, giving even the casual observer the knowledge of whether our builds are broken or not.
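As a rough illustration of the "instant visual feedback" idea, the sketch below reduces a set of build results to a single red or green signal. It does not use the API of CCNet, TeamCity or BigVisibleCruise. The `statuses` dictionary (build name mapped to pass/fail) and both function names are assumptions made up for this example, and how you would actually collect those results depends on your build server.

```python
from typing import Dict, List


def broken_builds(statuses: Dict[str, bool]) -> List[str]:
    """Return the names of builds that are currently failing, sorted."""
    return sorted(name for name, passed in statuses.items() if not passed)


def board_color(statuses: Dict[str, bool]) -> str:
    """One glanceable signal: red if anything is broken, green otherwise."""
    return "red" if broken_builds(statuses) else "green"
```

A casual observer only needs `board_color`; the team fixing the problem needs `broken_builds` to know where to look.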

Applying Andon and Jidoka to Our Kanban Board

Once we have the concepts of Andon and Jidoka in our team and culture, we can use these tools to generate issue cards and then look at a few possible changes to our kanban board to account for them. The three basic methods that I have seen used include:

  1. Creating an Emergency Fixes pipeline
  2. Tacking a smaller bug notice onto an existing card
  3. Putting a Bug card in the backlog

I’m sure there are other alternatives, too. In my current team, we use the first and third methods and are considering the second one as well.

Options two and three essentially require no change to our kanban board. Implementing option two is a distinctive way of visually attaching a bug notice to one of the cards in our system. This could be done with little bug stickers, little red cards, or any other visual indicator that the team agrees on. Option three also needs something distinctive. Since we are creating an entire card for the bug, though, the entire card should be distinctive. I would recommend using a card that is colored red to signify issues. I would also recommend prioritizing the bug to the top of the backlog queue, when using option three. This will ensure that the bug gets worked as quickly as possible.
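Option three can be sketched as a backlog where feature cards join at the bottom and bug cards jump to the top. This is only a sketch under my own assumptions: the `Card` type, the colors, and the function names are hypothetical, and the backlog is modeled as a plain `collections.deque`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass(frozen=True)
class Card:
    title: str
    color: str = "white"  # assumed default color for ordinary work


def add_feature(backlog: Deque[Card], title: str) -> None:
    # Ordinary work joins the bottom of the backlog.
    backlog.append(Card(title))


def add_bug(backlog: Deque[Card], title: str) -> None:
    # Bug cards are red and go straight to the top of the backlog.
    backlog.appendleft(Card(title, color="red"))


def pull_next(backlog: Deque[Card]) -> Card:
    # Whoever has capacity pulls the top card.
    return backlog.popleft()
```

Note that with this sketch a newer bug lands ahead of an older one. Whether that is the right ordering among several bugs is something the team would have to decide.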

Option number one can also make use of number two and/or three. When we move a card into the Emergency Fixes pipeline, we may want it to be distinguishable as an issue by tacking on our bug symbol or by using a colored card.

Emergency Fixes Pipeline

Depending on the complexity or severity of the bug, we may want to include Analysis and Customer Acceptance in our Emergency Fixes. With andon and jidoka in mind, we will want to ensure that we fix any emergency issue immediately. This is not always possible, however. The customer may decide to delay the fixing of an issue for whatever reason. Either way, we only need a single pipeline for Emergency Fixes, and we can set its limit to one.

We can easily add an Emergency Fixes pipeline to our kanban system, placing it directly underneath our existing WIP pipeline. This special pipeline can be designated with a name, color code, or other marks as needed.

[Image: the kanban board with an Emergency Fixes pipeline added underneath the existing WIP pipeline]

If you are supporting production releases, you may also need to include a Delivery queue specifically for emergency fixes. This would allow multiple fixes to be compiled into a single patch release. There are other configurations to this, of course. As always, you will need to find what your specific team needs and create your process to suit.
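The pipeline described above, with its limit of one and an optional Delivery queue for batching fixes into a patch release, might look roughly like this in code. The class and function names (`Pipeline`, `WipLimitReached`, `complete_fix`, `cut_patch_release`) are hypothetical and exist only for this sketch; cards are plain strings to keep it short.

```python
from typing import List


class WipLimitReached(Exception):
    """Raised when a card is pulled into a pipeline that is already full."""


class Pipeline:
    def __init__(self, name: str, limit: int) -> None:
        self.name = name
        self.limit = limit
        self.cards: List[str] = []

    def pull(self, card: str) -> None:
        # Enforce the WIP limit: a full pipeline accepts nothing new.
        if len(self.cards) >= self.limit:
            raise WipLimitReached(f"{self.name} is limited to {self.limit}")
        self.cards.append(card)

    def finish(self, card: str) -> str:
        # Free up the slot once the fix is done.
        self.cards.remove(card)
        return card


def complete_fix(pipeline: Pipeline, delivery_queue: List[str], card: str) -> None:
    # A finished fix waits in the Delivery queue instead of shipping alone.
    delivery_queue.append(pipeline.finish(card))


def cut_patch_release(delivery_queue: List[str]) -> List[str]:
    # Compile everything waiting in the queue into a single patch release.
    patch = list(delivery_queue)
    delivery_queue.clear()
    return patch
```

With `Pipeline("Emergency Fixes", limit=1)`, a second emergency cannot be started until the first one is finished, which is the whole purpose of the limit.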

Where Do We Go From Here?

With an Emergency Fixes pipeline in place, our kanban system is now set up to handle just about every situation that we will encounter. However, this does not mean that our system is perfect or truly complete. No process, no matter how well defined it is, is worth anything if the people running the process do not believe in it. I’ve said it before and I’ll continue to say it – never stop improving your process. Always be mindful of waste, friction, smells, problems or whatever you want to call it. Inspect, adapt and continuously improve your process. Perfection is a journey, not an end-goal.

Stay tuned to my Adventures in Lean series, as I continue to explore the various aspects of lean software development in my own team and company culture.



About Derick Bailey

Derick Bailey is an entrepreneur, problem solver (and creator? :P ), software developer, screencaster, writer, blogger, speaker and technology leader in central Texas (north of Austin). He runs SignalLeaf.com - the amazingly awesome podcast audio hosting service that everyone should be using, and WatchMeCode.net where he throws down the JavaScript gauntlets to get you up to speed. He has been a professional software developer since the late 90's, and has been writing code since the late 80's. Find me on twitter: @derickbailey, @mutedsolutions, @backbonejsclass Find me on the web: SignalLeaf, WatchMeCode, Kendo UI blog, MarionetteJS, My Github profile, On Google+.
This entry was posted in Analysis and Design, Community, Continuous Integration, Kanban, Lean Systems, Management, Philosophy of Software, Principles and Patterns.
  • http://scottic.us Scott Bellware

    Having a separate process/workflow system for bugs will create a situation where they are seen as different kinds of work. This can lead to bugs not getting visibility commensurate with their priority.

    Andon doesn’t mean that the process for work changes, it means that the existing flow of work might get interrupted. This is a valuable side effect that suggests that bugs can – and perhaps should – go into the same process rather than a separate process.

    I have two kinds of work items in our kanban system: Work and Rework. We have a “New” state where all new items enter the process. We triage this state every morning.

    When a new rework item (bug, defect) is triaged, it will be routed to an appropriate state in the kanban – often “Ready for Development”.

    Rework is given higher priority and preempts any opening of work items until the rework is dealt with.

    Rework follows the same process as work. The presence of rework items in the work item flow affects the statistics of the work flow (as it should). If these weren’t true or desirable, exploring an alternate pipeline for rework might be an option.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @Scott,

    i lost track of what i should have been focusing on, trying to cram too much information into this one post. excellent clarifications!

  • http://www.lostechies.com/members/agilejoe/default.aspx Joe Ocampo

    I agree with Scott that the flow of work must be seen as the same but the classification of the work changes. I have employed a similar concept to what Scott has employed and it has worked out pretty well. You just have to work with your queues and reprioritize them. I do however have to caution that whatever is in WIP must be completed before moving to rework, because if you don’t you will have created excessive motion when the developers reengage.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @Scott @Joe

    i really appreciate the feedback.

    after thinking about what you two are saying some more, I think I need to go back and re-think some of what we are doing and our processes.

    I generally agree that the context switching between work and rework is a bad idea. i do see that there might be times when it’s necessary, though. but i would guess that this is rare and the exception.

    i’m hopefully going to reorganize and rework this article (probably splitting it up) to account for the feedback.

  • Doug Evans

    I’m taking a hard look at Kanban for our product support which usually has a large backlog due to the nature of our business (we support many versions of our software and it is heavily customized).

    I’m taking a look at Kanban primarily to get away from the task-switching mess that we often find ourselves in. The relevance of our situation to your article is in regards to the context switches when “Urgent” work enters the queue.

    For us this context switch has become important enough that we require management sign-off for urgent work. In other words, management has to agree that the situation warrants forcing a task switch.

    I still think that your “Emergency Fixes” pipeline is relevant, but you need controls around what is allowed to enter it. In our business there are times when it is necessary for a developer to literally drop what he/she is doing. If that is your case then an “Emergency Fixes” pipeline with entry criteria probably does make sense.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @Doug,

    That is essentially how our emergency pipeline has evolved, as well. On a related note – one of the key processes that allows us to instantly drop what we are doing, mid-coding, and switch over to a critical fix, is our “Branch Per Feature” source control method. I just haven’t had time to write up a post on how we are doing this, and why.

  • Ajeet Nayak

    Hi Derick, I have a distributed team (located in India and the UK). What software can I use to create a digital board as complex as the one you have shown in this article?

    • http://mutedsolutions.com Derick Bailey

      check out http://leankitkanban.com/

      they have all the necessary tools to create complex boards like this, and more. it’s a great company, a great product, and constantly improving.