Busting the myth that Kanban is suited for maintenance only

It's commonly accepted in IT management circles that Kanban Just-in-time (JIT) is suited mainly for projects past active development. But most people seem to repeat this statement without giving it much thought: why would it be true, and is it actually true? I'd like to explain how this statement came to life and show that applying JIT to projects in active development is not only possible but easy.

Origin of the myth

The idea that JIT mainly suits maintenance-stage projects was born in the book Kanban: Successful Evolutionary Change for Your Technology Business by David J. Anderson. According to the book, the process mainly boils down to the following algorithm:

  • StepA finishes a task and puts it into StepB's buffer
  • If StepA works faster than StepB, the buffer fills up (let's say it has 5 slots); after that StepA has to stop
  • Once StepB takes a task from the buffer, a slot frees up. This unblocks StepA until the buffer fills up again.
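
The three rules above can be sketched as a small bounded buffer. This is only an illustration, not code from the book: the class name SlotBuffer and its methods are made up for this example, and the 5 slots come from the "let's say" above.

```python
from collections import deque


class SlotBuffer:
    """Buffer in front of StepB, capped by number of tasks (hypothetical helper)."""

    def __init__(self, slots=5):
        self.slots = slots
        self.tasks = deque()

    def has_room(self):
        # StepA may only hand over work while a slot is free.
        return len(self.tasks) < self.slots

    def put(self, task):
        if not self.has_room():
            raise RuntimeError("buffer full: StepA has to stop")
        self.tasks.append(task)

    def take(self):
        # StepB pulls the oldest task, which frees one slot.
        return self.tasks.popleft()


buffer = SlotBuffer(slots=5)
for n in range(5):
    buffer.put(f"task-{n}")

step_a_blocked = not buffer.has_room()   # all 5 slots are taken
buffer.take()                            # StepB starts a task
step_a_unblocked = buffer.has_room()     # one slot is free again
```

The point of the sketch is that StepA never decides on its own when to work; the free slots in StepB's buffer decide for it.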

Just-in-time: expectations

Simple in theory, but here is a problem that the author ran into:

  • Suppose that the buffer is full: StepB is working on complicated tasks at the moment and blocks StepA for a full week
  • The tasks in the buffer are simple, so once StepB is free to work on them, it finishes all of them in 1 day
  • This unblocks StepA and people start working on new tasks, but those are complicated and will take a week

As a result StepA was idle for a week, and then StepB becomes idle for another week. So we just lost a week: if StepA had kept working, it would have had new tasks ready for StepB immediately. Oops.
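
To make the arithmetic concrete, here is a rough timeline of that scenario. The numbers are assumptions for illustration only: a 5-day working week, and two possible moments at which StepA restarts (the text does not pin this down).

```python
WEEK = 5  # working days in a week (assumption)

# StepB is busy with complicated tasks for a week while the buffer is full,
# so StepA is blocked for that whole time.
step_b_busy_until = WEEK
step_a_idle_days = step_b_busy_until

# The 5 simple tasks from the buffer take StepB one more day.
buffer_done = step_b_busy_until + 1


def step_b_idle_days(step_a_restart_day):
    """Days StepB waits for the next (week-long) task from StepA."""
    next_task_ready = step_a_restart_day + WEEK
    return next_task_ready - buffer_done


# Variant 1: StepA restarts as soon as StepB pulls the first task.
idle_if_restart_on_first_pull = step_b_idle_days(step_b_busy_until)

# Variant 2: StepA restarts only after the buffer has been drained.
idle_if_restart_after_drain = step_b_idle_days(buffer_done)
```

Under these assumptions StepB waits 4 or 5 days depending on the variant, which is roughly the "another week" described above.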

Just-in-time: reality

The root of the problem lies in how we measure the buffer size. In our case we capped it at 5 tasks, but the tasks were very different in duration. If all the tasks had the same duration, both steps would have synchronized and would not both have been idle (see the note "Some Theory of Constraints wisdom").

Therefore the book says that we should do our best to standardize the tasks in terms of duration. And because maintenance tasks are usually more predictable and easier to standardize, the conclusion follows: JIT is bad for projects in active development and good for the support phase.

Fix #1 - change units of the buffer

The natural way of capping the buffer is by counting the tasks, but is it the best way? We'd have to standardize the tasks to make it work, and this is a very (very!) complicated problem. It requires us to estimate with a high level of precision, and then split or combine tasks that are not standard. This is a lot of work even for support-phase projects.

Instead we could measure the buffer in hours, points, or whatever unit you estimate in. This way the tasks can be of different sizes, and the maximum number of tasks in the buffer can vary too. In our case above this would allow StepA to finish not 5 but, say, 8 tasks and only then halt. If you want to be more accurate, you'd estimate tasks for StepA and StepB separately (some tasks may be simple for StepA and complicated for StepB, or vice versa), but this is more time-consuming.
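
A minimal sketch of such a buffer, capped by estimated effort instead of task count. The class name EffortBuffer, the 40-hour cap and the 5-hour estimates are all made up for this example; the hours stand for the estimated work of the step that consumes the buffer (StepB).

```python
class EffortBuffer:
    """Buffer capped by total estimated hours, not by task count (hypothetical helper)."""

    def __init__(self, max_hours):
        self.max_hours = max_hours
        self.tasks = []  # list of (name, estimated_hours) pairs

    def queued_hours(self):
        return sum(hours for _, hours in self.tasks)

    def has_room_for(self, hours):
        return self.queued_hours() + hours <= self.max_hours

    def put(self, name, hours):
        # Returns False when the task does not fit; StepA has to wait then.
        if not self.has_room_for(hours):
            return False
        self.tasks.append((name, hours))
        return True


buffer = EffortBuffer(max_hours=40)

# StepA keeps handing over simple tasks estimated at 5 hours each.
accepted = [buffer.put(f"simple-{n}", 5) for n in range(9)]
tasks_in_buffer = len(buffer.tasks)
```

With these numbers the buffer accepts 8 simple tasks before StepA has to halt, while a couple of 20-hour tasks would fill it on their own.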

Also, instead of estimating you can simply rely on gut feeling. Team leaders usually understand which task is complicated and risky and which one is simple. Such a person can decide whether new tasks should be pulled from the backlog or whether we should wait.

Fix #2 - move the constraint upstream

Adding extra Test engineers or removing some of the Dev engineers is a simple and beautiful solution to the problem. But in order to understand how this helps you need some background in the Theory of Constraints... The difference between JIT and ToC is pretty subtle:

  • JIT tries to optimize the whole process. It suggests that each subsequent step controls the amount of work the previous step can do. And at some point we'd like to balance the performance of all steps.
  • ToC recognizes that in each system there's always a bottleneck (constraint). This isn't bad and we shouldn't necessarily try to eliminate the constraint (see the note "Constraints always exist"). Instead we should use the constraint to set the rhythm of the whole system.

From the ToC perspective, no step in the system needs to work at 100% except for the slowest one, because that step defines the overall performance. If the other steps also work at 100%, they produce work that the constraint can't consume in time, and thus Inventory Costs (unfinished work) grow, which in turn slows down the whole system.

In systems where the constraint is loosely defined and can migrate from step to step, it's hard to figure out which of the steps is the most precious. In such cases it's too easy to start wasting these precious resources on unimportant work that could've been done (or prevented) by others.

So the constraint must exist, and it's good if we know where it is. Unfortunately things get more complicated if the constraint is in the middle or at the end of the process, because we somehow have to notify all prior steps that they need to work more (or less). But if we put the constraint at the very beginning, no additional orchestration is necessary: our bottleneck simply produces less work than all subsequent steps can process. In manufacturing there's usually some technicality that defines the rate-limiting step, but in Software Development we often have more control over this - we can simply add or remove people from some steps.

Things get even simpler because in software projects there are only 2 primary steps: Development and Testing. The others are not as crucial (in terms of the buffers):

  • BAs, UX - could sometimes become bottlenecks. But usually they can whip up a feature or two on request relatively quickly. So oftentimes we can omit thinking about their performance.
  • Release & Configuration management - these days this is mostly a solved problem and can be automated.

So what's going to happen if we have redundancy in Testing? Its buffer will almost always be empty! So the problem of big vs. small tasks is no longer a problem: once a task is implemented, it can be accepted into Testing immediately. This has additional benefits:

  • Testers have free time and in addition to functional testing can think about UX, security, performance, etc.
  • Developers receive feedback very quickly while they still remember what they developed

Note that when there are only 2 steps in a system, JIT and ToC become equivalent.

Summary

We learned that Kanban JIT has issues when the amount of work varies greatly from task to task. This is probably one of the reasons why Lean sees standardization as an important part of Continuous Improvement. Ideally we should have smaller tasks and push them to testing frequently, which would certainly alleviate the problem.

But this is not always possible or optimal. And since in Software Development we're not bound by the constraints of the physical world, we can move resources around and decide where the bottleneck resides. This makes it relatively easy to fight variability: by employing principles from the Theory of Constraints we can make the 1st step in the system the slowest in order to guarantee empty buffers and hence low Inventory Costs.

Notes

  • Some Theory of Constraints wisdom

    Actually one of the steps can and MUST be idle from time to time. This simply shows that one of them works faster than the other. No matter how hard we try, this is going to happen unless you want to abandon the ideas of JIT and ToC and use a Push system (which is way less effective).

  • Constraints always exist

    We can't really eliminate a constraint. It's possible to optimize the constraining step but this will simply turn another step into a constraint. Such back-and-forth creates chaos which we don't want.
