When we encounter a problem with a complex system, it’s easy to misidentify the problem. In other words, we can seize on some small piece of the system that apparently doesn’t work and spend minutes or even hours investigating that piece in great detail, only to find eventually that nothing was wrong with it: it only appeared broken because of the influence of some other, related component. Judicious application of the Minimal Working Example technique reduces the amount of time we spend on misidentified problems – and not only that, it nearly always leads directly to the actual problem.
An MWE is minimal, because it cuts out anything irrelevant to the problem; it’s working, because it can be reliably tested for the presence of the problem at every step; and it’s an example, because it’s not the actual component being debugged but leads to insights that can be applied to it.
Here’s the basic idea of the MWE: we want to rip our problem out of context. “Out of context” has a negative connotation most of the time – after all, most things, whether they be sentences in a natural language, historical events, or parts of a computer system, contain far less information than we think they do and require context we supply automatically and unthinkingly to carry their full meaning. It’s only when the context goes missing or is intentionally replaced with the goal of misleading that we realize how little of the meaning was actually inherent in one component.
But when debugging, it’s having the context that leads to errors, because there’s so much information available we’re liable to focus on the wrong piece of information. The bug is inherent to one component somewhere, and we need to figure out which one, and severing that component from its context can help to locate or confirm it.
Digression: An amazing demonstration of the amount of meaning carried by context comes in the poetic form called the cento, in which lines or half-lines of a famous epic (usually Virgil or Homer) are rearranged without modification to form a poem about a topic only vaguely related to the original. Reading a cento whose source material you know is a bizarre, dream-like experience where the inherent meaning of the lines and your memory of the original context collide with the new context (which provides the surface meaning). The cento is largely a classical phenomenon, but “found poetry” is a similar modern one – though I don’t find these modern variants nearly as powerful or demonstrative, perhaps mostly because they rarely borrow from such well-known and masterful sources.
There are two ways to go about building an MWE: top-down and bottom-up. In the top-down approach, you start with the original system and gradually remove the context until the problem goes away or becomes obvious. In the bottom-up approach, you yank a tiny part of the system you think ought to work and drop it into an entirely new system, then add context until the problem appears or becomes obvious. Both approaches have their advantages and uses; the top-down approach is less sensitive to the quality of initial guesses about the problem, but it usually takes longer.
Let’s look at how the top-down approach works first: it’s more methodical and involves less guesswork, so it’s easier both to follow and to apply. Say we have an application with 50,000 lines of code and something isn’t working. First we make a copy of the application so we don’t mess anything up. Then we start deleting code. We might begin by deleting everything except the one screen that’s exhibiting the bug (this may require adding a little bit of test code so we can still reach that screen when we start the application). Then we test.
If the bug isn’t there anymore, we know some strange interaction between this screen and the rest of the application is causing the bug. We can either switch to a bottom-up approach and begin gradually adding the context back again or go back to the full version and be a bit more conservative about what we remove.
If the bug is still there, great! We’ve now determined for sure that the rest of the context is irrelevant to the problem. Next we might start removing elements from the screen that’s broken and testing periodically, and so on. If this process doesn’t directly lead us to the bug along the way, eventually we’ll be down to just a few lines of code, at which point the issue will likely become obvious. And if we still can’t figure it out, we have a beautiful, easy-to-understand little example to hand off to someone more experienced, who will have a far better chance of finding the issue and be much more inclined to help when the problem has been demonstrably reduced to a tiny example.
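For readers who like to see a loop spelled out, here is a rough sketch of the top-down process in Python. It is only an illustration under some loud assumptions: the "application" is modeled as a list of named parts, and `bug_present` is a made-up stand-in for "start the copy of the app and check whether the bug still shows up". The part names and the `shrink` helper are hypothetical, not part of any real tool.

```python
from typing import Callable, List, Sequence


def shrink(parts: Sequence[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedy top-down reduction.

    Try removing one part at a time. Keep a removal only if the bug is
    still present afterwards; otherwise put the part back and try the next.
    Stop when no single removal preserves the bug.
    """
    current = list(parts)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate  # this part was irrelevant context
                changed = True
                break
    return current


# Hypothetical test: in this toy model the bug appears only when both
# "login_screen" and "date_widget" are present.
def bug_present(parts: List[str]) -> bool:
    return "login_screen" in parts and "date_widget" in parts


app = ["menu", "login_screen", "settings", "date_widget", "help_page"]
minimal = shrink(app, bug_present)
print(minimal)  # ['login_screen', 'date_widget']
```

In real life the "test" step is a human poking at the application rather than a function call, and we usually delete big chunks first rather than one part at a time, but the shape of the loop is the same: remove, test, keep the removal only if the bug survives.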
The bottom-up approach works similarly, except that it involves an initial guess about the source of the problem. In the example above, we might instead begin by building a completely new application with a new screen containing five elements that does what the original code was supposed to do. If building this example from scratch doesn’t in itself expose the bug (it often will!), we can test. If the problem still occurs, we already have a small reproduction, and we can switch to a top-down mode to trim it further or pass the problem off to a more knowledgeable person as needed. If it doesn’t occur, again, we know some context is unexpectedly causing the issue and we can start adding it back in until the problem shows up.
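The bottom-up direction can be sketched the same way. Again, this is a toy model with assumed names: `core` is the tiny fresh example, `context` is the list of things we add back one at a time, and `bug_present` is a hypothetical stand-in for actually running the thing and looking for the bug.

```python
from typing import Callable, List, Optional, Sequence


def grow_until_broken(
    core: Sequence[str],
    context: Sequence[str],
    bug_present: Callable[[List[str]], bool],
) -> Optional[List[str]]:
    """Bottom-up search: start small and add context until the bug appears.

    Returns the system as it stood when the bug first appeared (the last
    element is the piece that triggered it, unless the core alone fails),
    or None if the bug never shows up.
    """
    system = list(core)
    if bug_present(system):
        return system  # the fresh example already reproduces the bug
    for piece in context:
        system.append(piece)
        if bug_present(system):
            return system
    return None


# Hypothetical test: the bug needs both the new screen and a legacy plugin.
def bug_present(parts: List[str]) -> bool:
    return "new_screen" in parts and "legacy_plugin" in parts


result = grow_until_broken(
    ["new_screen"], ["theme", "legacy_plugin", "analytics"], bug_present
)
print(result)  # ['new_screen', 'theme', 'legacy_plugin']
```

Note that the result here still contains `theme`, which has nothing to do with the bug. That is the usual cue to switch directions and trim the example from the top down.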
Creating a Minimal Working Example can prove useful anytime you (1) experience a problem in a complex system, and (2) have the ability to remove or simplify some parts of that system or otherwise alter the context. Most systems that meet these criteria are digital or otherwise obviously technological, but there are exceptions, and you certainly don’t have to be a programmer to work with them.
Just today I discovered that my old Kindle was no longer connecting to wireless. Before I decided it was dead, though, I wanted to make sure I had actually identified the problem correctly. Amazon’s Kindle 3G coverage is provided by Sprint, and my town has notoriously bad Sprint coverage, so I picked up my Kindle and physically eliminated the context of my apartment’s terrible radio wave permeability by going for a drive around town, pausing occasionally to see if it picked up any bars or managed to sync. (It didn’t. It was conveniently Amazon’s Prime Day sale today, and I got a new one at 50% off, with the added assurance that I actually needed a new one and wasn’t just getting convinced to buy unnecessary junk by Amazon!)
Doing math is a less obviously techy example. Have you ever had trouble completing some kind of calculation or thought your result was wrong? Did you go back and try the same process with easy numbers (say, 10, 2, and 1), to verify that your process was right? By making the numbers easy, you can eliminate the context of arithmetic errors and see if your formula or method is the actual problem.
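As a small illustration of the easy-numbers check, suppose (purely as an assumed example) we are computing simple interest, where the total should be the principal times one plus rate times years. With a principal of 10, a rate of 1 (that is, 100% per year), and 2 years, we can work out in our heads that the answer must be 30: the original 10 plus 10 for each of the two years. The function names below are hypothetical.

```python
def total_buggy(principal, rate, years):
    # Assumed mistake for illustration: the parentheses are missing,
    # so only `principal * 1` is multiplied and the interest is just added on.
    return principal * 1 + rate * years


def total_fixed(principal, rate, years):
    # Intended formula: principal * (1 + rate * years)
    return principal * (1 + rate * years)


# Easy numbers make the wrong method stand out immediately.
print(total_buggy(10, 1, 2))  # 12, but we expected 30
print(total_fixed(10, 1, 2))  # 30
```

With realistic inputs like 8,437.50 at 3.75% for 7 years, both results would look equally plausible at a glance; with 10, 1, and 2, the broken method has nowhere to hide.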
Here’s an occasion where I didn’t apply the technique and I absolutely should have. My entire computer screen suddenly went green – no other colors. I first assumed the cable was loose where it connected to the monitor, but it was securely fastened. I tried the monitor with another computer, but it worked fine there. I then concluded my graphics card had to be fried, so I ordered a new one for about $100 and put it in and… everything was still green. It turned out that the monitor cable was damaged (which I fixed for $5 in 20 minutes by driving over to the local Best Buy and buying a new one). While I did check several components, my search was not methodical: I didn’t start at one end of the chain, top down or bottom up, and work all the way to the other. As a result, I hadn’t pinpointed a single component before I went to the comparatively tremendous bother and expense of replacing the graphics card, which didn’t even fix the problem.
Creating an MWE can seem time-consuming and high-effort, so the technique tends to see far more limited application than it deserves. If you find yourself waffling on the value of creating an MWE, remember how much time – or money – you stand to lose if you make a mistake.