As the title suggests, this is part 2 of a two-part post which began last week with my story of getting stranded in a blizzard. As discussed more fully in that previous post, the postmortem analysis is a tool commonly used in IT which seeks to understand the causes of an adverse event (such as system downtime or a security breach) and identify ways to prevent those causes from recurring. That’s what we’re doing today.
Story (executive summary)
Last week’s story was about 6,000 words. Knowing everything that happened will still be helpful, so if you have time, definitely go read part 1, but to make it easier to focus this analysis, here’s the story distilled into a few bullet points.
Things that went well
- Neither I nor anyone else was hurt, nor did anyone suffer any serious property damage or financial losses or get in legal trouble. As they say, all’s well that ends well.
- The weekend was actually a fun adventure, and I learned many things that unfortunately can probably be fully learned only from the school of hard knocks, all without suffering any long-term harm. Maybe I’m a weird person, but I enjoy the ride so much with things like this that I hardly ever truly regret any past decisions, even if looking back I know they were mistakes.
- I have an emergency fund set aside for situations like this, which meant I was able to happily charge any expenses that made sense without worrying about how I was going to pay for them when I got home. I was still in good cheer after paying $300 at the towing office.
- I didn’t panic or feel seriously stressed at any point.
- I found my way from the sheriff’s pickup truck, without my glasses or cell phone, to a safe, comfortable place with people who took care of me in about three hours, just by staying calm and cheerful, spending about $50, and using a little bit of imagination and human ingenuity. One person I told this story to in the last couple of weeks said that I should add this to my résumé as “project management experience”!
- Everyone I spoke to or spent time with was helpful and nice. Most of us try to help people who are stuck because we know someday we’ll be there too. Today was my day, and people did not disappoint.
Things that went badly
- My car got stuck on the highway in a blizzard.
- I spent more than $500 which I shouldn’t have had to spend.
- I spent a weekend stranded in a town I didn’t know well without my glasses, phone, or pretty much anything else, missing a day of work.
Things that came close to going badly
- My car could absolutely have been seriously damaged (by someone not seeing it in the blizzard and slamming into it, or by being towed improperly, or by getting in an accident from driving in conditions I shouldn’t have been driving in before I got stuck). That would have led to a nice insurance claim – likely causing long-term damage to my rates – and a whole lot of trouble.
- While the overall chance was still low, being stuck in the middle of nowhere during a blizzard is a dangerous proposition. If I had gotten separated from my vehicle or it had stopped running, and I had been stuck for hours, I could have gotten frostbite or even died of exposure.
- As I planned to sit in my car, I did not think about needing to periodically ensure my exhaust pipe was clear. This could have resulted in carbon monoxide poisoning had I had to stay in the car for longer without remembering this rule.
Incidents like this (it seems inaccurate to call this one an accident) are rarely caused by a single mistake, because most dangerous systems have guards of some kind that prevent a single mistake from being catastrophic. Even something so obviously dangerous as texting and driving rarely has consequences by itself; instead, the crash occurs when it’s dark and raining, one driver is stressed out, running late, and texting someone to explain where they are, and another driver is following too close and can’t stop in time when the first driver does something stupid due to being distracted. The contributing factors form a kind of chain of causation, often with five or more links, and eliminating just one of the primary causes breaks the chain so that there is no accident. Removing a less important contributing factor usually at least somewhat mitigates the consequences, and removing several of them might prevent the accident altogether.
Therefore, in a good postmortem investigation, we identify as many factors as we can that combined to cause all the bad or nearly-bad things that happened. If we can reduce or better yet eliminate the chance of just a couple of them going forward, it becomes unlikely we’ll ever end up in a similar situation again.
In a traffic accident, investigators usually have to find a single root cause and hold someone partially or completely at fault for legal purposes, but here I won’t try to reduce everything to a primary cause or assign blame to any particular person or factor. That’s not only because in an organization this results in a culture where people act to avoid being “blamed and shamed,” even if that’s not in the best interest of the organization (e.g., covering up an accident or a near-accident that could have served as a learning experience), but also because in the end it just plain isn’t that helpful. What does it matter whether this incident was “my fault”? Either way, I really only care about how I can prevent it from happening again.
Here I’ve separated the list into factors that were out of my control and factors that were under my control. Hopefully it’s obvious which of these two sections will be most used to suggest action items for future improvement – but the out-of-my-control section also helps by allowing me to understand the context that led to choices made in the factors-under-my-control section. In addition, noting the factors that were out of my control can help me recognize when conditions are ripe for the making of bad decisions in the future and be more cautious.
Out of my control
- There was a nasty blizzard.
- In the run-up to the blizzard, the area had recently received so much snow and winter weather that our perception of dangerous weather was altered and we didn’t take the blizzard seriously.
- I had an out-of-town responsibility (singing at a choir concert) during the blizzard.
- The concert was not canceled on Saturday night before I left Owatonna (given the circumstances, in hindsight, it probably should have been).
- I had to check out of my hotel at approximately 11:00am, rather than, say, 9:00pm, so I was inclined to try to go somewhere at that time.
- The 511MN road conditions map incorrectly made it appear that conditions on 218 weren’t as bad as on other nearby roads.
- There was a gigantic, nearly invisible snowdrift in the middle of the road for me to run into.
- The sheriff arrived and gave me a ride before I had had much time to sit and reflect on my situation and what I should do next. (If an extra half hour had passed, I would probably have at least not forgotten my glasses and phone when I left.)
Under my control
- I chose to drive – unnecessarily – during a National Weather Service warning that said travel would be “extremely dangerous or impossible.” I was tired, felt like I had come for no reason since the concert had been canceled, and wanted to be at home during the blizzard instead of in some town without a place to stay, and I let that take priority over my safety.
- I failed to use the knowledge that should be ingrained in me as a Northerner that conditions change rapidly during blizzards and just because things look fine now doesn’t mean it will be safe to be out there in 5 minutes.
- I deceived myself about how bad the roads were as I began to set out and took an unnecessarily long time to decide I needed to turn around.
- I had no formal plan in place for, nor had I ever really even thought about, what I would do if I got my car stuck in a blizzard. This meant that when I was stuck on the highway, I was making everything up on the spot, while under time constraints and in an inappropriate mental state to be doing so.
- I chose to be away from home during a storm that could potentially be serious without reflecting on the consequences of this choice.
- Before leaving on Saturday, I did an insufficient amount of planning for potentially being stuck or gone longer than expected. I did put the card with my work team’s phone numbers in my wallet before I left, and I brought an extra change of clothes, but I should also have (1) set clear guidelines about the conditions during which I would consider it safe to drive back; (2) shared my plan with someone else, who would have helped keep me accountable and wait patiently; and (3) ensured I had a copy of useful contact information available, such as the phone number (or at least the last name!) of someone who lived in town and could help me out if I ended up in a jam.
- Because of the lack of complete planning, I was overly willing to make decisions on the fly that are subject to bias and wishful thinking (e.g., when it was safe to drive home).
- I had almost no way to contact anyone who could help once I was separated from my smartphone.
Now that I have all the causes listed out, or at least as many as I’ve managed to identify, here’s the ultimate goal of the postmortem: what do those causes teach me? What can I change to reduce the chances of this adverse event or something like it in the future, or reduce the impact should it happen anyway? I want to find actions that will reduce the impact of contributing factors out of my control, or prevent or negate factors within my control.
- Institute a personal rule that I simply will not drive during a blizzard warning under any circumstances (unless there’s a legitimate emergency). Doing that is just plain stupid, and if I’m tempted to break the rule, I now have experience to remind me that it’s a bad idea!
- Consider the impact of purposefully leaving home in advance of what could become a minor natural disaster more seriously in the future. If choosing to proceed, come up with a complete plan that takes pessimistic timelines into account, be willing and mentally and logistically ready to be stranded for the entire length of that pessimistic timeline, share the plan with other people and use them to confirm its plausibility and safety, and don’t change the plan without solid reasons to do so.
Concrete action items
- Add some more local phone numbers to the reference section in the front of my pocket notebook, which I had the whole time I was stranded. (I have some already, but don’t always do a good job of keeping it updated.) Further, consider memorizing at least a few local phone numbers in case even that goes missing.
- Think about and add a page to my binder of emergency plans indicating a procedure for getting stuck in a car during a winter storm, including:
  - Items to keep in the vehicle. I’m generally well-prepared in this area; I have a list of about 30 useful things I keep in my car at all times, including jumper cables, nail clippers, floss, a towel, rope, extra winter clothes, a complete atlas of the state, and even tablets for preventing thyroid cancer after a nuclear disaster (I bought those during the last North Korea scare… maybe they’re a bit paranoid, but they’re cheap, small, last for years, and experiments during Chernobyl found them to be highly effective at neutralizing a serious health threat in that unlikely event). What I didn’t have was a bag or anything to bring with me if I had to leave the car. Had I had this, I would have been much more comfortable over the weekend, and I’m confident I would have also thought about my phone and glasses as I thought about grabbing my emergency bag before leaving the car.
  - What to do and bring before abandoning the car when appropriate.
  - Tips for safely attempting to get the car unstuck.
  - What to do to stay safe if I can’t (e.g., keeping the exhaust clear).
- Make a duplicate copy of my emergency plans binder and keep it in my car. Currently I only have a copy in my apartment, which means it wouldn’t have been much help even if I had had plans in there. (Of course I would have been in a much better position already simply by virtue of having thought about it enough to compile the plans. But having the plans would certainly help more!)
Applying postmortem analyses yourself
Most of us, after living through an incident like this one, end up thinking about what went wrong. But how often do we actually think through all the causes and make a logical decision about what we can change in the future? We tend to pick out something to pin the blame on and stop right there, often without realizing we’re doing it. Sometimes it’s as simple and unenlightening as “I’m stupid.” (I hope I’ve conveyed that, even if true, this is a completely unhelpful statement.) Formalizing the analysis and working through all the parts of the incident yields far better results.
An effective postmortem analysis has three parts.
Part 1: Story
Explain what happened, as completely and honestly as possible. You don’t necessarily have to write out a complete narrative as I did (I did because I hoped being able to get a closer look at the details would help people who weren’t there understand the analysis – plus it’s just a good story!). Do, however, write out something like my executive summary. It’s important to really write it rather than just sitting there thinking about it; writing something down makes it feel much more real, you’ll probably be more honest when you put it on paper, and you may recognize contributing factors you hadn’t thought of before as you write it out.
Remember that a proper postmortem analysis is blameless. Blame might feel good, at least when you can pin it on someone else, but it never fixes anything, and it does cause emotions to get in the way of an accurate analysis. If you screwed up, no problem – you’re not getting blamed, just tell it as it happened!
State all of the facts as facts, without invoking any kind of value judgment. If you or someone else made a bad decision, state what decision was made, what went wrong as a result of that decision, and why you think that decision was made. If something out of your control happened or contributed, simply state that it happened – even if it wouldn’t have been a problem were it not for a factor in your control, the fact is that it contributed and was part of the problem.
Part 2: Causes and contributing factors
Look over your story and list out everything you can think of that contributed, whether it would have broken the chain in itself (e.g., if I hadn’t had plans out of town that weekend, I would have stayed home looking out my window at the pretty snow and drinking hot chocolate and nothing else would have happened) or merely nudged something a little closer to happening (e.g., the road conditions map I looked at when I left Austin painted a picture that was slightly too rosy).
Part 3: Actions
Identify straightforward things you can do to break the chain in the future. If those involve tasks you need to complete, put them on your to-do list and prioritize them as appropriate. When you’re done, you can stop thinking about the past and rest easy knowing that you’ve done all you can to prevent the incident from recurring.
Examples of other good analyses
In case you need examples other than computer hacking or blizzards to see where you could benefit from this technique, here are three more:
- A near-accident when driving or cycling. Near-accidents make fantastic learning opportunities because they happen far more often than actual accidents but still provide lots of information on contributing factors that can and should be mitigated. Unfortunately, most people waste the opportunity by just saying “Phew” and forgetting about them. I can recall many of these near-accidents, but here’s a particularly good example: I once came within feet of rear-ending someone on the freeway when they decided to make an illegal U-turn from the left lane at 75 mph without signaling or tapping their brakes first (I also happened to be looking down at my speedometer at the time they slammed on the brakes to make the turn, costing me at least half a second of reaction time). Looking critically at the situation led me to the realization that, while the guy ahead of me was indeed an idiot for making that turn, I was also following way too close while passing someone in the right lane. Had I had another two seconds to react and brake, there would have been no issue. Since then, I have always made sure to leave the same (adequate!) amount of following distance while passing as at any other time, and I notice that most people form dangerously tight lines while passing.
- Instructions followed incorrectly. While not nearly as dangerous as our other examples so far, ruining a batch of cookies or having to disassemble your IKEA furniture and put it back together again 20 minutes after making a mistake is sure annoying. Work through why that happened. There were at least some factors under your control: what can you change to prevent them from recurring? Note the difference between blaming yourself and trying to prevent the problem from recurring. The primary issue is likely someone else’s bad design, so don’t just label it “user error” and call yourself stupid, but you can’t fix someone else’s bad design. You can change your process or write corrections or admonitions on the recipe, so do that.
- A botched project. That could be as large as a multi-million-dollar corporate initiative or as small as cleaning up your living room. Perhaps you were unsatisfied with the results, or it took five times longer than you intended, or it went way over your budget. You probably have some choice words for other people or products involved, but don’t stop there. What actually happened? What can you change next time?
The ability to learn from a mistake is one of the most distinctive and valuable human skills. But it’s all too easy to fail to learn anything at all if you don’t go about it in the right way; you might just blame the mistake on someone else, or focus on a tiny portion of the problem and ignore a much bigger problem that would be easier to solve! Postmortems help us to cut through the bias and consistently learn.