For the last decade we’ve learned how to build products the right way. That’s great. It’s now time to ask: are we building the right products?
“There is nothing so useless as doing efficiently that which should not be done at all.”
If I were to ask you how sure you are that you’re building the right product — or feature — right now, chances are that you’re quite confident. You’ve done your research, talked to customers, talked to your team, and know the technology.
But what if I told you that one third of what we build fails to show value? Another third has a negative impact on value. And only the last third has a somewhat positive effect on what you aimed for.
… let’s make a bet¹. How much are you willing to bet that what you’re building right now will show the value you hope for? Your lunch? That’s probably fine. What about a week of vacation? Or your car? Maybe your house even? Not enough? Alright, let’s bet your retirement savings.
Maybe now you’ve started to falter in your conviction that what you’re building absolutely will bring about the desired value. Personally, I’d say that’s a pretty good place to be. In the complexity of building products for other people we shouldn’t be too sure (and assume) that we’re on the right path.
So what can we do?
Experiment, of course! List your assumptions, figure out what you need to learn, which decisions you need to make, and what information you need to make them. Then start experiments to get those answers, lower risk, and raise your true confidence.
There are times when heavy post-mortem examinations and reports are the right thing to do. For example, I assume that Google’s post-mortem work after the recent email phishing campaign went pretty deep.
However, most of the time a more lightweight post-mortem examination and report is more appropriate and much less time-consuming.
When an incident has occurred, there are three things that should be discussed: 1) why it happened, 2) problem prevention and 3) time to fix.
The right people
First, gather the people who understand the incident and have the know-how to improve things. Secondly, facilitate a discussion about how and why the incident happened. Thirdly, decide which people are needed for problem prevention and who is needed to improve on time to fix.
It’s not unlikely that problem prevention and time to fix call for different people.
Why it happened
Although we’ll use root cause analysis to identify and understand the faults or problems that led to the post-mortem, we’re not really looking for ‘root causes’ so much as identifying where improvement would prevent the broadest classes of problems going forward.
We remind ourselves, before we start, that while someone probably made a mistake at some point, we all somehow built a system that let that mistake happen. This is an opportunity to improve that system — through additions of, or changes in the likes of, processes and policies.
Only after reminding ourselves not to blame do we start working towards a shared understanding of the whys and hows of the incident, by first framing the “bad thing” that happened in terms of customer pain.
To move blame from a person to the system it might be useful to use “hows” instead of “whys” (e.g. how did this happen, instead of why did this happen).
After framing the incident, we explore it further by asking why and how the incident happened, until it doesn’t feel productive anymore; for example:
What? Customers found it impossible to sign up for our service.
Why? They got a validation error saying no territories were selected.
Why? Territories were visually selected, but not populated into post data.
Why? The form relies on a third-party library.
Why? We include the latest version of that library, which was updated with a bug.
You can stop here, or you can continue if you’re comfortable finding organisational problems you’d like to address, e.g. this might have happened because of a lack of Task Relevant Maturity and training.
Now that we have a working understanding of the incident, we can figure out how to be alerted faster to similar incidents in the future and come up with procedures to keep it from happening again.
Problem prevention
Problem prevention is all about finding ways to make sure a problem does not happen again, and it’s fairly easy to do.
First, find a proper level from the root cause analysis to start with.
In the example above, the first two items aren’t useful when it comes to problem prevention, but the other three are.
Then ask “how might we prevent this problem from happening again?”
Our third item, territories were visually selected, but not populated into post data, is something we can prevent in the future through automated tests.
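As a rough illustration of such an automated test, here is a minimal Python sketch. The helper build_signup_post_data and the field name "territories" are hypothetical; they stand in for whatever code in your own application turns the form selection into post data.

```python
from urllib.parse import parse_qs, urlencode


def build_signup_post_data(email, selected_territories):
    """Hypothetical helper: encode the signup form as POST data.

    Each selected territory is sent as a repeated "territories" field.
    """
    fields = [("email", email)]
    fields += [("territories", territory) for territory in selected_territories]
    return urlencode(fields)


def test_selected_territories_reach_post_data():
    # What the customer selected on screen must also be in what we send.
    body = build_signup_post_data("ada@example.com", ["SE", "NO"])
    parsed = parse_qs(body)
    assert parsed["territories"] == ["SE", "NO"]
```

A test like this would most likely run in your regular test suite, so a library update that breaks the mapping fails the build instead of reaching customers.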
We’re not likely to do anything about the fourth item; it’s probably going too far to stop using third-party libraries in our application. However, the fifth item is something we can avoid by always specifying a version to use.
So, by upping our automated tests game and adding policies regarding third-party libraries into our current processes, we can prevent similar problems from occurring again.
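A policy like “always specify a version” is easier to keep if something checks it for you. Here is a minimal sketch, assuming dependencies are listed one per line in a pip-style format such as name==1.2.3. The function name and the package names in the usage example are made up for illustration, and your own stack may declare dependencies differently.

```python
import re

# Matches lines that pin an exact version, e.g. "territory-picker==2.3.1".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+$")


def unpinned_requirements(lines):
    """Return the requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, unpinned_requirements(["territory-picker==2.3.1", "form-utils", "date-widget>=1.0"]) flags the last two entries, since neither names one exact version.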
Time to fix
Time to fix focuses on reducing the time between when an incident occurs and when it’s first fixed. Unlike problem prevention, this usually involves alerts, monitoring and the procedures surrounding those, e.g. on-call shifts.
Time to fix starts the same way as problem prevention does: find a proper level from the root cause analysis to start with, though it’s likely not the same items as during the problem prevention session.
Then ask “how might we act faster on similar problems in the future?”
With the example above, we could end up with things like: alert the person on call if new signups are lower than expected, or prioritise exceptions and an unusually high ratio of validation errors from signup-related flows in the app.
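To make those two alert ideas a bit more concrete, here is a minimal Python sketch of the check itself. The function name and the thresholds (half the expected signups, 20% validation errors) are assumptions picked for illustration, not recommendations; in practice the numbers would come from your own monitoring data.

```python
def signup_alerts(signups, expected_signups, validation_errors, attempts,
                  min_signup_ratio=0.5, max_error_ratio=0.2):
    """Return alert messages for one monitoring window (empty if all is well)."""
    alerts = []
    # Alert when signups fall well below what we normally see in this window.
    if expected_signups > 0 and signups < expected_signups * min_signup_ratio:
        alerts.append("signups below expected level")
    # Alert when an unusually large share of signup attempts fail validation.
    if attempts > 0 and validation_errors / attempts > max_error_ratio:
        alerts.append("high ratio of validation errors in signup flow")
    return alerts
```

In the incident above, a window with plenty of attempts, almost no completed signups and mostly validation errors would trigger both alerts, which is the signal the person on call needs.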
That’s it, you should now have a good-enough root cause analysis, logical actions for preventing similar problems in the future and a plan for what needs to be done to improve upon the time between when an incident occurs and when it’s fixed.
There’s a great article in Harvard Business Review called Strategies for Learning from Failure which talks about the blame game and how not all failures are created equal. It also introduces a spectrum of reasons for failure which is a useful reference to look at when an incident happens.
Take responsibility. Ask, listen, and ask again until you understand.
“Most personality conflicts at work arise from the fact that people do not know what other people are doing and how they do their work, or what contribution the other people are concentrating on and what results they expect. And the reason they do not know is that they have not asked and therefore not been told.”
In Managing Oneself, Peter Drucker argues that it’s each person’s duty to take responsibility for relationships at work, and that this responsibility has two parts.
“The first is to accept the fact that other people are as much individuals as you yourself are.”
“That sounds obvious,” he adds. And, indeed, it does sound obvious, but our brains are so annoyingly focused on ourselves that we may well need the reminder; again, and again, and again.
So here’s my reminder to you, if you feel that you have a conflict with a person at work. First, defuse the situation by setting your mind to the fact that almost everyone does their best (given the circumstances) and wants to help. It really is our default mode. Then, when you’re ready, talk to this person—and ask your questions, be curious, and dive in deep.
Most likely, you’ve had it all wrong anyway, as we usually do with the stories we tell ourselves about others.
And some more to consider:
Manners are the lubricating oil of an organisation. It is a law of nature that two moving bodies in contact with each other create friction. This is as true for human beings as it is for inanimate objects. Manners — simple things like saying “please” and “thank you” and knowing a person’s name or asking after her family — enable two people to work together whether they like each other or not.
Do your part for respect, trust and courtesy at work. The results are worth it.
Oh, and the second part of relationship responsibility, according to Peter Drucker, is taking responsibility for communication.