Operation Bootstrap

Web Operations, Culture, Security & Startups.

Change Management and the Art of Going Fast


“If everything seems under control, you’re just not going fast enough.” – Mario Andretti

I wrote an article or two about change control some time back. I called it “change control” because that’s what you were doing, right? You were controlling change. Only, I didn’t really stop to think about what I was actually trying to control. Changes are a necessary part of any operation, and an operation that can’t change is in trouble. Changes that happen fast aren’t usually bad unless something goes wrong. It’s not the change that you are fundamentally controlling. What you are actually trying to control is the risk of change, and there are lots of aspects to doing that.

The race car driver going around the track isn’t trying to never slip or trade paint, she’s trying to be fastest and stay in the race. You can’t do that and guarantee you will never hit the wall – so you adjust your acceptance of risk to the point that you make good time around the track without getting knocked out of the race most of the time. Bad things happen sometimes, but you are doing ok if you come out ahead on average. Guys like Andretti are really good at this balancing act.

If you drive an ambulance, the math is completely different…

It Depends…

First, the all-important “It Depends”. It always depends, you know this, right? The risk of change for a two-person startup is very different from the risk of change for a hospital or power plant. If a change happens in a bad way for the startup, maybe somebody gets woken up & has to fix a problem, or maybe there is some data loss, heck maybe the company goes out of business. In a hospital, the impact of change can be that people die. In a power plant, the impact of change can be that many people die.

Similarly, the risk of NOT making a change is different. The power plant wants to keep producing power in the least costly way – if a change isn’t contributing to that bottom line it shouldn’t happen. The risk of making the change outweighs the risk of not making the change.

A small startup wants to grow the business. Not making a change to them may mean not attracting new customers who want a new feature, or losing old ones who are waiting for that feature. The risk of NOT making the change usually outweighs the risk of making the change.

That said, the need to manage that change isn’t really all that different. Both the power plant and the startup need to manage it, they just need to do so in a way which is appropriate relative to the impact of change and their tolerance for risk. I’m going to focus on change management in the fast lane – mostly because I’ve never worked at a power plant.

What should a change management process DO?

To me there are a few things that change management should accomplish. These are high level things and they are the “what” not “how” of change management.

  • You should know when a change has occurred.

  • You should know what was changed.

  • You should know who made the change.

  • You should know why the change was made.

  • You should know what due diligence went into planning for the change.

  • You should know what the impact of the change was.

  • You should know that any obligations you are bound to through your customer/partner agreements or the law have been met.

Notice that this list does NOT contain:

  • You should know that a manager approved the change.

  • You should know that QA tested the change.

  • You should know that the change was discussed in a meeting.

  • You should know that the technical folks reviewed the change.

  • You should know that your partner approved of the change.

  • You should know that the change was completed within the change window.

That 2nd list is the “how” of change management, and the appropriateness of those things depends on your tolerance for risk when making changes. The first list should lead to the 2nd, not the other way around. The first list always applies, the 2nd list might apply – it depends. Make sense? I thought so, so let’s move on…

The problem with controlling change

In the companies I’ve seen, change control is applied in a very encumbering way. It’s a process that is separate from the change itself – administrative and bureaucratic. It is a control, and as such it is a gate. All things must pass through this gate before proceeding to the Ivory Tower. If they pass that gate, they are assumed good – this is flawed.

The process is also highly subject to double standards. Ops guys can get their changes through quick, Engineering changes get scrutinized under a microscope & rejected 3 times on principle just to make sure they are thought out. Is this because the changes Engineering submits are any higher risk than the changes Ops make? No, it’s usually because the person approving the changes knows and trusts the Ops guy. That same trust doesn’t often exist for the Engineer.

Does this actually control the risk of change? It might if you have the right checks in place and a lot of review. It might if the people doing the review take the job seriously, if they actually understand the change being made. The more people you add to the review though, the less likely that is to occur. The busier those people are, the less likely that is to occur. The guy who understands the change best, and who has put the most thought into it, is usually the guy who is making the change.

Now, in some worlds you want to introduce gates – lots of them. A power plant might intentionally put a 3 week required wait on any change to give it time to simmer, time for people to think twice about it, and to make damn sure that the person submitting it has planned it out because it’ll take another month if it doesn’t work. These worlds also often require that one group plan the change and another execute. This makes it mandatory that the change documentation be explicit & accurate. These are good things in some environments. Slow environments. Environments where the risk of change is much greater than the risk of no change.

Moving fast – manage it, but don’t control it

“What gets measured, gets managed” – You ought to know who said that…

He also said “Most of what we call management consists of making it difficult for people to get their work done.”

The book “Good to Great” talks about leaders & the characteristics that made them great leaders. One of those characteristics is knowing who should be on the bus and who should not. It’s about people. When you have the right people, the problems solve themselves – the red tape falls away.

Change management, if you want to move fast, should not be about controlling the actions of your team, it should be about measuring the success, failure and effectiveness of the changes. When failures happen, you analyze them and you find solutions to them. The solution is not “have another person double-check your changes”. Good people know how to do good work, they know how to test changes and they beat themselves up when a change goes wrong. You don’t need to help them with those things. What you need to do is give them the tools to understand the quality & effectiveness of their change. Few organizations provide these tools.

Look at the list of what change management should do above. None of those things require a lot of paperwork if your systems track changes. Consider this set of capabilities for a configuration management system:

  • Snapshot before/after state of a system

  • Document exactly what changed on a system between two points in time

  • Timestamp & record when changes to system configuration / state are detected

  • Track who executed a particular change

  • Return the systems to a previous state if required
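The capabilities above can be pictured with a small sketch. This is not any particular configuration management product, just an illustration under some loud assumptions: system state is a flat dictionary of settings, and the names `Snapshot`, `ChangeLog`, `record`, `diff` and `rollback` are made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    """State of one system at a point in time (hypothetical structure)."""
    taken_at: datetime
    taken_by: str
    state: dict


@dataclass
class ChangeLog:
    """Keeps every snapshot so changes can be diffed and rolled back."""
    history: list = field(default_factory=list)

    def record(self, state, who):
        # Copy the state so later edits do not alter the stored snapshot.
        snap = Snapshot(datetime.now(timezone.utc), who, dict(state))
        self.history.append(snap)
        return snap

    def diff(self, before, after):
        """Return {key: (old, new)} for every setting that differs."""
        keys = set(before.state) | set(after.state)
        return {
            k: (before.state.get(k), after.state.get(k))
            for k in sorted(keys)
            if before.state.get(k) != after.state.get(k)
        }

    def rollback(self, index):
        """Return a copy of the state stored at history[index]."""
        return dict(self.history[index].state)


# Made-up settings and names, purely for illustration.
log = ChangeLog()
before = log.record({"max_conns": 100, "cache": "on"}, who="alice")
after = log.record({"max_conns": 250, "cache": "on", "gzip": "on"}, who="bob")

changes = log.diff(before, after)          # what changed
print(changes)
print(after.taken_by, after.taken_at.isoformat())  # who and when
restored = log.rollback(0)                 # state to return to if required
```

Nobody filled out a form here. The when, what and who fall out of the act of making the change, which is the point.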

Now add some monitoring / reporting on top of that which does the following:

  • Produce a report showing usage behavior with ALL CHANGES tagged in the report so that you can correlate change in behavior to a change in your system.

  • Produce a report showing system performance, response time, load – again tagged with all changes so you can correlate change in performance to a change in your system.

  • Have a weekly review meeting to review the RESULT of changes made, how effective they were, what further tuning needs to be done.
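As a rough sketch of what “tagged with all changes” could mean in practice, the snippet below compares a metric before and after each recorded change. The sample data, the field names (`at`, `who`, `what`) and the function `impact_of_change` are all assumptions for illustration; a real setup would pull both series from whatever monitoring and change tracking you already run.

```python
from statistics import mean


def impact_of_change(samples, change_time, window):
    """Compare the mean of a metric before and after one change.

    samples: list of (timestamp, value) pairs, timestamps as plain numbers.
    Returns (mean_before, mean_after) over `window` time units on each side,
    or None if either side has no samples.
    """
    before = [v for t, v in samples if change_time - window <= t < change_time]
    after = [v for t, v in samples if change_time <= t < change_time + window]
    if not before or not after:
        return None
    return mean(before), mean(after)


# Hypothetical response times in ms, sampled once per minute.
samples = [(0, 120), (1, 118), (2, 122), (3, 180), (4, 176), (5, 184)]
# Hypothetical change events, e.g. pulled from a change log.
changes = [{"at": 3, "who": "bob", "what": "raised max_conns"}]

report = []
for change in changes:
    result = impact_of_change(samples, change["at"], window=3)
    report.append((change["what"], result))
    print(change["at"], change["who"], change["what"], result)
```

A before/after average is a crude measure and says nothing about cause on its own, but it gives the weekly review a fact to start from instead of an opinion.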

What is mostly missing is the ability to track these:

  • You should know why the change was made.

  • You should know what due diligence went into planning for the change.

  • You should know that any obligations you are bound to through your customer/partner agreements or the law have been met.

“Why” is pretty easy – revision control systems deal with this nicely: you make a comment. You ask your folks to be verbose & you review the comments. Due diligence & contractual obligations are largely about setting expectations – you document what changes need to be tested in what ways, and what process to follow with customers & partners. If a change fails because it wasn’t tested, you make sure the person making the change understands what happened. If it keeps happening, they probably aren’t the right person.
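One way to keep those three items visible without turning them into a gate is to record them alongside the change and surface the gaps at review time. The sketch below is only that, a sketch: the `ChangeRecord` fields, the five-word threshold and the `review_gaps` function are invented for the example, and nothing here blocks a change from going out.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    """What the person making the change writes down (hypothetical fields)."""
    who: str
    what: str
    why: str = ""
    tests_run: list = field(default_factory=list)
    customers_notified: bool = False


def review_gaps(record, min_why_words=5, needs_notice=False):
    """List what is missing from a record, for discussion after the fact."""
    gaps = []
    if len(record.why.split()) < min_why_words:
        gaps.append("why is missing or too terse")
    if not record.tests_run:
        gaps.append("no tests recorded")
    if needs_notice and not record.customers_notified:
        gaps.append("customer notice not recorded")
    return gaps


terse = ChangeRecord(who="bob", what="raised max_conns", why="fix")
good = ChangeRecord(
    who="alice",
    what="enabled gzip",
    why="cut page weight for mobile customers on slow links",
    tests_run=["staging load test"],
)

terse_gaps = review_gaps(terse)
good_gaps = review_gaps(good)
print(terse_gaps)
print(good_gaps)
```

What counts as enough testing, or which changes need customer notice, is exactly the expectation-setting described above; the code only makes it easy to see when the expectation was skipped.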

So now, instead of approving/disapproving changes based on someone’s opinion about the risk of a change, you are reviewing facts about the changes & making future decisions based on that. The people making these decisions are stakeholders in the company, but they are empowering the individuals making the changes to select the best way to execute the change and holding them accountable for its success. Administrative overhead is low because the system tracks the changes itself & discussions about change are forward looking and based on fact. You measure & you manage, but you let the individuals who know how to make the change control the change.

Do systems that do all this exist today? Absolutely, but they require investment of time & require understanding of your business. That understanding is very often left out when a change management process is defined.

Back to the race car driver – is anybody telling him how to drive that car? No. He has folks measuring his performance, telling him what’s going on around him, worrying about the things he can’t worry about when he’s driving. All that information is there to allow the driver to make good decisions but ultimately the driver and only the driver is responsible for his decisions. If he doesn’t win races, he doesn’t drive anymore. Simple as that.

When you need to layer on more controls

There will come a time when the ‘fast’ process isn’t good enough. This will usually be when changes become more complex, require more elaborate testing and require more careful communication and risk mitigation. These are good problems – they mean your business is growing and they mean the risk of change is beginning to outweigh the risk of not making the change. When that scale tips, it means your business earns revenue by being stable and your business processes have to change too.

Just be sure the scale has actually tipped, that the controls actually add value, before you layer on those controls thinking they will make things better.
