C9D9: Supporting Development Velocity Downstream

A few weeks ago I participated in a Continuous Discussions panel organized by Electric Cloud, where we discussed how organizations can support development velocity downstream of development teams. You can watch the discussion to get everyone’s thoughts, but I wanted to elaborate a bit here on my own thinking around these topics.

The general idea in this discussion was that while we can optimize development workflows to improve velocity for those teams, work often gets held up downstream of development during testing or deployment into production. We wanted to focus on what can be done to improve downstream throughput.

In general the topics covered were as follows:
* Culture and Team Structure
* Value Stream mapping
* Optimize inter-team handoffs
* Automate³
* Version & share binaries, environments and processes
* Blameless post-mortems

My $.02 starting point

As an industry we’ve become really good at making things faster & more iterative but we haven’t yet figured out how to make light move faster. Most work toward improving the speed of a thing involves eliminating waste – improving efficiency & effectiveness. You’ll probably hear this theme over and over in my responses below but the simplest way to improve velocity through a series of steps is to eliminate steps & let computers do the ones you cannot eliminate. It’s not always possible – but it should always be considered a possibility.

Culture and Team Structure

This, in my view, is the most important aspect of the overall velocity of development. Assuming that we view “development velocity” as the time it takes from an idea being prioritized so that someone can spend time on it, to the time a customer can touch it, then we have to consider the entire delivery pipeline when we evaluate our velocity.

If we look at an entire delivery pipeline and all the handoffs that might occur, there are a lot of opportunities for waste. We could focus on making handoffs more efficient or we could focus on how we automate them – which often mostly eliminates the traditional idea of a handoff. This is usually a core objective of Continuous Deployment.

The most effective and highest-velocity team structures I’ve seen are those where developers can make changes that flow through a fully automated pipeline to production. Typically putting something like this in place requires that the development team be a cross-functional team with development, testing, product, operations and leadership expertise. The more of those functions you remove from the immediate team, the less efficient the team becomes. Whether that means they have lower velocity or are less effective depends on the organization – but there’s a good chance those traits are impacted as well.

When a development team includes representatives from the areas of the business it needs to interact with & has the ability to rapidly experiment with changes to its software in a production environment, its ability to make accurate and timely decisions about product changes is vastly improved.

Value Stream mapping

Watching the video you’ll see that I don’t spend much time on this topic because I’ve never really done much of it. I have, however, listened to others talk about how effective it can be at bringing groups together to talk about and understand what happens at each stage of a delivery pipeline. Again, I refer to a “delivery pipeline” as the entire process of getting some idea into customer hands. If you are a software company, your delivery pipeline includes nearly your entire organization. It is very rare for a single person to have a complete and accurate view of all the steps that occur in this process – so bringing folks together to document the current state of things can bring about some surprising results.

Have you ever watched the making of Jelly Belly beans and thought “Holy Cow, I had no idea there were so many steps in making a Jelly Bean!”? You might be surprised how little you know about your own processes. You might also be surprised to find out who in your organization is crucial to those processes completing correctly & on time. Are those folks doing ok?

Watch Jelly Bellys get made!

Optimize Inter-team handoffs

As I mentioned above, my first approach to this problem is to eliminate the handoff, but this isn’t always possible. When it isn’t possible, the above value stream mapping exercise can help you understand what opportunities there are for automation. Often the most time-consuming part of any handoff between groups is discussion & establishing trust. If I am receiving a thing from a group I trust a lot & I have received things from them in the past with great success, I’m probably not going to require a lot of discussion. On the other hand, if everything that group has handed off to me has erupted into a tire fire, then I will probably have some questions. Worse, my distrust may be based on nothing more than my own prejudice – making it a difficult situation to correct.

Trust is a fickle thing for a company because it’s usually between two individuals. There can be trust between groups, but that trust is often contingent on each group’s membership – as members change, so too does the trust level. Humans are also imperfect – they forget, they have bad days, and sometimes they just don’t show up. For this reason it’s more effective to automate any of these decision points you can.

When you automate a handoff you could be simply documenting a set of possible outcomes and the conditions for each – like a protocol for handling things. This allows more autonomy and flexibility in handling the handoff – if the documented expectations are met – a set of steps can be followed to complete an action. You could also be adding some technical automation by having computers make some of the decisions for you & maybe do some of the work (if they aren’t already).
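As a rough sketch of what "a set of possible outcomes and the conditions for each" might look like once it is written down as code: everything here is hypothetical and made up for illustration (the `ReleaseCandidate` fields, the three outcomes, the 1% error threshold). Your own protocol would encode whatever expectations your groups actually agree on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseCandidate:
    """Hypothetical facts a development team records about a build."""
    tests_passed: bool
    rollback_documented: bool
    error_rate_in_staging: float  # fraction of failed requests, 0.0 to 1.0


def handoff_decision(candidate: ReleaseCandidate, max_error_rate: float = 0.01) -> str:
    """Map a candidate to one of the documented outcomes.

    'deploy' : all expectations met, no discussion needed
    'review' : something is missing or borderline, a human should look
    'reject' : a hard requirement failed
    """
    if not candidate.tests_passed:
        return "reject"
    if not candidate.rollback_documented:
        return "review"
    if candidate.error_rate_in_staging > max_error_rate:
        return "review"
    return "deploy"


if __name__ == "__main__":
    candidate = ReleaseCandidate(
        tests_passed=True,
        rollback_documented=True,
        error_rate_in_staging=0.002,
    )
    print(handoff_decision(candidate))
```

The point of writing it this way is that the decision no longer depends on who is receiving the handoff or what kind of day they are having; changing the rules becomes a reviewed change to the protocol rather than a conversation.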

When a development team implements Continuous Delivery – they are essentially automating the handoff to Operations by saying “We’ve tested & met this set of expectations – this software will work if you press this button and deploy it”. When a team implements Continuous Deployment they’re taking it one step further and saying “We are taking more responsibility for the software and largely eliminating the handoff – we’ll both watch for problems and work together to resolve them”.

Automate³

Automation – “the use of largely automatic equipment in a system of manufacturing or other production process”

Automation takes that process you repeat over and over and allows some computer/mechanical processor to handle it. You can automate decision making (automated test suites/continuous integration) and you can automate actions (automated provisioning/deployment). I’m a super duper big fan of automation – it’s much of what people think I get paid to do. But the act of automating something you understand is not the hard part – or even really the interesting part to me anymore. The interesting part is understanding where you can apply automation, and then understanding how to tweak things here and there to make it possible to automate them.

An example of this is Feature Toggles. I can implement the automation to deploy the software a team builds on every commit. You push a commit and like magic – computers will whisk your code on out to production… where it will promptly provide a fine example of why “Continuous Deployment is BAD”. Outages, dying kittens, it all happens and then “Stop! We need a change control process!”. The issue isn’t that automating software deployment is hard, the issue is that making the software deployment automatable in a way that allows a high degree of success is hard.

Feature Toggles (Flags, Flippers, Switches, whatever) allow you to make changes to software while maintaining existing customer-facing behavior. This means that you can try some new thing in a Continuous Deployment environment in a way where customers should never be impacted. Yes, there are caveats – but they’re beside my point. The capability to automate the software deployment is contingent on a capability of the software, not the deployment automation.
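For anyone who hasn't seen one, here is a minimal sketch of a feature toggle, assuming a simple in-memory set of enabled flag names. The `FeatureToggles` class, the `discounted_checkout` flag and the two checkout functions are all invented for this example; real toggle systems usually read flags from configuration or a service and can target specific users.

```python
from typing import Iterable, Optional, Sequence


class FeatureToggles:
    """Hypothetical in-memory toggle store; flags are off unless listed."""

    def __init__(self, enabled: Optional[Iterable[str]] = None) -> None:
        self._enabled = set(enabled or [])

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def legacy_total(prices: Sequence[float]) -> float:
    # Existing customer-facing behavior: a plain sum.
    return sum(prices)


def discounted_total(prices: Sequence[float]) -> float:
    # New behavior under development: a made-up 10% discount.
    return round(sum(prices) * 0.9, 2)


def checkout_total(prices: Sequence[float], toggles: FeatureToggles) -> float:
    # The new code path ships to production with every commit,
    # but customers only see it once the toggle is switched on.
    if toggles.is_enabled("discounted_checkout"):
        return discounted_total(prices)
    return legacy_total(prices)


if __name__ == "__main__":
    print(checkout_total([10, 20], FeatureToggles()))
    print(checkout_total([10, 20], FeatureToggles(["discounted_checkout"])))
```

With the toggle off, deploying this commit changes nothing a customer can observe, which is what makes deploying on every commit survivable.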

There’s a similar dependency on automated testing, not because test quality is required – it’s required either way – but because testing must be automated in order for Continuous Deployment to even be possible. If I have to wait for Jimmy to finish his testing before my commit can go out to production – then by definition that’s not “Continuous Deployment”.

So automation is important – but not always in the way people think. Folks think they’ll just hire a DevOp to DevOp the heck out of this problem and automate it all away – but unless you also have a dev team that has built software capable of being deployed in this way, it’s not gonna happen.

Humane Software systems

This is a point I bring up during the discussion and I wanted to touch on it. If you haven’t, you should go and watch Jeff Hackert’s talk (below) on building humane systems. I see this failure so often and it’s truly avoidable if we treat our automation systems like we treat our products – building them with an understanding that our customers are humans, and they have feelings & experiences that differ from ours, and getting their feedback is good. I can’t add much to the talk – watch it.

Version & Share binaries, environments and processes

This is perhaps more obvious to some than others – this is table stakes for building a modern development workflow. Very often the “environments” and “processes” pieces are harder than others – especially when different groups handle the production environment vs. the development environment. If you can make them the same – do it – but I have yet to be somewhere that can do this 100%.

My preference is to focus on the production environment. This is more meaningful in a SaaS environment than perhaps for someone building desktop software, but I think it’s legitimate to consider either way. The effort involved in completely replicating your production environment – the traffic, the processes, the systems, the network, everything – is… so. much. work. I would argue that in many (most?) cases you can put that same degree of effort (or less) into making it possible to evaluate software changes in production. Further, by doing so you take advantage of all the organic variations which are experienced in a production environment.

Much has been written about doing this – for your googling pleasure take a look at the idea of a “dark launch”. The basic premise is exposing a system to production traffic in a way which doesn’t put customers at risk. The process is too involved to discuss here – but it’s a way to leverage the existing environment and traffic you have to evaluate changes in a way that provides better realism than any test environment I’ve seen while still allowing experimentation.
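To give a flavor of the basic premise, here is one very simplified way a dark launch could be sketched: the current implementation always answers the customer, while the candidate sees the same request on the side and any difference or failure is only recorded. The `dark_launch` wrapper and its arguments are hypothetical names for this illustration, and a real setup would typically run the candidate asynchronously (or on mirrored traffic) so it cannot slow down the customer's request.

```python
from typing import Any, Callable, List, Tuple

Handler = Callable[[Any], Any]


def dark_launch(
    current: Handler,
    candidate: Handler,
    mismatches: List[Tuple[Any, Any, Any]],
) -> Handler:
    """Wrap `current` so `candidate` is exercised with the same requests.

    The customer only ever receives the response from `current`.
    """

    def handler(request: Any) -> Any:
        response = current(request)
        try:
            shadow = candidate(request)
            if shadow != response:
                # Record the difference for later analysis.
                mismatches.append((request, response, shadow))
        except Exception as exc:
            # A failing candidate must never affect the customer.
            mismatches.append((request, response, exc))
        return response

    return handler


if __name__ == "__main__":
    log: List[Tuple[Any, Any, Any]] = []
    handle = dark_launch(str.upper, str.title, log)
    print(handle("hello world"))  # served by the current implementation
    print(len(log))               # differences observed so far
```

This version runs the candidate inline to keep the example short, which also means it only suits read-only requests; anything that writes data needs more care than shown here.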

Evaluating changes in a production environment means you are using the same process, the same systems, the same network & ideally the same traffic patterns to evaluate the results of change. And as those things evolve, so too does your testing environment.

Blameless post-mortems

This is a big one for me. I prefer retrospectives, not everyone does, but I like to look back on the good and bad to learn from the past on a regular basis. There are times when an event was clearly bad, but there are usually positive things that happened that you can acknowledge and reinforce during a retrospective.

That said, the “blameless” part is super important. If you want to bother to look back on what went wrong and what didn’t then it’s worthwhile to invest in making sure everyone is honest and open. This is hard because different people have different tolerance for criticism – so we minimize it. I’ve written in the past about this as have others – I’d suggest further reading if you aren’t familiar with the idea.

This impacts velocity by identifying areas where improvements can be made & identifying where there is friction today. If you aren’t looking at the past then it’s much more difficult to improve the future.

Wrapping up

I care about these topics a lot and I always look forward to chatting with others about their different experiences. If you want to talk about any of this – just give me a shout.
