Fear and Loathing in CI/CD. On psychological safety and progressive delivery. – James Governor's Monkchips

https://redmonk.com/jgovernor/2022/03/22/fear-and-loathing-in-ci-cd-on-psychological-safety-and-progressive-delivery/

Dan Hon’s newsletter is a must read. Always packed with smart stuff, well-written, and funny. One of his recent pieces – But What Is Certain Technology Infrastructure Anyway? – struck me, so I wanted to quote it at length here.

Some people understand what CI/CD means, and some people (myself included) might reference the fact that modern software companies deploy software to the web hundreds or maybe sometimes thousands of times a day. This freaks people the fuck out. They are used to dealing with planned releases, say, once a quarter, and even then (say it quietly) maybe they don’t even meet the deadlines for that planned release. The release to UAT or acceptance testing or whatever for someone to just click around and say “oh, this looks fine”, which is nothing like “actually use the thing and try to do the thing in something approaching real world conditions”.

So really, the deal with something like continuous deployment isn’t that suddenly someone (a consultant, or a trendy vendor) is coming in saying, hey, now you can push updates to prods, tiny updates, many times per day! Even once per day! Because they may freak the fuck out and assume that now you actually want to push lots of those updates, many times a day, when if you stopped and you thought about it, it’s totally reasonable to be freaked out by that. They (expansively, the organization in general) is barely past graduating from “here’s a three ring binder for you to learn how to use $business_system” to “here’s a word document and PDF for you to learn” to “here’s a video, updated maybe every other planned release”. And now they think you want to push changes to prod multiple times a day? What kind of change? What kind of change would possibly be worth pushing, that’s that small?

There is clearly a disconnect here. Pushing that much, that frequently, is, like, a long term goal and you get there in baby steps.

The second thing that normally happens with the whole “we should have infrastructure that supports this” is that someone will say something like “I don’t want duplication” and then I say something like “no, but I don’t like forcing people to use something that at best provides no appreciable benefit, and at worst makes things worse, fixes nothing, and burns a bunch of goodwill and political capital”. And then people think about that a bit.

This whole thing just seems wise and on the money about the massive delta between one set of expectations and another. And of course it’s more complicated than that. Even if the fabled “IT org” gets it, and is aspirational about delivering software more quickly, with more, smaller, changes, it’s still going to freak “the business” out. The same “the business” that complains IT is holding it back and moving too slowly, is at the same time going to be like what are you doing, of course you shouldn’t be making changes constantly, “you’ll break stuff if you do that…. we don’t really trust you to do that.”

We talk about psychological safety in modern software delivery. Stuff is going to break. In production. Shit’s complex, yo. So we need to learn from failure in order to become more resilient, and we need to have blameless post-mortems, so we’re able to experiment, and roll out new code, and make system changes, without being afraid of getting fired if something goes wrong. The truth is that most folks in IT have broken a production system at some point, quite often in an embarrassing way. Psychological safety and blameless culture aren’t just for “the IT org” though. “The business” will benefit from understanding that culture too [blameless culture originated in other sectors, notably healthcare and aviation]. Many small changes mean less big risk, and even if things go south, we fix them and move on, having learned from the failure. It’s generally easier to roll back, or make a fix, for a small change.

So continuous delivery/continuous deployment can be hard for people to get their head round. Because after all these are questions of control. Who’s in charge of all of this? But CI/CD is also the basis of pretty much everything good in modern software delivery. One thing I found really interesting when I started talking about Progressive Delivery to “IT leaders” and “the business” was that it was apparently less scary than the idea of CD. The idea that you can decouple deploy from release, and test things with specific named cohorts, or certain amounts of application traffic, before rolling it out more broadly, with controls and automation in place for (hopefully) safe release, was really appealing to people. The idea you could have radical delegation, with control. Canarying, Blue/green deployments, A/B testing, Feature Flags are not new ideas, but we didn’t have the Certain Technology (see Dan again) in place to make them easier to adopt, roll out and maintain. The fact is with DevOps: Tools Can Lead The Culture Change. As my colleague Rachel Stephens says:

If you are trying to drive organizational transformation with procurement alone you’re in for disappointment. Tools cannot fix a broken culture. You can’t buy your way out of a culture issue. Tools can’t save you if you’re ignoring underlying issues like internal power struggles, lack of trust across teams, siloed communication, etc. There are no silver bullets.

But that said: tools can very much lead culture. For example, part of AutoDesk’s transformation journey was moving the team to GitHub; that tooling change was a huge underpinning to all the subsequent changes to collaboration and communication in the organization.

Tools can be critical to changing people’s mindset. It’s hard to practice the right behaviors without the right foundational toolset. Tools can enable new ways of working and collaborating.

In the end, this is not an either/or. Technology supports culture change, but technology alone is insufficient to drive a culture of shared ownership and accountability. Tools are not magic, but they can be a tangible pivot point around which the organization can transform.

So today I can adopt tools like LaunchDarkly or Split.io for feature management. There is a wealth of Observability tooling (Dynatrace, DataDog, Honeycomb, Lightrun, Lightstep, etc) out there that allows for testing in production and dark launches, and there are purpose-built automation tools for building pipelines (CircleCI, CloudBees, GitHub Actions, Harness, Keptn, Spinnaker, Weaveworks Flux etc), all enabling Progressive Delivery. It is indeed a golden age for infrastructure tooling.
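To make the idea of decoupling deploy from release a bit more concrete, here is a minimal sketch in Python of a feature flag that turns on already-deployed code for named cohorts first and then for a percentage of everyone else. Everything in it is an assumption for illustration: the `FLAGS` table, the `new-checkout` flag, the cohort names and the `is_enabled` helper are hypothetical, and this is not how any of the products named above implement it.

```python
import hashlib
from typing import Optional

# Hypothetical flag table. The flag name, cohort names and fields are
# made up for illustration and do not mirror any vendor's API.
FLAGS = {
    "new-checkout": {
        "cohorts": {"internal-staff", "beta-testers"},  # named cohorts that always get it
        "percent": 5,  # share of all other traffic, 0 to 100
    }
}


def bucket(flag: str, user_id: str) -> int:
    """Map a user to a stable bucket in 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, cohort: Optional[str] = None) -> bool:
    """Decide whether code that is already deployed is released to this user."""
    rule = FLAGS.get(flag)
    if rule is None:
        return False  # unknown flag: keep the feature dark
    if cohort in rule["cohorts"]:
        return True  # named cohorts see the feature first
    # Everyone else is released to gradually, by percentage of traffic.
    return bucket(flag, user_id) < rule["percent"]
```

The code ships to production once. Widening the release is then a matter of raising `percent`, and pulling it back is a matter of setting it to 0, with no redeploy in either direction. Hashing the flag name together with the user ID keeps each user in the same bucket between requests, so nobody flips in and out of the new behaviour.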

Fear is natural, but with the right narrative frames and platforms and cultural approaches, we can get over it, and start delivering software more effectively and quickly.

This was not commissioned research. However, RedMonk clients mentioned above include CircleCI, Dynatrace, GitHub, Harness, Honeycomb, Lightrun, LaunchDarkly and Weaveworks.
