Nudges, Devops and getting people to do the Right Thing

Marc Jones
Apr 27, 2020


So what is a nudge, and how can nudges help us get people to do the right thing? Let’s start with Wikipedia for the obligatory definition.

Nudge is a concept in behavioral science, political theory and behavioral economics which proposes positive reinforcement and indirect suggestions as ways to influence the behavior and decision making of groups or individuals.

Nudges were popularised by Richard Thaler and Cass Sunstein’s book Nudge: Improving Decisions About Health, Wealth, and Happiness, but before you go away and read all three hundred pages, read this post. Nudges are such a simple and effective concept that by the end of it you’ll be able to identify ways to apply nudges to your own devops challenges.

Image: two urinals side by side, each with a small fly drawn on it
One of the most famous, and silliest, examples of a nudge is the one pictured here, introduced at an Amsterdam airport in the 1990s: a small fly drawn onto each urinal to improve accuracy and avoid mess.

So what have nudges got to do with devops? Well, devops is, in my interpretation at least, not just about tools and automation but about individuals and change, and unlike computers, people can’t simply be automated into doing things a new way. If you have read the likes of Daniel Pink’s Drive, you’ll be familiar with the value of autonomy in the workplace, particularly in professions such as software engineering. Autonomy is an essential pillar of a motivated and happy workforce, so, tempting as it may be, we should avoid sacrificing it on the arduous road to devops nirvana.

The gist of it is this: a nudge is anything that makes it easier for someone to make a better decision, or to perform a task to a higher standard, without removing their autonomy. Nudges come in many forms, for example setting a default value for a checkbox, or creating checklists, dashboards, alerts, tools or libraries, but critically, none of them should be mandated.

This will probably make more sense with a few examples, so let’s take a closer look.

Whilst working for a company that ran a 24/7 trading platform, we agreed that it would be the right thing for the software engineers to start managing their own deployments into production. In the past, deployments had been performed by a separate ops team in the early hours of the morning. That came with a whole bunch of complications, including the infamous “throw it over the wall” culture, which we were keen to get away from.

Although by this point the microservices could be deployed with zero downtime and deployments were fully automated, no production deployment is truly risk-free, and we wanted the engineers to minimise that risk by avoiding deployments at particularly busy times. We could have modified our deployment system to block out busy times, but that would have been difficult and time-consuming to “codify”: busy periods weren’t always predictable, and there were lots of edge cases, such as rushing out a super-critical production fix.

This is a great case for a nudge. We needed a way to give engineers the autonomy to deploy when they wanted while also providing them with the context and information to make good decisions.

The solution was simple in the end and took less than an hour to implement: a Grafana “Deployment” dashboard that, amongst other things, had a graph showing how busy we were right now, plotted on top of a rough-and-ready prediction for the rest (and the earlier part) of the day. This was a really simple but effective way for an engineer to see how busy we were relative to other parts of the working day. The end result: the engineers kept autonomy over when deployments were done, and we avoided having to implement a complex rule-based engine to decide when deployments should and shouldn’t happen.
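To make the idea concrete, here is a minimal sketch of the kind of calculation such a dashboard panel might plot, assuming hourly request counts are available for previous days. The post doesn’t describe the real dashboard’s data source or prediction method, so the naive “average of the same hour on previous days” prediction and the function names predict_hourly_load and relative_to_peak are hypothetical. Note that it only produces numbers for a human to look at; it makes no decision about whether a deployment may go ahead.

```python
from statistics import mean


def predict_hourly_load(history: list[list[float]]) -> list[float]:
    """Hypothetical naive prediction: for each hour of the day (0-23),
    average the load seen at that hour across previous days.

    `history` is a list of days, each a list of 24 hourly request counts.
    """
    if not history:
        raise ValueError("need at least one day of history")
    return [mean(day[hour] for day in history) for hour in range(24)]


def relative_to_peak(current: float, prediction: list[float]) -> float:
    """Current load as a fraction of the day's predicted peak.

    0.0 means idle, 1.0 means as busy as the predicted busiest hour.
    An engineer reads this alongside the graph; nothing is blocked.
    """
    peak = max(prediction)
    return current / peak if peak else 0.0


if __name__ == "__main__":
    # Hypothetical history: quiet overnight, busy during trading hours 09:00-17:00.
    day = [100.0 if 9 <= h < 17 else 10.0 for h in range(24)]
    prediction = predict_hourly_load([day, day, day])
    print(relative_to_peak(50.0, prediction))
```

A plain per-hour average is deliberately crude; in the spirit of “rough and ready”, it only needs to show the shape of the day well enough for someone to judge whether now is a sensible moment to deploy.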

Here’s another simple example that was effective and fast to implement. The company I was working for had a common framework used across hundreds of services (yes, a real-life “distributed monolith”). We’d added a fantastic new feature we wanted everyone to use, but services, and therefore engineers, needed to opt in with a minor code and config change.

We decided to use a nudge: a simple log line, emitted every time a service started up, reminding engineers that they needed to make the change. The message also included a link to a wiki page explaining the benefits of the change and why it was necessary. It worked great. No-one was forced into it, and this little nudge, repeated every time they ran a service on their local machine, drummed the message home and encouraged adoption without mandating it.
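A start-up nudge like this can be sketched in a few lines. The following assumes a Python framework using the standard logging module; the config key new_feature_enabled, the functions nudge_if_not_opted_in and start_service, and the wiki URL are hypothetical placeholders, not the framework’s real API. The key design choice is that it logs a warning rather than raising an error, so a service that hasn’t opted in still starts normally.

```python
import logging

logger = logging.getLogger("framework")

# Hypothetical wiki page explaining the benefits of the change.
WIKI_URL = "https://wiki.example.com/new-feature-opt-in"


def nudge_if_not_opted_in(config: dict) -> bool:
    """Log a reminder (a warning, never an error) if the service hasn't opted in.

    Returns True if the reminder was logged. The hypothetical
    `new_feature_enabled` key defaults to False, i.e. not yet opted in.
    """
    if config.get("new_feature_enabled", False):
        return False
    logger.warning(
        "This service has not opted in to the new feature yet. "
        "It only takes a small code and config change; see %s for why.",
        WIKI_URL,
    )
    return True


def start_service(config: dict) -> None:
    """Hypothetical framework start-up hook."""
    nudge_if_not_opted_in(config)
    # Normal start-up continues regardless: the nudge informs, it doesn't block.


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    start_service({})  # logs the reminder
    start_service({"new_feature_enabled": True})  # stays quiet
```

Because the check lives in the shared framework, every service picks up the reminder on its next start-up without anyone having to chase individual teams.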
