How EDF Built a DevOps Practice on AWS

Steve Bowerman
Published in EDF Data and Tech
8 min read · Jun 3, 2024
Photo by Growtika on Unsplash

Adopting DevOps is widely considered the ‘right’ way of working when building and supporting modern cloud-based solutions.

Whilst the technologies that enable DevOps are mature and plentiful, and there is an abundance of engineers who work (and want to work) in a DevOps way, there are many barriers to entry, particularly for larger organisations with heritage processes and technologies.

I’ll step you through EDF’s journey over the last 3–4 years.

Why DevOps?

Recognising the need to be more agile, to innovate, and to do more with less in a highly competitive market, we set out with a hypothesis that we’d attain this by:

  • Forming ourselves around product teams and agile methodologies to provide greater end-to-end ownership and accountability.
  • Adopting DevOps as a delivery and operate model that builds on the product ownership mindset, that removes hand-offs between build vs run, and that promotes a single team — single goals culture.
  • Building an internal capability of engineers to own and adapt the IP and enable better use of partners (capacity and skills gaps) rather than traditional outsourcing.
  • Leveraging our existing AWS usage, but shifting towards higher levels of automation and use of serverless architectures.

Stepping Stones

High level journey

Rome wasn’t built in a day, and your DevOps transformation won’t be either. Whilst our journey was (and is) a continuous evolution, we’ve chunked it up into three discrete phases of maturity:

1. Learn by Doing, and prove that we can do this!

Building upon the need to act and to work out how we could operate in a DevOps mode, we ran an initial incubation project.

The goal was to form a product team, deliver a customer-facing product, and run that product as a DevOps product team that acted as the North Star for how we wanted to work:

  • Product team
  • Embedded engineers
  • Agile roles like Product Owner rather than traditional PM
  • AWS serverless architecture that used high levels of automation
  • Full ownership of product lifecycle including design, build, deploy, operate.

As we lacked a lot of the core skills (but had plenty of delivery and AWS experience), we hired in some DevOps experts to work with the team. It should also be noted that the product team was formed of individuals who wanted to help make this work, and who were therefore good messengers and evangelists for success.

Key outcomes

  • We succeeded in getting a product to market in under 6 months and learned a huge amount in the process. Most importantly, we learned that this was achievable and that the DevOps way of working contributed to team cohesion and fast delivery.
  • A new AWS account topology pattern (known internally as a Landing Zone, not to be confused with the AWS product) ensured that new AWS accounts were vended for this team to operate in, with one set of accounts per team. This gave us isolation and accountability.
  • The use of serverless-first architecture (rather than EC2, K8s, etc.) served as a blueprint for the serverless-first approach that we have since adopted. We’d done some serverless previously, but not with IaC (Serverless Framework and Terraform).
  • High levels of automation: CI/CD based on GitHub and CodeBuild, and IaC with Terraform and Serverless Framework. We also shifted testing left into the pipelines.

2. Walking the Walk — Doing it ourselves!

Having proved the value of DevOps, we had a mandate to proceed and form another 4–5 product teams to deliver new products.

At this point we had a very basic blueprint, which we needed to make repeatable, and we needed to start building up skills so that our DevOps partner could exit. Their exit was a key success criterion for them.

We can break down this stage into 3 areas:

  • Lighthouse Projects — Selecting suitable greenfield products to deliver and forming new product teams around them, so that the outcome was a product in market and an enduring product team that owned that product and used DevOps/product engineering to do so.
  • Talent Pipeline — Starting to build a talent pipeline: forming job definitions, landing the culture we wanted and starting the treadmill of recruitment. Later in this phase we introduced our Data & Tech graduate scheme.
  • Account Vending Machine — Taking the Landing Zone account topology that was incubated previously and turning it into a self-service vending machine (Terraform + GitOps) to help teams onboard more quickly.
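To make the vending machine idea more concrete, here is a minimal sketch of how a team's request, committed to a Git repository, might be expanded into one set of accounts per team and rendered as a JSON variables document for an IaC tool to consume. This is not EDF's actual implementation: the `TeamRequest` shape, the dev/test/prod environment set, the naming scheme and the `accounts` variable name are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical environment set; the article only says "one set of accounts per team".
ENVIRONMENTS = ("dev", "test", "prod")


@dataclass(frozen=True)
class TeamRequest:
    """A team's account request, as it might be committed to a Git repository."""
    team: str
    owner_email: str


def plan_accounts(request: TeamRequest) -> list[dict]:
    """Expand one request into one account definition per environment."""
    # Reject names that would produce awkward account identifiers.
    if not request.team.replace("-", "").isalnum():
        raise ValueError(f"invalid team name: {request.team!r}")
    return [
        {
            "name": f"{request.team}-{env}",
            "environment": env,
            "owner": request.owner_email,
        }
        for env in ENVIRONMENTS
    ]


def to_tfvars_json(requests: list[TeamRequest]) -> str:
    """Render all requests as a JSON variables document (hypothetical 'accounts' variable)."""
    accounts = [acct for req in requests for acct in plan_accounts(req)]
    return json.dumps({"accounts": accounts}, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_tfvars_json([TeamRequest("payments", "payments-team@example.com")]))
```

In a GitOps flow, a reviewed and merged change to the request file would be the only trigger needed; a pipeline would regenerate the variables and apply them, which is what keeps onboarding self-service.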

Key outcomes:

  • 4 product teams formed and products live in market: a mixture of internal staff-facing and external customer-facing services.
  • Data & Tech Graduate scheme started.
  • The AWS Account vending machine that enables accounts to be created / bootstrapped in minutes
  • Recruitment of 8 internal software engineers and up-skilling of 5 existing partner engineers.
  • Formation of a central Cloud Services team that owns the AWS Account vending machine.
  • DevOps as a way of working is normalised.

3. Scale Up

This stage happened around 2022, just over 2 years after we’d started the incubation project. We’d seen the number of product teams grow to 10, and the DevOps way of working had become the default approach.

Whilst we’d scaled up from zero to around 50 engineers / 10 teams, it was not quite enough to support the typical levels of change and delivery that we encounter. We were also aware of a number of larger programmes of change that DevOps would need to support.

This phase focused on:

  • Recruitment Scale Up — We learned a lot from recruiting. It’s a treadmill, and it’s hard and costly in terms of time and effort. We needed to find a more enduring / scalable approach.
  • Partner Evolution — We had realised from day 1 that partners play a key role in our talent sourcing, but we needed to shift from a traditional ‘outsource the work’ mindset to a ‘bring in talent when needed’ mindset, typically to support burst capacity and skills gaps, all the while ensuring that the focus stayed on the enduring product team size and capability.
  • SRE & Operations — By now, we had products that had been live and in the wild for around 1–2 years, and we had gained a lot of operational insight from using out-of-the-box AWS tooling.

Key outcomes:

  • Creation of the Data & Tech brand and an in-house headhunting / recruitment approach that uses paid campaigns rather than expensive agency placement fees. We found this not only more cost-effective but also more human-centric, as applicants were talking to EDF from the outset rather than via a proxy. Ultimately, people want to work for people. Over-indexing on this process made recruitment more effective and reduced friction. This approach enabled us to double our engineering workforce in 18 months.
  • Partners are now aligned to our Competency Framework as a common language for roles, skill levels and rates. We hold regular ‘resourcing’ (I don’t really like that term!) conversations to ensure we maintain the right balance of internal vs partner skills.
  • SRE Community of Practice formed, and the SRE culture / mindset adopted into our competencies and ways of working: essentially, ‘design for operation’. We also went to market to select a more centralised observability platform, Dynatrace, which enables better dependency visibility between teams and components, and faster triage / fault isolation.

Building Blocks

Photo by Mourizal Zativa on Unsplash

Reflecting on this journey, we observed the following key building blocks that we’d created along the way, which might help you on your journey:

  • Account Vending — If you are using a cloud platform like AWS, leverage the value of separate accounts for isolation and ownership. Utilise a vending machine (Landing Zone or Control Tower, or roll your own, as we did because these didn’t exist at the time). Adopting that vending machine / self-service / GitOps mindset for regular change is essential to reducing friction and delay.
  • Governance as Code — Typically, the larger the organisation, the larger the governance overhead. We found that baking standards, guardrails and governance into pipelines, templates, etc. was more effective than traditional eye-balling. There are lots of good tools out there to do this, like Checkov and Amazon GuardDuty. Plus, leveraging security vulnerability tools like Dependabot and Amazon Inspector gives that blended coverage of static vs dynamic code analysis.
  • Recruitment as a Service — We needed to balance centralising recruitment for consistency and brand purposes against federating it for practical purposes (i.e. a team should be recruiting its own people, not someone else!). Our learning from recruitment has essentially formed this ‘service’: it’s iterated through feedback, and owned and directed centrally by the Principal Software Engineer.
  • Communities — Leveraging communities of practice and interest groups to decompose and drive forward a range of engineering enablers (such as SRE, Security, Front-End frameworks and Integration Patterns) is a great engagement tool. It empowers engineers to contribute to the bigger picture and have greater ownership of the technology direction.
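As an illustration of the Governance as Code building block above, the sketch below shows the general shape of a guardrail check that could run as a pipeline step instead of a manual review. It does not use the API of Checkov or any other real tool; the resource dictionaries, the required tag names and the two rules are hypothetical and exist only to show the pattern of expressing each guardrail as a small function.

```python
from typing import Callable, Optional

Resource = dict
# A rule returns a human-readable finding, or None when the resource complies.
Rule = Callable[[Resource], Optional[str]]

# Hypothetical organisational standard: every resource carries these tags.
REQUIRED_TAGS = {"team", "owner"}


def require_tags(resource: Resource) -> Optional[str]:
    """Flag resources that are missing any of the mandatory tags."""
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        return f"missing tags: {', '.join(sorted(missing))}"
    return None


def require_encryption(resource: Resource) -> Optional[str]:
    """Flag storage buckets that do not declare encryption (hypothetical rule)."""
    if resource.get("type") == "bucket" and not resource.get("encrypted", False):
        return "storage must be encrypted"
    return None


RULES: tuple = (require_tags, require_encryption)


def check(resources: list) -> list:
    """Apply every rule to every resource and collect the findings in order."""
    findings = []
    for resource in resources:
        for rule in RULES:
            message = rule(resource)
            if message is not None:
                findings.append(f"{resource.get('name', '<unnamed>')}: {message}")
    return findings


if __name__ == "__main__":
    sample = [{"name": "tmp", "type": "bucket", "tags": {"team": "payments"}}]
    for finding in check(sample):
        print(finding)
```

A pipeline step built on this pattern would fail the build whenever `check` returns a non-empty list, so a guardrail is enforced on every change rather than at a periodic review.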

Lessons Learned

Photo by Tim Mossholder on Unsplash

1. Vision rather than Plan

Form a vision of what DevOps looks like and anchor that as your goal, rather than making static plans. The vision will evolve over time, but core tenets will remain consistent — these are your ‘why are we doing this’ answers.

The AWS ‘working backwards’ method is often a great way to approach this.

2. Start Small and validate/iterate

  • You will get stuff wrong, but equally you won’t know it’s wrong until you try!
  • Don’t try to move too fast. Steady, demonstrable value and growth build reputation and greater buy-in from detractors.
  • Iteration and continuous improvement are fundamental behaviours; set them into the DNA and ways of working from day 1.

3. Culture is key

Defining the culture you want to cultivate is essential to success. Ours is formed on DevOps norms like:

  • Contribution & Collaboration — Break down silos and increase engagement through active Communities of Practice, Game Days and a sense of belonging to a collective that is more powerful than the individual teams.
  • Recognise and award exemplar behaviours around
  • Ensure that culture is embedded in the recruitment, metrics, job definitions, competency frameworks etc — anything that you use to select and retain the right talent.
  • Leverage the value of evangelists and messengers to adopt, promote and extol the virtues of DevOps. Top-down mandates are less inclusive and instantly provoke a reaction. However, seeing demonstrable value and benefit promotes more ‘we should be doing that’ rather than ‘you must do this’.

4. Building and retaining talent is hard

  • Treat recruitment as regular exercise rather than a fad diet. Invest in making it efficient, transparent and fair so that it’s as low-effort as possible to keep switched on.
  • People want to work with people (mainly), so your culture and how you operate need to be visible and showcased. Reputation breeds reputation.

Further Reading

Photo by Gaelle Marcel on Unsplash

Steve Bowerman
EDF Data and Tech

Thinker, Software engineer, Architect, Senior technology leader, maker of stuff and creator of sound and vision