A story of code at Decathlon

Part 1 : From chaos to clarity

Merlin Julien
Decathlon Digital
6 min read · Jan 9, 2024

Co-authors : Laetitia Diverchy, Aurélien Kempiak, Romain Lajeunesse, Elliot Roig

Legacy Context

Migration from Imperva to Cloudflare in a few months

In 2022, Decathlon decided to migrate the WAF solution for all its websites from Imperva to Cloudflare. The main focus was to automate the process as much as possible. Unfortunately, the time frame to build the solution and migrate all websites was short (6 months), and we are talking about 1500 sites managed by different business units, so we had to be really efficient.

To give some context, the infrastructure consists of a Terraform project that manages AWS serverless resources.

  • DynamoDB stores site configuration and users for all Cloudflare accounts (think of them as a Terraform state)
  • EventBridge, Step Functions and SQS are used to execute code, by group of XXX websites/users.
  • Lambdas run Python code for creating/maintaining websites (called manage_site) and users.

When we started, manage_site was doing the basics: managing the Cloudflare zone, a few HTTP/S settings, DNS, certificates, firewall and managed rules. It was a simple 300-line script stored in a single file.

But month after month, we needed to add specific configurations, features and default rules. The Cloudflare Universal SSL API endpoint depends on Let’s Encrypt (including all its limitations), so we ran into a lot of issues, which led us to add a lot of error handling and verbosity.

The Run team quickly needed some additional basic scripts for tasks like manually deleting websites or modifying users’ membership of their support group. As we did not have time to build an architecture, a lot of code was duplicated across a few modules. To give an example, here is the same function (initiating the connection to Cloudflare) in 3 different scripts:

Legend : 50 shades of code.

At its peak, the single manage_site script reached 1300 lines…

Factory integration

In a few words, 3S (for Self Service Stack) is a Decathlon factory that creates and manages all the resources you need for an application, including the external technical stacks, without having to contact each OPS team responsible for each technical stack (storage, Kubernetes, monitoring, security…) directly and separately.

In the middle of the migration project, a new feature was added to the team backlog: our automation process had to be callable through API management so that Cloudflare websites could be provisioned from the Decathlon 3S solution.

This meant adding or integrating our solution with:

  • AWS API Gateway (to expose the Lambdas)
  • API Management (Gravitee) and all its security requirements
  • Documentation (a Swagger file which establishes the contract between us and our API consumers)
  • Input validation (users will send wrong data, so we need to check it)
  • Security constraints: exposing our code raises security concerns, leading to more code and more verification flows

Human background

Most of us are DevOps engineers, but with more of an Ops background and less Dev experience, especially on advanced subjects like:

  • exposing your code (API)
  • serious data validation
  • unit tests
  • testing and merging code produced by 5 people at the same time (pileup party)
  • CI/CD with GitHub Actions

Our skills, knowledge and organisation were limited and needed improvement.

Organisation

What do we need to improve?

Originally, the code behind our Cloudflare service API was split into two major files (we could call them services).

  • The first is our api_launcher.py file. It runs a first check on the client input, which lets us respond quickly if the client doesn’t fill in some mandatory keys/values.
  • The second is our manage_site.py file. This one applies all the requested changes, such as creating a zone or setting up the SSL configuration. It contains our real workflow for managing a Cloudflare website.

There are some other files used to manage our users’ rights, as well as some scripts, but they aren’t our API engine, so they weren’t our priority.

api_launcher is the entry point of our API: the API Gateway calls it when a request comes in. It does what it was built for and then sends the client payload to manage_site.

So api_launcher makes the decisive pass to manage_site, this is teamwork!
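The hand-off between the two files can be pictured with a minimal sketch. It assumes an API Gateway Lambda proxy integration (a JSON string in the event's "body", a response with "statusCode" and "body"); the mandatory keys, the status codes and the manage_site stub are hypothetical and only illustrate the idea, they are not our actual code.

```python
import json

# Hypothetical mandatory keys; the real contract lives in the Swagger file.
MANDATORY_KEYS = ("domain", "business_unit")


def manage_site(payload: dict) -> dict:
    """Stand-in for the real workflow (zone creation, SSL setup, ...)."""
    return {"status": "accepted", "domain": payload["domain"]}


def _response(status_code: int, body: dict) -> dict:
    """Build a response in the shape the Lambda proxy integration expects."""
    return {"statusCode": status_code, "body": json.dumps(body)}


def handler(event: dict, context: object = None) -> dict:
    """Entry point: quick check of the client input, then pass to manage_site."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body is not valid JSON"})
    if not isinstance(payload, dict):
        return _response(400, {"error": "body must be a JSON object"})

    # Answer fast when mandatory keys are absent or empty.
    missing = [key for key in MANDATORY_KEYS if not payload.get(key)]
    if missing:
        return _response(400, {"error": "missing mandatory keys", "keys": missing})

    return _response(202, manage_site(payload))
```

Keeping the quick check in the entry point means a malformed request never reaches the heavier workflow.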

For all of this, we drew inspiration from the “Clean Architecture” and “SOLID” design principles, and from Refactoring Guru (https://refactoring.guru).

Work batch division

The first goal we wanted to achieve was the modularization of our code. Every fix was a pain to manage: we were afraid of breaking something because our functions were too tightly coupled.
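To make the idea concrete, here is a minimal sketch of what a single shared connection helper could look like once the duplicated copies are merged into one module. It assumes token-based (Bearer) authentication against the public Cloudflare v4 API and only uses the standard library; the function names are hypothetical and this is not our actual module.

```python
import urllib.request

# Public base URL of the Cloudflare v4 API.
CLOUDFLARE_API_BASE = "https://api.cloudflare.com/client/v4"


def cloudflare_headers(api_token: str) -> dict:
    """Return the headers every script needs to talk to Cloudflare."""
    if not api_token:
        raise ValueError("a Cloudflare API token is required")
    return {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }


def cloudflare_request(api_token: str, path: str, method: str = "GET") -> urllib.request.Request:
    """Prepare (but do not send) an authenticated request for an API path."""
    url = f"{CLOUDFLARE_API_BASE}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers=cloudflare_headers(api_token), method=method)
```

Every script then imports the same helper, so a change in authentication is made in one place only.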

Once modularization was done, we tackled the validation part. We implemented reliable input validation with the BaseModel class from the Pydantic library. This is a big contrast with the original validation we had set up on the client side. With this new system we were able to handle errors more efficiently and give more detail to our consumers.
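As an illustration, a minimal sketch of this kind of validation, assuming Pydantic is installed. The model name, its fields and the error format are hypothetical; only BaseModel, Field and ValidationError come from the library.

```python
from typing import List, Optional, Tuple

from pydantic import BaseModel, Field, ValidationError


class SiteRequest(BaseModel):
    """Hypothetical payload for a website creation request."""

    domain: str = Field(min_length=1)
    business_unit: str = Field(min_length=1)
    aliases: List[str] = Field(default_factory=list)


def validate_site_request(payload: dict) -> Tuple[Optional[SiteRequest], List[dict]]:
    """Return the validated model, or a list of errors the consumer can read."""
    try:
        return SiteRequest(**payload), []
    except ValidationError as exc:
        errors = [
            {"field": ".".join(str(part) for part in err["loc"]), "message": err["msg"]}
            for err in exc.errors()
        ]
        return None, errors
```

For example, validate_site_request({"domain": "example.com"}) returns no model and one error pointing at the business_unit field, which is the kind of detail that can be passed back to the API consumer.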

One of our major achievements was implementing unit tests for all of our new code. This kind of development was new to all of us. We knew that it would take time, but we were sure that it would bring more quality and stability to our code for future features.
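For readers who are as new to this as we were, here is a minimal sketch of a unit test written with Python's built-in unittest module. The normalize_domain helper is hypothetical and exists only to give the tests something small to check.

```python
import unittest


def normalize_domain(raw: str) -> str:
    """Hypothetical helper: trim whitespace, lower-case, drop a trailing dot."""
    domain = raw.strip().lower().rstrip(".")
    if not domain:
        raise ValueError("domain must not be empty")
    return domain


class NormalizeDomainTest(unittest.TestCase):
    def test_lowercases_and_strips(self):
        self.assertEqual(normalize_domain("  Example.COM. "), "example.com")

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            normalize_domain("   ")
```

Saved in a file such as test_normalize_domain.py, these tests can be run with python -m unittest. Small, isolated functions like this one are what modularization makes possible to test.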

Last but not least, we wanted to implement JSON-formatted logging. The purpose was mainly to normalize our logging across every Python code execution, but behind this normalization we also wanted to make our logs more readable and easier to process with our observability tool.
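A minimal sketch of JSON-formatted logging built on Python's standard logging module. The class name, the helper name and the choice of fields are assumptions made for illustration, not the format we actually settled on.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_json_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to the standard error stream."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling get_json_logger("manage_site").info("zone %s created", "example.com") then emits one JSON object per line, which an observability tool can parse without custom patterns.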

In summary:

  • Deduplication to lower our code complexity
  • Modularization, which makes it possible (later) to implement unit tests
  • Data validation to simplify and homogenize our input handling
  • Test improvement to secure merged PRs during the refactoring process
  • A logging library for debugging

Timeline

Sharing responsibilities

To understand each technology behind our goals, we decided to split our research into 3 parts:

  • The Teacher
    Pydantic and its BaseModel class needed further reading before we could grasp their complexity. One of us took the lead on this topic and raised the team’s knowledge of it.
  • The Lumberjack
    Modularization forced us to rethink our code structure. One of us dived into all the code structures and designed our future one. He then presented it, and we voted for it or for an alternative.
  • The Librarian
    The logging library we wanted, with its specific format, required digging into the logging system too. We already had a small structure in place before starting, but we wanted to design it so that our entire Business Unit could use it if needed.

The unit test goal meant that all of us worked on it. One of our teammates helped us with his knowledge, so we were able to start working on it step by step…

This is the end of the first part of this journey. In the next part, we will present the technical levers we used.
Thanks for reading!
🙏🏼
👏🏻👏🏻👏🏻 Give a few claps and “follow” if you enjoyed this series.

💌 Follow our latest posts on Twitter and LinkedIn and discover our sport 🚀
