2021 retrospective and looking forward to 2022

Published in Inside Doctrine, Jan 17, 2022

A year ago I wrote an article highlighting both the process we followed to get Doctrine on top of its technical challenges and the list of key technical subjects we decided to focus on in 2021 and going forward.

In 2021, we made huge progress on that list of 19 technical challenges. We tackled 9 of them, even if there might be leftovers here and there, and we made great progress on a tenth one, the data processing subject. The remaining 9 were left more or less untouched as a result of prioritization. This is in line with our objectives, as we intended to deal with a bit more than half of the subjects.

I will spare you the detailed list of things we worked on or not, but I will still highlight a few major achievements:

we surveyed various OCRs and moved all our OCR-related data processing to leverage a single internal Python library we built to extract text. We will now be able to plug that library into our single OCR of choice;
we now have a reliable A/B testing framework as presented by William Duplenne in his own article;
we reconciled our customer-centric and SEO-centric applications leveraging Next.js as explained by Samuel Martineau in this article;
we established a set of testing practices, both in general and specifically for our JavaScript and Python stacks;
we revisited our search engine workflow to make it more flexible and easier to maintain;
our Elasticsearch indexing stack is now consistent and entirely available in Node.js;
we adopted Terraform for infrastructure as code, so we can now very easily onboard new developers and add new services. The A/B testing service was actually one of the first to leverage that effort;
we defined a new backend architecture, inspired by Domain-Driven Design principles and clean architecture and built on NestJS, and we started moving our existing code to it while using it for new features;
in the data processing pipeline, we are now able to fully scale from both a hardware and a software perspective, and we can monitor our scripts with Datadog, even though the road is still long before our practices there are fully standardized.

I really hope the team will come back to you on this very blog to share more details about those projects just as William and Samuel did.

But enough about 2021, let’s talk about 2022 now.

Before starting the new year, we followed a process pretty similar to last year's, gathering a coordination group representing all our squads and chapters to revisit the priorities for the coming year.

Aïmen during a brainstorming session (photo by Bertrand Chardon)

Out of that process, most of the subjects raised in 2021 that we did not work on were again at the top of the list. However, we also got a new subject around the Machine Learning lifecycle that was not as clearly stated in last year's list, even though some of the items around the data feedback loop were clearly related. The urgency of this item came mostly from our 2021 experience, when we had to retrain a lot of models in the context of a Python migration, and from the pain that it caused.

For 2022, we decided to focus on 3 different but related areas, and to work on them in the form of task forces, mixing contributors from different squads. They all fall under our 2022 tagline, which is “Standardize, Guide, Unlock & Empower through Frameworks”.

These three task forces are as follows. For each of them, we defined a set of high-level user stories we want the task force to work on.

Data Task Force

This task force covers both data processing and data storage. This is a continuation of last year's effort and a reinforcement of it, in particular around data storage.

Story #1

As a data/ML engineer, when I need to add a new data processing step to the pipeline, I can just follow very clear guidelines, use shared utilities, and rely on standard, ideally open source, processing technologies to produce my code such that it is monitored, scalable, and can run continuously. I can easily test that code. When my tasks fail, I can easily debug them and find intermediary data.
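To make this story more concrete, here is a minimal Python sketch of what one such shared utility could look like. It is purely illustrative and not Doctrine's actual tooling: the `pipeline_step` decorator, the `normalize_titles` step, and the JSON-on-disk format are all hypothetical names and choices made for this example.

```python
import json
import logging
import tempfile
import time
from functools import wraps
from pathlib import Path

logger = logging.getLogger("pipeline")


def pipeline_step(name, output_dir):
    """Hypothetical helper: wrap a processing function so that every run is
    logged, timed, and leaves its output on disk for later debugging."""

    def decorator(func):
        @wraps(func)
        def wrapper(records):
            started = time.perf_counter()
            try:
                result = func(records)
            except Exception:
                # Log the failure with its traceback, then let it propagate.
                logger.exception("step %s failed on %d records", name, len(records))
                raise
            # Persist intermediary data so a later step, or a human, can inspect it.
            out_path = Path(output_dir) / f"{name}.json"
            out_path.write_text(json.dumps(result))
            elapsed = time.perf_counter() - started
            logger.info("step %s: %d records in %.3fs", name, len(result), elapsed)
            return result

        return wrapper

    return decorator


# Assumption for the example: a temporary directory stands in for real storage.
work_dir = tempfile.mkdtemp()


@pipeline_step("normalize_titles", output_dir=work_dir)
def normalize_titles(records):
    return [{**r, "title": r["title"].strip().lower()} for r in records]


cleaned = normalize_titles([{"title": "  Cour de Cassation "}])
```

Because the step itself stays a plain function taking and returning records, it remains easy to unit test, while the wrapper takes care of logging and intermediary data.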

Story #2

As VP of Engineering, I have a high-level picture of what our pipeline is doing that is still detailed enough to be useful. I know what the pain points in the pipeline are and where they are located, so that I can make informed decisions on prioritizing our effort.

Story #3

As a data/ML engineer, when I need to store data during data processing, I know where and how to store it depending on my use case. I can easily explore the data from the previous processing steps and add processing on top of it. The envisioned practices are scalable: they allow parallel processing of the data and virtually unlimited data storage in a cost-effective way.

Story #4

As a back-end/full-stack engineer, I know where and how the data engineers have stored data. I can easily explore that data and build application use cases on top of it. I know where and how to store data for application use cases. The envisioned practices produce the best performance for our end users.

Machine Learning Lifecycle Task Force

Story #1

As an ML engineer, I can easily follow the evolution of my metrics and parameters while training my model as well as in production to detect shifts in predictions.

Story #2

As an ML engineer, when I need to train a model again, I can easily start again from the parameters and dataset I used, while still having access to the metrics of my previous training sessions in a standard manner.
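As an illustration of the two stories above, the sketch below records each training run (parameters, dataset version, metrics) in a standard format and reloads it when retraining. It is a simplified, file-based example under stated assumptions: `TrainingRun`, `RunRegistry`, the run identifiers, and the dataset name are all hypothetical, and a dedicated experiment-tracking tool would likely play this role in practice.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class TrainingRun:
    """Everything needed to reproduce and compare a training session."""
    run_id: str
    dataset_version: str
    params: dict
    metrics: dict = field(default_factory=dict)


class RunRegistry:
    """Hypothetical file-based registry: one JSON file per training run."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, run):
        (self.root / f"{run.run_id}.json").write_text(json.dumps(asdict(run)))

    def load(self, run_id):
        data = json.loads((self.root / f"{run_id}.json").read_text())
        return TrainingRun(**data)


# Assumption for the example: a temporary directory stands in for real storage.
registry = RunRegistry(tempfile.mkdtemp())

first = TrainingRun(
    run_id="run-001",
    dataset_version="decisions-v1",
    params={"learning_rate": 0.01, "epochs": 5},
    metrics={"f1": 0.82},  # made-up value, for illustration only
)
registry.save(first)

# Retraining later: start again from the recorded parameters and dataset,
# while the metrics of the previous session remain available for comparison.
previous = registry.load("run-001")
retrain = TrainingRun(
    run_id="run-002",
    dataset_version=previous.dataset_version,
    params=dict(previous.params),
)
```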

Events, Logs, and Metrics Task Force

Story #1

As a product squad engineer, I know how, where, what, and when to send the different types of user and technical information.

Story #2

As a DevOps engineer, I have a limited and well-defined set of tools to maintain for gathering that information.

Story #3

As a software engineer, I can easily unlock new product opportunities by relying on well-identified and formatted events.
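One way to picture "well-identified and formatted events" is a single event envelope that every squad uses and that rejects malformed events early. The Python sketch below is only an assumption about what such a contract might look like: the `Event` class, its fields, the naming rule, and the `search.performed` event are hypothetical and not taken from Doctrine's codebase.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Assumption: the two kinds of information mentioned in Story #1.
ALLOWED_KINDS = {"user", "technical"}


def _utc_now():
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class Event:
    """Hypothetical shared envelope for everything a service emits."""
    name: str
    kind: str
    payload: dict
    emitted_at: str = field(default_factory=_utc_now)

    def __post_init__(self):
        # Validate at construction time so bad events never leave the service.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown event kind: {self.kind!r}")
        if not self.name or self.name != self.name.lower() or " " in self.name:
            raise ValueError(f"event name must be lowercase without spaces: {self.name!r}")

    def to_json(self):
        # Sorted keys keep the serialized form stable across services.
        return json.dumps(asdict(self), sort_keys=True)


event = Event(name="search.performed", kind="user", payload={"query_length": 12})
line = event.to_json()
```

With a contract like this, a consumer can rely on every event having a name, a kind, a payload, and a timestamp, whichever squad emitted it.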

What’s next?

Focusing on those 3 task forces does not mean we will not make progress on the other subjects, but they will be dealt with as part of our 20% time reserved for sustained engineering work or as part of the various squad objectives.

To work on those subjects as well as continue developing features for our legal intelligence platform, we are opening a dozen engineering positions in 2022. So don’t be shy: if you want to be part of building the first legal intelligence platform by bringing your skills to the table, scroll down on our recruitment page and submit your application!

Written by Christophe Jolif