How to refactor critical code with heavy usage in production
In this article, you’ll learn how we refactor critical code used daily by millions of users.
Every month, 10 million people plan their day-to-day health appointments using Doctolib. To let our users find the right practitioner and book an appointment with them, we have implemented a structure which we call “the booking funnel”. Up until now, we have continuously improved this funnel to handle multiple features (document sharing, patient referral or extra information).
In the end, this booking funnel was becoming more and more complex in order to let all these scenarios unfold correctly for our millions of users every day.
Until last year, most of this funnel was coded as a client-side view in React, except for one step. For this specific view, we had an enormous piece of code handling every possible use case and redirecting the user to an HTML template written in Slim.
As time passed and the company grew, this view grew as well… in complexity. It became difficult to read, and new joiners struggled to grasp this part of the codebase. We also had more and more colleagues experienced in React who were wondering why this view stayed this way.
In an effort to tackle this legacy code, separate the different use cases into different components and improve the overall experience of our users, we decided to migrate this code to React.
And that’s when our problems started
We started refactoring the code little by little and we were happy with where it was going. But the more we worked on it, the more complex our first pull request was becoming. Several problems appeared:
- other teams were working on this view so the code kept changing
- we had to handle a lot of complex cases and it was difficult to tackle all of them at once
- since we have a huge test suite at Doctolib (around 10 000 tests at the time), changing the UX of this funnel even slightly meant adapting a lot of these tests
- we didn’t want to impact our current users, so it had to be backward compatible and work even without users refreshing their page
- in the end, the pull request was so big that our coworkers were getting very worried
It was time to find a strategy to ensure the migration would happen in a smoother way.
It is always a good idea to get a visual image of what your code will look like. It’s also a good way for your coworkers to understand what’s going on, especially when your pull request gets too big too fast.
And that’s what we started with. It was quick, easy and allowed us to merge our pull request quicker.
Two different funnels
As our pull request kept growing, it became obvious we could not tackle everything at once. But at the same time, the features implemented in this view were so intertwined that we could not rewrite them one by one.
We didn’t want to spend weeks, even months, re-coding the whole view without ever being able to test it on real users.
What we wanted was a first version of a common use case of the booking funnel, so that we could see if it worked in production before iterating on it.
We decided to start with the most common one: allowing someone to simply book an appointment for themselves or for a relative, and to redirect all the other cases to the old funnel.
All we had to do was introduce a completely separate view.
For example, we hadn’t implemented custom fields in this first version, so everyone booking an appointment with a practitioner who needed additional information from the patient would still see the old view. Meanwhile, users booking with a practitioner who did not need this additional information would see the new view.
This means that our old code would still be present for several months and would still work with all the features already developed at this time.
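The split between the two views can be sketched as a single routing decision. This is a minimal illustration, not our actual code: the `BookingContext` shape, the `Feature` names and the `chooseFunnelView` function are all hypothetical.

```typescript
// Hypothetical feature names, used only for illustration.
type Feature = "customFields" | "documentSharing" | "patientReferral";

// Hypothetical description of what a given booking needs.
interface BookingContext {
  bookingFor: "self" | "relative";
  requiredFeatures: Feature[];
}

type FunnelView = "new" | "legacy";

// Features already migrated to the new view. It starts empty and grows
// with each iteration, so more traffic reaches the new view over time.
const supportedByNewView: ReadonlySet<Feature> = new Set<Feature>();

// Send the booking to the new view only if every feature it needs has
// been migrated; everything else keeps using the old view.
function chooseFunnelView(ctx: BookingContext): FunnelView {
  const allSupported = ctx.requiredFeatures.every((feature) =>
    supportedByNewView.has(feature)
  );
  return allSupported ? "new" : "legacy";
}
```

With a check like this, migrating a feature means adding it to the supported set rather than touching the routing logic.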
Feature switch your new version
At Doctolib, when we introduce new features, we often hide them in production behind a feature switch until we are sure they work properly.
We also have the possibility to show the new feature to a certain percentage of users.
- At first, we weren’t very confident so we selected several medical practitioners with only a few appointments taken per day and activated the new funnel for the users booking an appointment from their profile page.
- After being reassured by this first test, we activated the feature for 1% of all users, which was already a pretty good chunk of people (as a reminder, around 10 million people go through our funnel to book an appointment each month).
- Then, after fixing a few errors that popped up, we progressively activated it for everyone over a couple of days.
This system is also a good way to quickly hide a feature in case of a suspicious error. We would use it to deactivate the new version from time to time to work on some issues without worrying about blocking our users.
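The three rollout stages above can be sketched with one switch that combines a kill switch, an allow-list of practitioners and a percentage of users. This is only a sketch under assumptions: `FunnelSwitch`, `bucketOf` and `isNewFunnelEnabled` are hypothetical names, and the FNV-1a-style hash is just one simple way to assign users to stable buckets.

```typescript
// Hypothetical switch configuration, for illustration only.
interface FunnelSwitch {
  enabled: boolean; // global kill switch
  allowedPractitionerIds: ReadonlySet<string>; // stage 1: hand-picked practitioners
  rolloutPercentage: number; // stages 2 and 3: 0 to 100
}

// Map a user id to a stable bucket in [0, 99] with an FNV-1a-style hash,
// so a given user always sees the same version between page loads.
function bucketOf(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

function isNewFunnelEnabled(
  sw: FunnelSwitch,
  userId: string,
  practitionerId: string
): boolean {
  if (!sw.enabled) return false; // switched off after a suspicious error
  if (sw.allowedPractitionerIds.has(practitionerId)) return true;
  return bucketOf(userId) < sw.rolloutPercentage;
}
```

Setting `enabled` to `false` sends everyone back to the old view without a deployment, which is what makes it safe to investigate an issue calmly.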
Keep an eye on potential errors
Since we have two completely different views, we can easily track our users with NewRelic to know the number of people using the new or the old version of the booking funnel.
This kind of monitoring also helped us easily spot bugs on this view and gauge their severity. If too many users were affected by a bug, all we had to do was switch off the feature or revert the last iteration.
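The decision to switch off the feature can be sketched as an error-rate check per version. The example below keeps counters in memory purely for illustration; in practice these numbers would come from the monitoring tool. The `FunnelMonitor` class and its thresholds are hypothetical.

```typescript
type FunnelVersion = "new" | "legacy";

interface VersionStats {
  views: number;
  errors: number;
}

// In-memory stand-in for the figures a monitoring tool would provide.
class FunnelMonitor {
  private readonly stats: Record<FunnelVersion, VersionStats> = {
    new: { views: 0, errors: 0 },
    legacy: { views: 0, errors: 0 },
  };

  recordView(version: FunnelVersion): void {
    this.stats[version].views += 1;
  }

  recordError(version: FunnelVersion): void {
    this.stats[version].errors += 1;
  }

  // Share of views that ended in an error, or 0 when there is no traffic.
  errorRate(version: FunnelVersion): number {
    const { views, errors } = this.stats[version];
    return views === 0 ? 0 : errors / views;
  }

  // Suggest switching the new view off once enough traffic has been seen
  // and the error rate exceeds the tolerated maximum.
  shouldDisableNewFunnel(maxErrorRate: number, minViews: number): boolean {
    return (
      this.stats.new.views >= minViews &&
      this.errorRate("new") > maxErrorRate
    );
  }
}
```

The `minViews` guard avoids reacting to a single error when only a handful of users have seen the new view.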
Have an easy way to know when to remove the old code
As soon as the percentage of people using our old view dropped to 0% (or close to it, allowing for a margin of error), we could be sure the old code was no longer used and remove it right away.
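This removal criterion boils down to a small calculation on the traffic split between the two views. The sketch below assumes hit counts per view are available from monitoring; `TrafficCounts`, `legacyShare`, `canRemoveLegacyView` and the default margin are hypothetical.

```typescript
// Hypothetical hit counts per view over some observation window.
interface TrafficCounts {
  newViewHits: number;
  legacyViewHits: number;
}

// Fraction of all funnel traffic still served by the old view.
function legacyShare(counts: TrafficCounts): number {
  const total = counts.newViewHits + counts.legacyViewHits;
  return total === 0 ? 0 : counts.legacyViewHits / total;
}

// The old code is considered removable once its share of traffic falls
// within the margin of error. With no traffic at all, nothing is decided.
function canRemoveLegacyView(
  counts: TrafficCounts,
  marginOfError: number = 0.001 // illustrative default: 0.1% of traffic
): boolean {
  const total = counts.newViewHits + counts.legacyViewHits;
  return total > 0 && legacyShare(counts) <= marginOfError;
}
```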
This booking funnel refactoring has not been an easy journey and there were several surprises along the way. Maybe some of the steps we took could help you too:
- Have a concrete plan to roll out your refactoring
- Monitor the usage and the potential bugs produced by your new code
- And most important of all, communicate as much as possible on your progress with the people and teams working on the same scope