10 Software Engineering Lessons from Transforming our Appointment Search

Jonathan Boccara
Doctolib
23 min read · Jun 20, 2024

If you look back over the past few months, was there ever a time where you needed a medical appointment quickly?

Not quickly as in “I need to go to the ER now” quickly, more like the “I should have this checked out soon” type of problem.

This kind of situation happens often. Early availability is one of the most significant factors when choosing a medical appointment.

Now picture this: the next time you need a doctor appointment, you take out your smartphone and effortlessly search through the right doctors based on who’s available the soonest. Imagine the impact — drastically reduced wait times and quicker, better care.

Over the past months I was lucky to lead an ambitious journey to bring this vision to life at Doctolib. We set out to build a system that sifts through countless doctors to match patients with the quickest path to care, all while ensuring fast response times (like sorting hundreds of doctors with complex setups by earliest availability in less than 200ms).

From a software engineering perspective, it was a fascinating project.

In this post I’ve picked 10 learnings and takeaways from various phases of the project: study of feasibility, POC, algorithmic optimizations, design, implementation, caching strategies, testing, deployment and monitoring.

This article will show you:

  1. How to get a general understanding of a piece of legacy code
  2. How to get a thorough understanding of a piece of legacy code
  3. A simple trick to define a scope and actionable steps for a POC
  4. How to constrain an algorithm to precompute it efficiently
  5. Multiple caching strategies to optimize for performance and accuracy
  6. How to (and not to) debug caching issues
  7. How high cohesion speeds up delivery
  8. How low coupling speeds up delivery
  9. How to avoid circular dependencies
  10. Multiple release strategies to go to production successfully

This selection of examples shows how software best practices can be applied to concrete situations.

We’ll go through specific examples of things that worked well or less well for us. Things that saved us time and things that slowed us down. We’ll draw generic techniques and principles from these lessons to apply to other software projects.

Study of feasibility

It all started from a vision: a patient should be able to search for a doctor on Doctolib, and see the results sorted by earliest availability.

From a technical perspective, we weren’t sure how to achieve it, or even whether it was possible at all.

Indeed, before starting this project, our system could compute available slots on the fly when we needed to display them for a patient. The time to compute the available slots of one doctor ranges from several dozen to several hundred milliseconds.

If you’re wondering why it takes that long to compute a list of slots, imagine that there are a number of things to fetch from the database (opening hours, exceptional absences, many existing appointments, many settings, etc.). Sometimes doctors need a piece of equipment that has its own agenda, and sometimes they also need an assistant (with their own agenda of course) and everyone needs to be available at the same time. Add some constraints in a doctor’s day (some types of appointments can only be performed in the morning, or no more than a certain number of times in the day), then take into account the insurance of the patient for which the doctor dedicates specific opening times, and you’ve scratched the surface of the availability computation for medical appointments!

Now how many doctors should we compute available slots for, when a patient makes a search? It depends on the city and speciality, and it ranges from a handful to thousands of healthcare practitioners in the same location and speciality.

One option is the brute force solution: compute the available slots of all the relevant doctors on the fly, in response to the patient search. But this option is quickly ruled out.

Indeed, let’s do the maths: for 1000 relevant doctors, and assuming it takes 100ms on average to compute the available slots of one doctor, it would take 100 seconds to compute and sort them all sequentially. And for 2000 doctors, in a more pessimistic scenario of say 200ms on average, that would make 400 seconds.

Not the ideal user experience.

An incremental improvement wouldn’t do it. Even a 50% improvement wouldn’t do it. What we needed was a couple of hundred milliseconds tops for hundreds of doctors.

We needed more than a 99% improvement. We needed a paradigm shift.

We needed to have all the slots computed in advance in order to serve them very quickly to the patient when they search for an appointment.

But was it possible? And how?

The Proof Of Concept

Before setting out on such an ambitious quest, we first ran a Proof Of Concept (POC) to determine whether we could ever reach the ballpark we were aiming for.

Doctolib’s management appointed Rony (staff engineer), Martin (senior Product Manager), and myself (senior staff engineer) to explore the topic and come back with an architecture proposal along with response time estimates. Our mission was to determine whether doctors could be sorted by availability in a timely manner, and if so, what design would drive us there.

The first step was to understand the existing system we would build on.

Understanding the existing system

Lesson #1: To get a good understanding of code, start from product questions

The code to compute available slots, answering to the name of “availability service”, was the most complex piece of code of our system. It contained years of features, bug fixes and performance tuning, and could have used more documentation.

Let’s take a look at the technique we used to understand this code with no previous knowledge, as it is not specific to medical appointments and can be applied to understand challenging code in general.

The technique consists in applying the following steps:

  1. Identify something simple in the product
  2. Step through the code
  3. Find the line of code that corresponds to what you saw in the product
  4. Repeat the above steps with more and more elaborate aspects in the product

Let’s take an example.

Doctolib allows healthcare practitioners to set up opening hours in their calendars to indicate when they’re available for patients to book appointments with them. They show as big white rectangles in the calendar.

These opening hours are the first basic block to determine available slots. The first question we can ask ourselves is “How does the code find the openings that were configured for the calendar?”.

To find the answer, we fired up the debugger, launched the availability computation from the application, and stepped through the code. We followed names that looked like “openings”, “fetch”, “load” or “retrieve”, and ignored anything that didn’t look related.

It’s ok to miss the location the first time. If the place doesn’t show up on the first pass, we can start over as many times as necessary. Each pass over the code reveals more things.

Once we found the exact location, the interesting piece of information was the call stack. Call stacks are instructive: they show the path from the entry point all the way down to the database access, and they are worth writing down for future reference.

Then we continued with more questions:

  • Where is the list of available slots created?
  • Where is it filled?

Each time, we identified the line of code, or the function, that answers each of these questions.

Then on to more elaborate use cases:

  • How do we make sure not to offer slots where the doctor has created an absence?
  • Or when another patient has already booked an appointment?

At this point we were much more acquainted with the code than at the beginning. We could move on to more complex questions:

  • It’s 9.21am now. I can see for all days that slots start at 00, 15, 30 or 45 of the hour. Except for the slots this morning, which start at 25, 40, 55 and 10. How come?
  • When the opening accepts a visit reason of 12 minutes, I see slots every 12 minutes. But when I add another reason of 15 minutes, I can see slots every 15 minutes even when searching for the reason of 12 minutes. Why?

The questions become more and more elaborate, and so does the understanding of the product and the code. The technique is to repeatedly build a use case in the application, go on an exploration with the debugger, and note down the question and the call stack of the location identified.

The code begins as a vast fog of war, but every exploration that yields a call stack is a slash cutting a path through the darkness.

Building the use cases also makes one more familiar and comfortable with the product, which will also make things faster down the road.

Going the last 20%

Lesson #2: To get a thorough understanding of code, use a spreadsheet and gamification

The above process reveals the code of the features and use cases we could think of. But it doesn’t cover the most advanced cases that we don’t even know exist.

To understand everything in the code, we had to be more systematic. We made a spreadsheet with all the functions that constitute the code to explore.

The technique consists in reading the code of each function listed in the spreadsheet and classifying the function into a category. For example, categories in the domain of medical appointment availability can look like: insurance, overbooking, shared equipment, filling rate, booking delays, etc.

The most basic code may not fall into any single category: it’s the main use case, which computes slots when no advanced features are used. For this you can have a global category called “main”. Most of the code discovered with the debugger as described in the previous sections likely falls into the “main” category.

This is a long process, and it typically follows the 80–20 rule: it takes 20% of the time to discover the first 80% of the code, and 80% of the time to figure out the remaining 20%.

Gamification

To make the process of turning every stone more rewarding, consider adding a global counter in the spreadsheet to display the percentage of functions you’ve understood so far. Adding the number of lines of each function also lets you compute the number of lines understood overall. This gives the exploration a sense of accomplishment.

Here is what the spreadsheet could look like (with illustrative functions and numbers):
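| Function | Lines | Category | Understood? |
| --- | --- | --- | --- |
| find_opening_slots | 120 | main | ✅ |
| apply_insurance_rules | 85 | insurance | ✅ |
| check_shared_equipment | 64 | shared equipment | ❌ |
| apply_booking_delays | 40 | booking delays | ❌ |
| **Total** | **205 / 309 lines (66%)** | | **2 / 4 functions** |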

Learning more about understanding legacy code

I’ve written more on understanding code in my book The Legacy Code Programmer’s Toolbox.

Defining a scope for the POC

The purpose of a POC is to prove that something is possible. This implies crafting at least a rough design of a solution.

This is a tricky task: where to start? And on the other hand: when to stop and call the concept proven or rejected?

A POC incurs the risk of implementing too much code that will be thrown away, thus wasting time, and conversely the risk of not implementing enough and starting the real project on shaky assumptions.

The purpose of our POC was to show we could precompute available slots ahead of time, keep them up to date in real time, and provide them to patients quickly when they browse the website.

That makes a list of topics to explore:

  • Precomputing slots
  • Keeping precomputed slots up to date
  • Serving precomputed slots to patients

However, this high level list is not easy to work with. Where to start?

Formulating with questions

Lesson #3: To set up a roadmap for your POC, frame its goals as questions

Our CTO, Alex Kaluzny, gave us a seemingly simple piece of advice that propelled the POC from an abstract list into actionable implementation: don’t list items, list questions.

The goal of the POC becomes answering the questions: if each question gets a satisfying answer, it means the project can be accomplished.

When converting topics into questions, each item turned out not to have a direct translation into one question. Rather, each one expanded into its own list of questions.

For example, here is how the first item, “precomputing slots”, translated into concrete questions:

  • How much time does it take to compute and store availabilities for the next month, 3 months, 6 months, 1 year for one doctor’s calendar?
  • How would the resources consumed by precomputing data compare to the resources currently consumed by the search?
  • How expensive is it to precompute the data on all the relevant calendars?
  • How many calendars do we need to precompute the availabilities on?
  • Would the availabilities computed from precomputed data be identical to the availabilities currently computed on the fly?

Some of these questions are not trivial to answer. Some need data analysis. Some need prototyping. Some need analysis with the Product Manager.

But they constitute a concrete list of steps to go through the POC. It took us several weeks to answer them, and in the end we were confident that we should launch the actual project.

The next takeaways we’ll look at are in the design and implementation of the solution. In order to make the specific examples fit in the global picture, here is a brief overview of the architecture behind the feature.

Providing earliest availabilities

To be able to serve results promptly, we partially precompute the available slots of practitioners in advance and store them.

Here are the three main components of the system:

1️⃣ API to get the first available slots

To be able to sort by earliest availability, all that is needed is the first available slot for a doctor on a particular visit reason.

The purpose of the system is to provide an internal API taking a list of doctors and visit reasons, and returning the first available slot for each of them. The (internal) client of this API would use this information to sort and display doctors in the right order.
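To make this concrete, here is an illustrative request/response shape for such an API (the field names are assumptions, not the actual contract):

```ruby
# Illustrative payloads only; the field names are assumptions.
request = {
  items: [
    { calendar_id: 17, visit_reason_id: 42 },
    { calendar_id: 23, visit_reason_id: 42 },
  ],
}

response = {
  results: [
    { calendar_id: 17, visit_reason_id: 42, first_slot: "2024-06-21T09:15:00Z" },
    { calendar_id: 23, visit_reason_id: 42, first_slot: "2024-06-24T14:30:00Z" },
  ],
}
```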

2️⃣ Storing availabilities

To provide the first available slots, the API relies on precomputed availabilities, stored in a database. We’ll get back to what these precomputed availabilities look like in the section on algorithmic optimization, called “Intermediate representations of availabilities”.

These intermediate representations are generated on two occasions. The first one is an automatic job running every night, which trashes and regenerates all the intermediate representations (and cleans up past ones, as patients only look for appointments in the future).

The second one is in…

3️⃣ Reaction to events

All day, millions of patients and healthcare professionals book or modify medical appointments on Doctolib, or update the configuration of their calendars: changes of opening hours, creation of new visit reasons, updates of various settings.

All this has an impact on availabilities. To make sure the intermediate representations stay in sync with the reality of the calendars, event listeners pick up all the events triggered by relevant changes, and refresh the intermediate representations of the calendars on the days that could have been affected by any given change.
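A listener can boil down to something like this minimal sketch (the event shape and job name are assumptions, not our actual code):

```ruby
# A minimal sketch of reacting to a calendar change. The event shape and
# the job name are assumptions, not Doctolib's actual code.
class CalendarChangeListener
  def on_calendar_changed(event)
    # Refresh only the days the change could have affected, for this calendar.
    RefreshIntermediateRepresentationsJob.perform_later(
      event.calendar_id,
      event.affected_dates
    )
  end
end
```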

Algorithmic optimization

Now let’s dive into the details of the precomputed slots. What should they look like?

The first idea that comes to mind is to store the slots that will be presented to patients.

But this idea has several drawbacks. The first one is that any change in the slots computing logic invalidates the entire datastore — which is something that we want to happen as rarely as possible!

The second drawback comes from the access pattern of the use case we’re designing for: only a few visit reasons are needed when searching doctors of a certain speciality.

Choosing the right visit reasons belongs to business logic specific to the patient features, which we don’t want the slot precomputation to be coupled with. In theory we could repeat the computation for all the slots of all the visit reasons, but in practice doctors have dozens (sometimes hundreds) of visit reasons, which would make this option inefficient.

Finally, slots in the past or too close to their start date expire. When a slot expires, if we don’t have any other data, we need to start the whole calculation from scratch to get fresh bookable slots again, which takes time.

Intermediate representation of slots

Lesson #4: To precompute an algorithm, break it down into a writing and reading part

One way to solve these issues is to run the slots computation not all the way to the final result, but to stop at some point in the middle and store the intermediate results.

Then when the patient needs the final results, we fetch the precomputed intermediate results, pick up the computation where we left it off, and provide the final result to the patient.

This design solves the above problems if it respects 3 constraints.

The first one is that the code generating intermediate representations (Phase 1) needs to be as stable as possible, so that the datastore is invalidated as rarely as possible. The code picking up the rest of the calculation in memory (Phase 2), on the other hand, is free to change without invalidating any stored data.

To identify stable and unstable parts of the code:

  • the commit history shows which parts of the code change frequently
  • hard rules in the product tend to indicate stable behaviours (e.g. “we never offer slots where a doctor created an absence”).

The second constraint is that Phase 1 must not depend on the inputs passed by a given API call, for example the current time or the visit reason passed by the API. The result of the calculation in Phase 1 must be independent of the inputs of the exposed API. Put another way, all the dependencies to API inputs must be grouped in Phase 2.

The last one is that most of the execution time of the slots calculation code should be spent in Phase 1. This phase is performed asynchronously, in order to populate the intermediate representations. Phase 2, on the other hand, must be as fast as possible as the response time of the API depends on it.

Reshaping the code into these two phases while respecting these constraints required significant refactoring.
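To make the split concrete, here is a simplified, self-contained sketch of the two phases (the data model is heavily simplified compared to the real availability service):

```ruby
FreeRange = Struct.new(:starts_at, :ends_at)

# Phase 1: input-independent; run asynchronously and stored. It reduces a
# day's agenda (openings minus busy ranges) to free time ranges.
def phase1_free_ranges(openings, busy_ranges)
  openings.flat_map do |opening|
    cursor = opening.starts_at
    free = []
    busy_ranges
      .select { |b| b.starts_at < opening.ends_at && b.ends_at > opening.starts_at }
      .sort_by(&:starts_at)
      .each do |busy|
        free << FreeRange.new(cursor, busy.starts_at) if busy.starts_at > cursor
        cursor = [cursor, busy.ends_at].max
      end
    free << FreeRange.new(cursor, opening.ends_at) if cursor < opening.ends_at
    free
  end
end

# Phase 2: fast, and the only part depending on the API inputs
# (the visit reason's duration and the current time).
def phase2_first_slot(free_ranges, duration_seconds, now)
  free_ranges.each do |range|
    start = [range.starts_at, now].max
    return start if start + duration_seconds <= range.ends_at
  end
  nil
end
```

A change in how the visit reason or the current time is applied then only touches Phase 2, leaving the stored representations valid.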

Caching strategies

Lesson #5: Consider multiple caching strategies to optimize performance and accuracy

To improve the response time of the API, which directly affects the time a patient waits for the page to render, we added a layer of caching.

Caching brings a mind-blowing performance improvement. Without caching we would fetch data from a database and run computations in memory. Done right, caching lets us instead grab data that is already sitting in memory. The result is several orders of magnitude faster.

But this comes at a price. As the saying goes, there are two difficult things in programming: naming, cache invalidation and off-by-one errors (and C++ folks also add undefined behaviour to the two-items list).

As the saying emphasises, we need to figure out how to invalidate the cache, which means determining that a value sitting in the cache no longer reflects reality.

There are multiple strategies for invalidating cache, each one offering different tradeoffs. Here is a non-exhaustive list:

  • Fixed Time-To-Live (TTL): The value is automatically evicted from the cache after a certain time (e.g. 1 minute, 24 hours, etc.). It’s not accurate but very simple to implement.
  • Variable TTL: The value contains an expiration date, for example the latest moment for a given slot to be open for booking. It’s more accurate, but not always possible to implement. Using TTL only (fixed or variable) has the drawback that a bug fix in the value computation won’t take effect until the TTL reaches its expiration.
  • Event based (write-through): A listener picks up some events and replaces values in cache in reaction to the events. It’s much more accurate but can require a lot of work to identify 100% of relevant events.
  • Periodic refresh: At fixed intervals (e.g. every night or every weekend) a big job empties and recomputes all the cache. It’s a good complement to the event based strategy, to compensate for buggy or missed events.
  • Read-through: every cache miss triggers a recompute from the primary data store, returns the value, and puts it in cache for subsequent requests. Contrary to write-through, it doesn’t require identifying events, but at the cost of response time.
  • Piggy-backing on existing traffic: if there is existing traffic computing slots, we can use its results to refresh the cache “for free”. It’s easy to implement, but can be inaccurate if traffic is irregular; adding metrics helps measure accuracy.

Using one strategy doesn’t prevent using the others. We used a combination of the above strategies to improve accuracy and response time of the API.
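For instance, here is a minimal sketch combining the read-through and variable TTL strategies on top of Rails.cache (the key scheme and the compute_first_available_slot helper are hypothetical):

```ruby
# A minimal sketch, assuming Rails.cache and a hypothetical
# compute_first_available_slot helper.
def first_available_slot(calendar_id, visit_reason_id)
  key = "first_slot/#{calendar_id}/#{visit_reason_id}"

  cached = Rails.cache.read(key)
  return cached if cached

  # Read-through: a cache miss triggers a recompute from the primary store.
  slot = compute_first_available_slot(calendar_id, visit_reason_id)
  return nil if slot.nil? # no availability: nothing to cache in this sketch

  # Variable TTL: evict the entry when the slot stops being bookable,
  # not after an arbitrary fixed delay.
  Rails.cache.write(key, slot, expires_in: slot.starts_at - Time.current)
  slot
end
```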

Debugging cache issues

Lesson #6: Invest in logs to investigate caching issues

Why is caching hard? Part of it comes from the fact that it brings new kinds of problems: cache misses (we expected a value to be in cache but it isn’t) and incorrect cache values (the value is in cache but differs from the value we would have computed without the cache).

These problems are harder to deal with than the average bug, because values in cache come and go. A cache miss can trigger a refill of the cache, so when looking at it a second time the value may now be present, making the miss not reproducible and leaving nothing to indicate why the value was missing in the first place.

To diagnose bugs related to the cache, one approach is to guess what’s going on in the cache and make a fix accordingly. This proved ineffective and made some cache misses very long to diagnose.

A better approach is to add logging on what gets written, read, missed and cleared from the cache. Logs make it possible to trace back a cache miss and check whether the value had been in cache but was cleared, was incorrectly read, or was never in the cache in the first place.
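Here is a sketch of the kind of logging that helps (the log format and tag are illustrative):

```ruby
# A sketch of cache access logging; the log format is illustrative.
def read_from_cache(key)
  value = Rails.cache.read(key)
  status = value.nil? ? "MISS" : "HIT"
  Rails.logger.info("[first_slot_cache] #{status} read key=#{key}")
  value
end

def write_to_cache(key, value, expires_in:)
  Rails.logger.info("[first_slot_cache] WRITE key=#{key} expires_in=#{expires_in.to_i}s")
  Rails.cache.write(key, value, expires_in: expires_in)
end

def clear_from_cache(key)
  Rails.logger.info("[first_slot_cache] CLEAR key=#{key}")
  Rails.cache.delete(key)
end
```

With such traces, a cache miss can be matched against the last WRITE and CLEAR entries for its key instead of being guessed at.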

My best guess for why a particular cache miss was happening was that some queries must have occurred between the time the cache was cleared and the time it was refilled. But adding logs quickly revealed that these cache misses were due to an incorrect expiration time computed for a variable TTL.

The lesson I draw is to refrain from making fixes based on hypotheses, and instead rely on observations to analyse cache issues. And to set up more observability if necessary.

High cohesion for fast changes

Lesson #7: Keep cohesion high to speed up delivery

Investing in software design makes for flexible code and fast changes. Let’s take the example of a core principle of software design: high cohesion.

Ensuring high cohesion in the modules implementing the API allowed a swift transition of the API from a v1 to a v2. Let’s have a look at the example in more detail.

High cohesion implies that the system is broken down into several pieces, each one being oriented towards a unique goal.

The API is implemented through a controller. Let’s call it FirstAvailableSlotsController.

One of our design guidelines is to avoid putting business logic inside of controllers. Controllers’ responsibility is to handle requests and responses, and delegate to other modules when it comes to the business logic.

The controller validates the API inputs (a list of calendars and visit reasons) and passes them to a module called FirstAvailableSlots. That module is in charge of providing the first available slot for each pair of calendar and reason.

Slots are either retrieved from the cache or recomputed based on intermediate representations. The role of FirstAvailableSlots is to encapsulate that logic. When it comes to caching details though (reading from the cache, checking the expiration dates, etc.), it delegates to FirstAvailableSlotsCache.

For the inputs that FirstAvailableSlotsCache didn’t return anything for, FirstAvailableSlots asks IntermediateRepresentation (which knows how to load and format them) to provide the corresponding intermediate representations, and passes them to AvailabilityService to compute the first available slots.

Then FirstAvailableSlots calls FirstAvailableSlotsCache again with these new slots in order to add them to the cache for next time.
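Putting this together, the orchestration looks roughly like the following sketch (the module names come from this post; the method names and signatures are assumptions):

```ruby
module FirstAvailableSlots
  # pairs: a list of [calendar_id, visit_reason_id] pairs.
  # Returns a hash from pair to first available slot.
  def self.call(pairs)
    # 1. Try the cache first.
    slots = FirstAvailableSlotsCache.read_multi(pairs)

    # 2. Recompute the pairs the cache had nothing for.
    missing = pairs - slots.keys
    if missing.any?
      representations = IntermediateRepresentation.load_for(missing)
      fresh = AvailabilityService.first_available_slots(representations)
      # 3. Store the fresh values for next time.
      FirstAvailableSlotsCache.write_multi(fresh)
      slots.merge!(fresh)
    end

    slots
  end
end
```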

A feature request for the API

The team using our API had a feature request. Initially they would pass a list of (calendar, visit reason) pairs. Now they needed to pass several calendars for the same visit reason.

That changes the API in a way that is not backward compatible. We decided to create a v2 of the API, accepting the new format of inputs with a list of calendars instead of individual calendars.

Let’s see how this affects the design implementing the API.

For one thing, the controller has to change to accommodate the new API. This is a small change, as the controller merely passes the validated parameters to another module.

The bulk of the change is in FirstAvailableSlots, because it now needs to break down the lists of calendars into individual calendars to pass them to FirstAvailableSlotsCache, IntermediateRepresentation and AvailabilityService and put them back together.

Since v1 and v2 have to coexist during the transition of the API client, an easy solution is to create a new module, FirstAvailableSlotsForMultipleCalendars, dedicated to the new feature.
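Here is a sketch of what that module can look like (the signatures, and the way results are grouped back, are assumptions):

```ruby
module FirstAvailableSlotsForMultipleCalendars
  # groups: a list of [visit_reason_id, calendar_ids] entries.
  def self.call(groups)
    # Fan out into the (calendar, visit reason) pairs that the existing
    # modules already understand.
    pairs = groups.flat_map do |visit_reason_id, calendar_ids|
      calendar_ids.map { |calendar_id| [calendar_id, visit_reason_id] }
    end

    slots = FirstAvailableSlots.call(pairs)

    # Fan back in: regroup the individual results per input group.
    groups.to_h do |visit_reason_id, calendar_ids|
      group_slots = calendar_ids.map { |id| slots[[id, visit_reason_id]] }.compact
      [visit_reason_id, group_slots]
    end
  end
end
```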

Thanks to high cohesion, none of the other components have to change. This made the implementation of the v2 of the API very fast in practice.

When the transition is done and v1 is no longer used, we can replace the code of FirstAvailableSlots with that of FirstAvailableSlotsForMultipleCalendars and remove the latter.

Going further on software design

I’ve written (and painted) about software design principles, patterns and how they are all connected in the World of Software Design.

Changing cache access

Lesson #8: Keep coupling low to speed up delivery

Another example of software design benefits is where low coupling made for swift changes, when we switched to a different low-level cache access API.

Rails offers several such low-level APIs, among which are Rails.cache and RedisCache for Redis. At some point in the project, we needed to switch from one to the other.

The only part of the code that knows which low-level API is used is the CacheAccess module. It offers methods such as read, read_multi, clear, etc., which are used by FirstAvailableSlotsCache. The change was then very quick, as it consisted in changing the implementation of the few functions in CacheAccess.
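As a sketch, CacheAccess can be as thin as a few one-line delegations to the current backend (here wrapping Rails.cache; the bodies are illustrative):

```ruby
# The only module aware of which low-level cache API we use.
module CacheAccess
  def self.read(key)
    Rails.cache.read(key)
  end

  def self.read_multi(*keys)
    Rails.cache.read_multi(*keys)
  end

  def self.write(key, value, **options)
    Rails.cache.write(key, value, **options)
  end

  def self.clear(key)
    Rails.cache.delete(key)
  end
end
```

Switching to a different backend then means rewriting these few bodies, and nothing else.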

If access to the low-level API had been spread across the code using the cache, it would have been a longer and riskier change.

This idea of isolating the access to technical layers is also related to hexagonal architecture.

Avoiding circular dependencies

Lesson #9: Circular dependencies can be easy to avoid

An important principle of software architecture is to avoid circular dependencies. Circular dependencies create coupling between the components in the cycle, because each one depends on all the others.

Often, circular dependencies can be avoided by shuffling around the responsibilities in code. Let’s take an example to illustrate.

When an event happens that invalidates availabilities (for example, a doctor changing their configuration), a job runs to update the intermediate representations. When this is done, the values in cache need to be refreshed with the first available slots based on these new updated intermediate representations.

The module that regenerates the intermediate representations is IntermediateRepresentation. The module that knows how to compute first available slots is FirstAvailableSlots. And the cache is handled by FirstAvailableSlotsCache.

A potential design (that we’ll discard in a moment) is for IntermediateRepresentation to call FirstAvailableSlots after finishing the update, to notify it to refresh the cache.

Taken in isolation, this seems like a sensible design. But looking back at the previous design, we can see that FirstAvailableSlots already depends on IntermediateRepresentation!

This is a circular dependency, with two elements in the cycle: FirstAvailableSlots and IntermediateRepresentation. We need to find a way to remove the circular dependency.

One way is to shift the call to the cache refresh to the job itself.
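Here is a minimal sketch of that design (the job name and the regenerate/refresh_cache methods are assumptions):

```ruby
class RefreshIntermediateRepresentationsJob < ApplicationJob
  def perform(calendar_id, dates)
    # The job orchestrates both steps, so IntermediateRepresentation never
    # needs to call back into FirstAvailableSlots.
    IntermediateRepresentation.regenerate(calendar_id, dates)
    FirstAvailableSlots.refresh_cache(calendar_id, dates)
  end
end
```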

This removes the circular dependency.

Another way could have been to split FirstAvailableSlots into several modules, but it would have been much less convenient.

An interesting takeaway here is that even if systems with lots of circular dependencies are notoriously hard to work with, avoiding introducing circular dependencies in new code can be easy to do. It is a worthwhile investment.

Progressive release

Lesson #10: Consider multiple release strategies to go to production successfully

Once the system is developed, unit tested, manually tested, tuned for performance and ready to go to production, the question becomes how to release it.

Even after a POC and manual testing, the project contains risks when the rubber hits the production road. For instance:

  • Will the response time be acceptable?
  • Will the results be correct?
  • Did we cover all the complex setups for availability computation?
  • Will the load on the servers be manageable?
  • Will the load on the database be manageable?
  • Will the load on the cache be manageable?
  • Won’t there be concurrency issues?
  • Won’t there be any other unexpected issue (or rather, what will be the other unexpected issues)?

To mitigate these risks, one way is to release incrementally and monitor. The goal is to try releasing the feature without having any negative impact on user experience.

Here are various strategies that we found useful to release incrementally:

  • Feature toggle for sub-features: For example, when implementing the real-time refresh of calendars using shared resources, a feature toggle allowed us to activate the feature when we decided to. A few errors started to pop up, and we turned it off immediately. After fixing the errors, we turned it back on for longer, and ultimately removed the code for the feature toggle itself, making the feature a permanent addition.
  • Shadow production: the new system runs in dual run with the old system, and only the results of the old system are displayed to the users. If the new system produces errors or incorrect results, they only appear in logs and monitoring.
  • Canary release: Once the features are stable, rolling out to a small fraction of users allows us to observe the behaviour in production with low volumes. Gradually increasing the number of users lets us monitor how the system (and the new users) react to the feature. In case of an issue, we can scale back down while we fix it.
  • Rollout by cluster: Not all users have the same properties and behaviours, but they can be grouped into clusters sharing similarities, for example doctors in related medical specialities. Deploying a feature on a cluster limits the types of issues to expect, since users in the cluster have similar properties, while still releasing to a significant number of users at a time. It’s an interesting trade-off.
  • A/B testing: to measure if the new system effectively improves the user experience.
  • Global feature toggle to shut down the real-time refresh system: this is a last-resort solution.

As with caching strategies, you don’t have to commit to only one release strategy. We used a mix of all the above depending on the stage of the project.
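As an illustration of the first strategy, guarding a sub-feature behind a toggle can be as simple as this sketch (FeatureToggle and the flag name are stand-ins, not our actual framework):

```ruby
# A sketch of a sub-feature toggle; FeatureToggle stands in for whatever
# flag framework is available (e.g. Flipper), and the flag name is
# illustrative.
def refresh_if_enabled(calendar_id, dates)
  return unless FeatureToggle.enabled?(:realtime_refresh_shared_resources)

  # The guarded sub-feature: real-time refresh for calendars using
  # shared resources (equipment, assistants).
  RefreshIntermediateRepresentationsJob.perform_later(calendar_id, dates)
end
```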

The purpose of incremental steps is to monitor the effect of each step, in order to decide whether to move forward or to step back. Here are typical indicators to monitor:

  • Crashes: e.g. with Sentry
  • Performance: endpoint response time, job durations, code spans durations, etc.
  • Load: CPU and memory of servers, database readers and writers, cache readers and writers
  • A/B test results
  • The product itself: manually test what users are seeing to check that the results are correct and the load times feel snappy on typical use cases.

There is more to a software project

This project has been full of learnings and takeaways, and of concrete applications of software design best practices.

In this post we saw concrete examples illustrating the following 10 lessons:

  1. To get a good understanding of code, start from product questions
  2. To get a thorough understanding of code, use a spreadsheet and gamification
  3. To set up a roadmap for your POC, frame its goals as questions
  4. To precompute an algorithm, break it down into a writing and reading part
  5. Consider multiple caching strategies to optimize performance and accuracy
  6. Invest in logs to investigate caching issues
  7. Keep cohesion high to speed up delivery
  8. Keep coupling low to speed up delivery
  9. Circular dependencies can be easy to avoid
  10. Consider multiple release strategies to go to production successfully

There is so much more to a software project though. We could dive more into each of the above topics, and explore even more of them. We didn’t cover testing, performance tuning, the linter and when to fix or tolerate its warnings, where to start the refactoring of a complex piece of code, API design, the discussions with the Product Manager, availabilities coming from third-parties, how to write useful documentation, and more.

I hope that the aspects we touched upon and the concrete examples will resonate with your own software projects, and that the principles we saw will help you improve your software.
