Settings at Scale in Outreach: How We Ensure that Settings Are Fast to Read

Martin Havelka
Outreach Prague
Jul 31, 2023 · 4 min read

In the fast-paced world of microservices, any delay in retrieving settings can become a performance bottleneck, disrupting services and undermining the user experience. In our previous blog post we discussed the key actors and components involved in our settings management microservice architecture. Here, we’ll explore the strategies Outreach has adopted to ensure that our settings are not only quickly accessible but also efficiently managed.

Screenshot from DataDog

Utilizing Reader/Writer Database Instances

To handle rapid read operations, we’ve segregated our database into reader and writer instances using the AWS Aurora PostgreSQL DB engine. This partitioning is crucial for load balancing, as it allows us to distribute incoming requests evenly. By separating read operations from write operations, we ensure that our write database isn’t overwhelmed with read requests, which could otherwise delay writing new data. This strategy helps us maintain high performance while avoiding contention and ensuring data integrity.
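The routing logic can be sketched as follows. This is a minimal illustration, not Outreach’s actual code: the `SettingsStore` class, its method names, and the dicts standing in for database connections are all hypothetical.

```python
# Sketch of read/write routing across Aurora-style reader/writer instances.
# Plain dicts stand in for database connections; names are illustrative.

class SettingsStore:
    """Routes reads to reader replicas and writes to the single writer."""

    def __init__(self, writer, readers):
        self.writer = writer      # the single writer instance
        self.readers = readers    # pool of read replicas
        self._next = 0

    def _pick_reader(self):
        # Simple round-robin over the replicas to spread read load.
        reader = self.readers[self._next % len(self.readers)]
        self._next += 1
        return reader

    def get_setting(self, key):
        return self._pick_reader().get(key)

    def put_setting(self, key, value):
        # Writes always go to the writer; here we copy to the replicas
        # inline as a stand-in for Aurora's replication.
        self.writer[key] = value
        for replica in self.readers:
            replica[key] = value


writer = {}
readers = [{}, {}]
store = SettingsStore(writer, readers)
store.put_setting("org:1:theme", "dark")
result = store.get_setting("org:1:theme")
```

In a real deployment the reader endpoint is typically a separate connection string (Aurora exposes a reader endpoint that load-balances across replicas), so the round-robin above would be handled by the database layer rather than application code.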

Implementing Server-Side Caching with Redis

Caching is a technique that stores copies of frequently accessed data and serves it upon request, effectively reducing the load on the database and speeding up data retrieval.

For settings, we leverage Redis, an open-source, in-memory data structure store, as a caching layer to manage read operations effectively. By storing a copy of frequently accessed settings, we can deliver them faster upon request. It’s crucial for us to maintain tight control over our caching process, especially around write operations, so that we can properly invalidate the cache when necessary.
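The read-through-with-invalidation pattern can be sketched like this. A plain dict stands in for Redis here; in production this would be client calls such as `GET`/`SET`/`DEL`, and the key names are made up for the demo.

```python
# Cache-aside sketch: reads populate the cache, writes invalidate it.

CACHE = {}                          # stand-in for Redis
DB = {"org:1:locale": "en-US"}      # stand-in for the settings database

def read_setting(key):
    if key in CACHE:                # cache hit: skip the database entirely
        return CACHE[key]
    value = DB.get(key)             # cache miss: fall through to the database
    CACHE[key] = value              # populate the cache for future reads
    return value

def write_setting(key, value):
    DB[key] = value
    CACHE.pop(key, None)            # invalidate so the next read is fresh
```

The important design choice is invalidating on write rather than updating the cache in place: deleting the entry is simpler to reason about and avoids serving a stale value if two writes race.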

Optimizing Code to Avoid Expensive Operations

Code optimization is an essential step toward ensuring quick access to settings. We’ve meticulously examined our codebase to identify and eliminate expensive operations that could impact performance. This process involves efficient memory management, minimizing network calls, and using data structures that speed up operations.

For instance, we aimed to avoid unnecessary unmarshalling of JSON structures, a process that can be computationally expensive. By pinpointing areas where this could be minimized or eliminated, we’ve significantly increased our system’s overall efficiency.
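One way to avoid repeated unmarshalling is to memoize the parsed form of a payload, so identical blobs are decoded only once. The sketch below uses `functools.lru_cache` keyed on the raw string; real code might key on a version or ETag instead. The blob contents are illustrative.

```python
import json
from functools import lru_cache

# Memoize parsed settings so the same raw payload is unmarshalled once.
@lru_cache(maxsize=1024)
def parse_settings(raw: str):
    return json.loads(raw)

blob = '{"feature_x": true, "page_size": 50}'
first = parse_settings(blob)
second = parse_settings(blob)   # served from the memo, no re-parse
```

One caveat with this approach: the cached object is shared between callers, so callers must treat it as read-only (or the function should return an immutable structure).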

Another example involves optimizing the reading of LaunchDarkly feature flags to avoid network calls to third-party tools. Whenever possible, we strive to avoid using feature flag checks on the read path.

Lastly, it’s worth mentioning that certain features, such as data validation on each read, run asynchronously and do not block the request response.
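The idea of moving validation off the read path can be sketched with a background worker fed by a queue. This is a simplified illustration, not the actual implementation: the handler returns immediately while the expensive check happens later.

```python
import queue
import threading

# Sketch: the request handler enqueues validation work and responds
# immediately; a background worker drains the queue.

validation_queue = queue.Queue()
validated = []   # records what the worker has checked (for the demo)

def validation_worker():
    while True:
        item = validation_queue.get()
        if item is None:             # sentinel to stop the worker
            break
        # ... expensive validation would run here ...
        validated.append(item)
        validation_queue.task_done()

worker = threading.Thread(target=validation_worker, daemon=True)
worker.start()

def handle_read(value):
    validation_queue.put(value)      # enqueue; do not block the response
    return value                     # respond to the caller right away

response = handle_read({"page_size": 50})
validation_queue.join()              # demo only: wait for the worker
```

The `join()` at the end exists only so the demo can observe the result; a real service would never wait on the queue in the request path.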

Simplifying Database Models and Precise Database Indexing

A simple and well-structured database is easier to navigate, ultimately speeding up read operations. We’ve designed our database models to be as simple and intuitive as possible, ensuring that they efficiently represent the data we’re storing with the AWS Aurora DB engine.

Furthermore, we’ve implemented precise database indexing. Indexing is a technique that significantly speeds up data retrieval: by keeping key columns in an ordered structure, the database can locate matching records without scanning the entire table, reducing the time it takes to fetch settings.
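The effect of an index on the read path can be demonstrated with a small sketch. SQLite stands in for Aurora PostgreSQL here, and the table and column names are invented for the demo; the point is that a composite index matching the `WHERE` clause turns a full table scan into an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (org_id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO settings VALUES (?, ?, ?)",
    [(i % 100, f"k{i}", "v") for i in range(1000)],
)

# A composite index matching the read path's WHERE clause:
conn.execute("CREATE INDEX idx_settings_org_key ON settings (org_id, key)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT value FROM settings WHERE org_id = ? AND key = ?",
    (7, "k7"),
).fetchall()
# The plan's detail column should mention the index, not a full scan.
```

Without the `CREATE INDEX` statement, the same query plan reports a scan of the whole `settings` table.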

Providing Multiple Integration Options: gRPC/GraphQL and Replica/Kafka

To optimize performance and consistency, we offer different ways of integrating with our Settings service.

For synchronous integration, we use gRPC and GraphQL. gRPC is a high-performance, open-source framework developed by Google that uses Protocol Buffers as its interface definition language. It allows our microservices to communicate with each other directly and efficiently. On the other hand, GraphQL is a data query language that enables declarative data fetching, where a client can specify exactly what data it needs from the server.
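To make the declarative-fetching point concrete, a client that needs only a couple of fields might issue a query like the following. The schema and field names are illustrative, not Outreach’s actual API:

```graphql
query OrgSettings($orgId: ID!) {
  organization(id: $orgId) {
    settings {
      locale      # only the requested fields are returned
      pageSize    # no over-fetching of the rest of the settings object
    }
  }
}
```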

For asynchronous integration, we use Kafka and data replication. Kafka is a distributed streaming platform that allows us to handle real-time data feeds with low latency. By creating replicas of our data, we can distribute the load across multiple servers, ensuring data remains available even if one server fails.
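The consumer side of this pattern can be sketched as a loop that replays settings-change events into a local replica, so reads hit local state instead of the Settings service. A plain list stands in for the Kafka topic, and the event shape is invented for the demo; real code would use a Kafka consumer client.

```python
# Sketch: keep a local read replica warm by applying change events in order.

events = [  # stand-in for messages consumed from a Kafka topic
    {"op": "set", "key": "org:1:locale", "value": "en-US"},
    {"op": "set", "key": "org:1:page_size", "value": 50},
    {"op": "delete", "key": "org:1:locale"},
]

replica = {}   # local copy of settings; reads are served from here

def apply_event(event):
    if event["op"] == "set":
        replica[event["key"]] = event["value"]
    elif event["op"] == "delete":
        replica.pop(event["key"], None)

for event in events:   # the consumer loop, replaying the topic in order
    apply_event(event)
```

Because events for a given key are applied in order, the replica converges to the same state as the source; the trade-off versus the synchronous gRPC/GraphQL path is eventual rather than immediate consistency.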

With these multiple integration options, clients can choose the best method to optimize performance and consistency.

Image made with Midjourney 5.1

Conclusion

Ensuring fast access to settings is a challenging task, but at Outreach, we’ve employed a variety of techniques to overcome this challenge. By leveraging AWS Aurora, Redis, gRPC, GraphQL, and Kafka, along with implementing strategic code optimization and database design, we ensure that settings data is always quickly and readily available. As we continue to scale, these strategies will evolve and adapt, ensuring our services remain fast, efficient, and reliable.
