Reducing API Latency: Our Journey Toward Improved Performance

Gopal Gupta · Published in motive-eng
Jun 14, 2023 · 12 min read

In a system where every millisecond counts, scaling an application can present significant challenges. We harness the power of vehicle telematics data from vehicle gateways and AI dashcams to generate comprehensive time series data and consolidated reports for fleet managers. As our data rapidly expanded over the years, we identified an opportunity to optimize our system performance and enhance the reliability of our product.

We set out to identify the areas causing issues and to improve their performance by reducing latency and API error rates. To address these reliability concerns, we formed a dedicated team tasked with improving Motive’s dashboard load times and overall system performance. In this article, I share our journey of optimizing APIs.

Identifying Obstacles

We use Datadog to monitor API traces, metrics, and errors. We also use it to create dashboards and monitors, which are our go-to resources for identifying APIs with high latency or error rates. Using these dashboards, we determined which API endpoints required optimization and defined our target metrics.

Next, we pursued a proactive strategy of isolating the components that could benefit from optimization. This required going through the code in detail. We analyzed the business logic to build a comprehensive view that positioned us to identify potential performance improvements.

Understanding Latency

Several metrics can be used to measure the speed of data transfers in a network, including throughput and latency. These metrics can help identify performance issues in a network.

Latency measures the time it takes for a packet to be transferred across a network, either as a one-way trip to its destination or as a round trip. Throughput, by contrast, measures the quantity of data sent and received within a specific time period.

For data persistence, we rely on PostgreSQL, which employs a caching mechanism to store recently accessed records in the database cache. In high throughput systems with numerous requests for the same or similar data, PostgreSQL leverages its caching mechanism to serve responses directly from the database cache.

To make sure the vast majority of requests are served within the target, we set our baselines at the p95 and p99 latencies (p95 latency is the 95th percentile, where it’s acceptable for 5% of requests to be slower than the target; likewise, at p99, 1% of requests may be slower than the p99 target). We also improved the UI loading experience for large fleets with significant amounts of data by improving the API latencies at the p95 and higher percentiles. It’s important to note that while database cache hits provide faster results for certain requests, they don’t accurately represent the p95 latency.
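
For illustration, here is a minimal sketch (in Go, with made-up sample values) of how a percentile latency can be computed using the nearest-rank method:

package main

import (
    "fmt"
    "math"
    "sort"
)

// percentile returns the latency value below which p percent of the
// observed samples fall (nearest-rank method).
func percentile(samples []float64, p float64) float64 {
    sorted := append([]float64(nil), samples...)
    sort.Float64s(sorted)
    rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
    if rank < 0 {
        rank = 0
    }
    return sorted[rank]
}

func main() {
    // Hypothetical response times in milliseconds.
    latencies := []float64{120, 95, 310, 150, 2200, 180, 90, 450, 130, 170}
    fmt.Printf("p95 = %.0f ms, p99 = %.0f ms\n",
        percentile(latencies, 95), percentile(latencies, 99))
}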

Optimization Strategy

Optimizing an API is crucial for businesses to compete in today’s performance-driven environment. Companies of all sizes and industries are refining their APIs to provide better experiences for various stakeholders. The following sections discuss several techniques we used to improve our system’s performance. The methods below are ordered by their importance and latency impact on our system. Depending on your system requirements, some scenarios may be ordered differently or omitted.

Database Partitioning

PostgreSQL performs better when it processes small chunks of data (partitions) than when it processes a single, large table. As our business expanded over the past few years, we accumulated a significant volume and variety of data. To address this, we partitioned our data into monthly time ranges and made our queries time-bound to achieve faster results. Partitioning significantly reduced database disk reads and improved our data write speed.
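
Below is a minimal sketch of this setup, run as DDL through Go’s database/sql. The table name (vehicle_events), its columns, and the connection string are hypothetical, not our actual schema:

package main

import (
    "database/sql"
    "fmt"
    "log"
    "time"

    _ "github.com/lib/pq" // PostgreSQL driver
)

// createMonthlyPartition creates a range partition of vehicle_events
// covering a single month. Names are illustrative.
func createMonthlyPartition(db *sql.DB, month time.Time) error {
    start := time.Date(month.Year(), month.Month(), 1, 0, 0, 0, 0, time.UTC)
    end := start.AddDate(0, 1, 0)
    stmt := fmt.Sprintf(
        `CREATE TABLE IF NOT EXISTS vehicle_events_%s PARTITION OF vehicle_events
         FOR VALUES FROM ('%s') TO ('%s')`,
        start.Format("2006_01"), start.Format("2006-01-02"), end.Format("2006-01-02"))
    _, err := db.Exec(stmt)
    return err
}

func main() {
    db, err := sql.Open("postgres", "postgres://localhost/fleet?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Parent table partitioned by the event timestamp; each month gets its own partition.
    if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS vehicle_events (
        id          bigserial,
        vehicle_id  bigint NOT NULL,
        recorded_at timestamptz NOT NULL,
        payload     jsonb,
        PRIMARY KEY (id, recorded_at)
    ) PARTITION BY RANGE (recorded_at)`); err != nil {
        log.Fatal(err)
    }
    if err := createMonthlyPartition(db, time.Now().UTC()); err != nil {
        log.Fatal(err)
    }
}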

Database Indexing

Database indexing is a technique that can speed up data retrieval in situations where full table scans from disk are slow. Proper indexing can significantly improve the performance of database queries, especially in large tables or tables with complex queries. Before creating an index, it’s essential to understand the data in your database and how it will be queried. Analyze the queries used most often and identify the columns used in the WHERE, JOIN, and ORDER BY clauses. During this process, we had to make several decisions, which included these considerations:

  • The rule of thumb is not to create too many indexes on a single table; they can cause overhead and potentially slow down data writes.
  • Creating an index on boolean fields is generally not recommended unless the data is highly skewed. If you need to retrieve data from a skewed set only, using a partial index on the relevant columns is better.
  • A multi-column index is generally more efficient than multiple single-column indexes, especially when filtering on several columns simultaneously. While multicolumn indexes improve query performance, they also have some downsides to consider:
    — Increased index size: Multicolumn indexes store data for multiple columns together, leading to larger index sizes than single-column indexes. This larger index size can impact storage requirements and lead to increased memory consumption.
    — Increased index maintenance overhead: As data in the indexed columns are modified, the database needs to update the corresponding entries in the multicolumn index. This additional index maintenance overhead can impact the performance of write operations, such as insert, update, and delete queries.
    — Index selectivity limitations: Multicolumn indexes are most effective when the indexed columns have high selectivity, meaning they have distinct and diverse values. If the indexed columns have low selectivity, where many rows share the same values, the benefits of the multicolumn index may diminish.
    Note: It’s crucial to analyze the specific requirements of your application and consider the trade-offs before deciding to use multicolumn indexes.
  • The first column in the index key should be the one used most often in the WHERE clause and the one with the most distinct values. There’s rarely a benefit to indexes with more than three columns unless all the indexed columns are retrieved by the SELECT statement. Such index-only scans, which need no subsequent table access, can be much faster than regular index scans because they avoid the overhead of fetching data from the table itself.
  • Creating an index on a large table can be time-consuming and may even lock the table for writes during index creation. To avoid these issues, we created indexes concurrently using CREATE INDEX CONCURRENTLY. For already partitioned tables, we followed a two-step process: first, we created the index on each partition individually, and then we created the index on the parent table, which attaches the existing partition indexes. A sketch of this two-step flow follows this list.
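
Here is a minimal sketch of that two-step flow; the table, partition, and index names and the connection string are illustrative:

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq" // PostgreSQL driver
)

// Partition names are illustrative; in practice we would list them by querying pg_inherits.
var partitions = []string{"vehicle_events_2023_04", "vehicle_events_2023_05", "vehicle_events_2023_06"}

func main() {
    db, err := sql.Open("postgres", "postgres://localhost/fleet?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Step 1: build the index on each partition without blocking writes.
    // CREATE INDEX CONCURRENTLY cannot target the partitioned parent directly.
    for _, p := range partitions {
        stmt := fmt.Sprintf(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS %s_vehicle_recorded_idx ON %s (vehicle_id, recorded_at)", p, p)
        if _, err := db.Exec(stmt); err != nil {
            log.Fatal(err)
        }
    }

    // Step 2: create the same index on the parent table. PostgreSQL attaches the
    // existing partition indexes instead of rebuilding them, so this step is quick.
    if _, err := db.Exec(
        "CREATE INDEX IF NOT EXISTS vehicle_events_vehicle_recorded_idx ON vehicle_events (vehicle_id, recorded_at)"); err != nil {
        log.Fatal(err)
    }
}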

Precomputing Data

Precomputation can speed up API response times. In our platform, we used precomputation for the dashboards that display static data and summaries of events over weekly periods. Instead of running complex queries over the entire dataset each time an API request is made, we now precompute the data weekly and store it in a separate table. This lets us quickly retrieve the aggregated records at request time.

Precomputation improved our API response times by 80%, enhancing the overall user experience and increasing user satisfaction.
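
Below is a minimal sketch of such a weekly rollup job. The table names, the aggregation, and the scheduling are hypothetical, and the upsert assumes a unique constraint on (fleet_id, week_start, event_type):

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/lib/pq" // PostgreSQL driver
)

// rollupWeek aggregates raw events for one week into a summary table so that
// dashboard APIs can read the small precomputed table instead of scanning raw data.
func rollupWeek(db *sql.DB, weekStart time.Time) error {
    weekEnd := weekStart.AddDate(0, 0, 7)
    _, err := db.Exec(`
        INSERT INTO weekly_event_summaries (fleet_id, week_start, event_type, event_count)
        SELECT fleet_id, $1::timestamptz, event_type, count(*)
        FROM vehicle_events
        WHERE recorded_at >= $1 AND recorded_at < $2
        GROUP BY fleet_id, event_type
        ON CONFLICT (fleet_id, week_start, event_type)
        DO UPDATE SET event_count = EXCLUDED.event_count`,
        weekStart, weekEnd)
    return err
}

func main() {
    db, err := sql.Open("postgres", "postgres://localhost/fleet?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // In production this would be triggered by a weekly scheduler.
    weekStart := time.Now().UTC().Truncate(24 * time.Hour).AddDate(0, 0, -7)
    if err := rollupWeek(db, weekStart); err != nil {
        log.Fatal(err)
    }
}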

Caching the API Response

To optimize performance and reduce unnecessary API requests for relatively static data, we used the front-end technique of caching API responses for subsequent requests. This technique is particularly beneficial in scenarios where users frequently interact with the same data or navigate between different tabs or pages within the application. We leveraged the Service Worker API to cache this data with a TTL based on the use case.

A service worker is a browser technology used to create effective offline experiences and is a step toward turning an application into a Progressive Web App (PWA). With this approach, we reduced the number of API hits to the server, thereby decreasing server load and improving the user’s loading experience.

Image credit: https://developer.chrome.com/docs/workbox/caching-strategies-overview/
/*
Each API has a separate configuration used to compute the expiry time of its
response. Each cached response stores the expiry time that says when it should
be invalidated.
*/
self.addEventListener('fetch', (event: FetchEvent) => {
  const request = event.request;
  const apiConfig = getAPIConfig(request.url);

  if (apiConfig) {
    const response = caches.match(request).then((cachedResponse?: Response) => {
      // Return the cached response if it is present in the cache and not expired.
      if (cachedResponse && isValidResponse(cachedResponse)) return cachedResponse;

      // Otherwise, fetch the response and store it in the browser's Cache Storage
      // along with its expiry time.
      return fetchNewResponse(request, apiConfig);
    });
    event.respondWith(response);
  }
});

Concurrent Processing

Concurrent processing can be beneficial for tasks that involve a large amount of data or computation. We noticed that some of our aggregate endpoints fetched data from various sources, such as databases and internal microservice APIs. Because these operations are mostly I/O-bound, running them concurrently can significantly reduce the overall latency. Doing so reduced the time to retrieve data and improved the system’s performance by 70%.

var wg sync.WaitGroup

// Prepare all the channels used to send/receive data.
// writeOnlyChannels - write-only ends used by the producers
//   (producers read from the DB or other services and write to their channel)
// readOnlyChannels  - read-only ends consumed by http_response
readOnlyChannels, writeOnlyChannels := GetChannels()

// acrChan carries the final transformed data (the aggregated API response).
acrChan := make(chan http_response.ResponseWithError)

// The consumer listens on all readOnlyChannels for incoming data.
// Note: we do not add it to wg, because we want this goroutine to live until
// all the channels are closed - we signal that once all producers have finished their work.
go http_response.NewResponseObject(logger, ctx, readOnlyChannels, acrChan)

// Add 1 to wg and run Method1 as a goroutine; it sends its data through Method1Channel.
// (The trailing method-specific arguments are omitted here.)
wg.Add(1)
go Method1(writeOnlyChannels.Method1Channel, &wg /*, method-specific args */)

// Same for the other producers.
wg.Add(1)
go Method2(writeOnlyChannels.Method2Channel, &wg /*, method-specific args */)
wg.Add(1)
go Method3(writeOnlyChannels.Method3Channel, &wg /*, method-specific args */)

// Wait until all the producer goroutines above call wg.Done()
// (http_response.NewResponseObject is not part of the WaitGroup).
wg.Wait()

// Once all the producers have finished, close their channels.
// This signals the consumer to stop listening and to send the transformed
// data through acrChan.
CloseChannels(writeOnlyChannels)

// Receive the aggregated data from acrChan; the sender is http_response.NewResponseObject.
acr := <-acrChan

// Once the data is received, close acrChan too.
close(acrChan)

Breaking Down API Endpoints

Breaking down our API endpoints was low-hanging fruit. We were using four endpoints to load our dashboard, and most modern browsers can handle about six concurrent HTTP requests per domain. So we analyzed the functionality of each endpoint and broke it down into smaller, more specific functions. This reduced each API’s overall latency and complexity, making it easier to manage, debug, and maintain.

We now call nine endpoints: the six slowest requests are invoked in parallel, and the three remaining faster requests are queued (stalled) temporarily. This approach ensures the page loads faster by prioritizing the high-latency requests, while the faster requests are processed as soon as connection slots become available.

Optimized Pagination Strategy

Another improvement was removing an inefficient pagination query that computed the total record count and the number of pages on every API request. Instead, we modified the front-end application to keep requesting data until it received fewer records than the requested page size, indicating that no subsequent pages remained.

This approach eliminated the need for a time-consuming separate count query, especially for large data ranges. As a result, we improved the overall system’s performance and reduced the processing time for queries.
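
Here is a minimal sketch of the stop condition, written in Go for consistency with the earlier example even though our actual change was in the front-end client; the Event type and the page-fetching function are placeholders:

package pagination

import "context"

// Event is a placeholder for the record type returned by the API.
type Event struct{ ID int64 }

// ListPage is any function that fetches one page of results (hypothetical client call).
type ListPage func(ctx context.Context, page, pageSize int) ([]Event, error)

// FetchAll pages through the API until a page comes back with fewer records
// than the requested page size, which signals that no further pages exist.
// No separate total-count query is ever issued.
func FetchAll(ctx context.Context, list ListPage, pageSize int) ([]Event, error) {
    var all []Event
    for page := 1; ; page++ {
        events, err := list(ctx, page, pageSize)
        if err != nil {
            return nil, err
        }
        all = append(all, events...)
        if len(events) < pageSize {
            break // short page: this was the last one
        }
    }
    return all, nil
}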

In our latency graphs, this change to the pagination approach eliminated the spikes in latency that had started on “Wed 15”.

Increasing Timeouts

The main API hosting service interacts with various other services to fetch data, and we use timeouts to bound how long those requests can take. This helps prevent the client application from becoming unresponsive or hanging due to long-running or unresponsive requests. We had a retry mechanism that made three attempts to call the microservice before responding to the front end. However, this approach wasn’t efficient, especially for large-fleet requests spanning a broad time range: disk reads took a significant amount of time, so queries frequently timed out and the same request was retried.

We decided to double the timeout for the initial microservice call while retaining the retry mechanism. This gave the microservice more time to respond before the call was considered a timeout, so the system avoided the additional retries and their associated latency. As a result, the API error rate decreased and the p95+ latencies were reduced by 40%, improving the overall load time.
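
Below is a minimal sketch of the call pattern; the timeout values, retry count, and the fetch callback are illustrative, not our production configuration:

package client

import (
    "context"
    "fmt"
    "time"
)

// callWithRetry gives the downstream microservice a generous timeout on each
// attempt and retries a fixed number of times before giving up. Doubling the
// per-attempt timeout lets most large queries finish on the first try, so the
// retries (and their added latency) rarely kick in.
func callWithRetry(ctx context.Context, fetch func(context.Context) ([]byte, error)) ([]byte, error) {
    const attempts = 3
    const perAttemptTimeout = 10 * time.Second // doubled from a previous 5s (illustrative values)

    var lastErr error
    for i := 0; i < attempts; i++ {
        attemptCtx, cancel := context.WithTimeout(ctx, perAttemptTimeout)
        data, err := fetch(attemptCtx)
        cancel()
        if err == nil {
            return data, nil
        }
        lastErr = err
    }
    return nil, fmt.Errorf("microservice call failed after %d attempts: %w", attempts, lastErr)
}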

But increasing timeouts can’t always be the solution because it may lead to starvation of future requests (a scenario where the server’s thread pool becomes overwhelmed or busy serving existing clients and new clients are kept waiting). Lower timeouts allow the client to fail fast if the server is overburdened. Before increasing the timeout value, it’s crucial to strike a balance between allowing enough time for processes to complete and minimizing user wait time. Here are some key factors we considered:

  • Load on the server: Assess the current server load, including CPU, memory, and network utilization. If the server is already heavily loaded, increasing the timeout may further strain its resources and result in degraded performance.
  • Response time distribution: Analyze the response time distribution of the requests. Identify the p90, p95, or p99 latencies to understand the tail-end latency distribution. Increasing the timeout should be based on these metrics to ensure that the majority of requests can be serviced within the specified time frame.
  • Error rate: Evaluate the rate of errors or failures encountered by the requests. Increasing the timeout may reduce the error rate by allowing more time for the server to process requests successfully. However, it’s important to have a balance, as excessively long timeouts can also lead to prolonged error states.

Index Scan Over Sequential Scan

When designing and optimizing queries, index scans are generally preferred to sequential scans wherever possible. This is because an index scan can typically retrieve the required rows much faster, especially when dealing with large datasets. These are the configuration parameters we used to make sure our database engine prefers index scans over sequential scans:

  • cpu_index_tuple_cost
    This parameter is used by the query planner to estimate the cost of processing each index entry during an index scan. A higher value makes index scans look more expensive relative to sequential scans, so the planner is less likely to choose them; a lower value has the opposite effect.
  • cpu_operator_cost
    This parameter is used by the query planner to estimate the total cost of a query and choose the optimal execution plan. A higher cpu_operator_cost value will cause the planner to prefer plans with fewer operator executions, and vice versa.
  • cpu_tuple_cost
    This parameter is used by the query planner to estimate the overall cost of executing a query and determine the most efficient query plan. A higher cpu_tuple_cost value can lead the planner to favor plans that involve fewer tuple-processing operations, such as index scans over sequential scans, to reduce the estimated cost of executing the query.
  • random_page_cost
    This represents the cost of a non-sequential (random) disk page access compared to a sequential disk page access. A lower value for random_page_cost indicates that random disk accesses are relatively cheaper.
  • seq_page_cost
    This represents the cost of a sequential disk page access compared to a random disk page access. A lower value for seq_page_cost indicates that sequential disk accesses are relatively cheaper.

It’s important to note that not all queries can benefit from an index scan, and the decision to use an index or a sequential scan should be based on various factors, such as the size of the table, the selectivity of the query, and the availability of appropriate indexes.
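
As a minimal illustration, these parameters can be adjusted per session and the resulting plan verified with EXPLAIN. The values below (lowering random_page_cost for SSD-backed storage) and the table in the query are assumptions, not a universal recommendation:

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq" // PostgreSQL driver
)

func main() {
    ctx := context.Background()
    db, err := sql.Open("postgres", "postgres://localhost/fleet?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Use a single connection so the session-level SET applies to the EXPLAIN below.
    conn, err := db.Conn(ctx)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // Make random (index) page reads look cheaper relative to sequential reads.
    // The defaults are random_page_cost = 4.0 and seq_page_cost = 1.0; 1.1 is a
    // common starting point for SSD-backed storage.
    if _, err := conn.ExecContext(ctx, "SET random_page_cost = 1.1"); err != nil {
        log.Fatal(err)
    }

    // Verify that the planner now chooses an index scan for a typical query.
    rows, err := conn.QueryContext(ctx,
        "EXPLAIN SELECT * FROM vehicle_events WHERE vehicle_id = 42 AND recorded_at >= now() - interval '7 days'")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
        var line string
        if err := rows.Scan(&line); err != nil {
            log.Fatal(err)
        }
        fmt.Println(line) // expect "Index Scan using ..." rather than "Seq Scan"
    }
}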

Summary

Improving web service performance is a complex task requiring a comprehensive strategy to address performance issues in different system areas. It involves analyzing factors such as latencies, trade-offs, architecture, and more to identify performance bottlenecks. The techniques mentioned in this discussion, including optimizing database queries, reducing API calls, and implementing caching, can be a good starting point. However, tailoring the approach to your specific system and requirements is essential. Effective performance optimization requires a continuous improvement mindset and an ongoing effort to monitor and optimize the system’s performance.

Acknowledgments

We sincerely appreciate Chandra Rathina, Arvind Ramachandran, Khawar Baig, Muhammad Bilal and Sainandan Tummalapalli, who provided invaluable assistance throughout the project. A big round of applause to Balasubramani Mani from the Frontend team. A special shout-out to the exceptional work of the QA team, comprising Anam Yasmeen and Farjad Ali Khan. Lastly, we thank the server team, comprising Veer Ram, Filipe Martinho and myself: Gopal Gupta.

Come Join Us!

We’re a team of diverse individuals with a common goal: to build innovative solutions that impact people’s lives. Explore our open positions and be part of something great.
