Published in OkCredit

How OkCredit Android App boosted Network Performance by 30%

Fast and reliable network communication is crucial for OkCredit mobile apps. Since the majority of our users are in Tier-2/3 cities in India, flaky and unreliable network connectivity is one of our biggest challenges. To deliver a good user experience, our mobile apps need reliable, low-latency network connectivity.

While most resources we found describe how to improve the overall (start-to-end) time of a network call, this article takes a deep dive into each step of a network call in Android apps and explains how each step can be instrumented and optimised.

Steps involved in a network call

Fig. Steps of a network call. Ref: https://square.github.io/okhttp/features/events

1. Call start

Invoked as soon as a call is enqueued or executed by a client. Ideally, this step should take almost no time, unless custom Interceptors added to the client do significant work.

2. DNS

The DNS resolver translates the host name into the corresponding IP address(es).

3. Connection Start

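Establishes a connection between the client and the server: a TCP connection, followed by a TLS handshake for HTTPS (reported separately by the secureConnectStart and secureConnectEnd callbacks).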

4. Connection End / Acquired

Invoked after a connection has been acquired for the call. After this, the request and response payloads are exchanged.

5. Request / Response / Headers

This is the step in which the actual communication of data happens. The HTTP method, size of request / response payload, server latency / response time, etc. can greatly impact the time taken by this step.

6. Connection released

Invoked after a connection has been released for the call.

7. Call end

Invoked immediately after a call has completely ended.

Pooled connection

OkHttp can pool connections. When a call reuses an existing pooled connection, it skips the DNS and connection steps mentioned above, which can greatly improve the application's network performance.

Fig. Steps of a Network call with pooled connection
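To see how often calls actually benefit from the pool, the EventListener callbacks described in the next section can be used to count pool hits. Below is a minimal sketch, assuming OkHttp 4.x: OkHttp invokes connectStart only when it has to establish a new connection, so a call that reaches requestHeadersStart without a preceding connectStart reused a pooled connection. PoolStats and PoolReuseListener are hypothetical names, not part of OkHttp.

```kotlin
import java.net.InetSocketAddress
import java.net.Proxy
import java.util.concurrent.atomic.AtomicInteger
import okhttp3.Call
import okhttp3.EventListener

// Hypothetical process-wide counters for connection pool hits and misses.
object PoolStats {
    val reused = AtomicInteger()
    val newConnections = AtomicInteger()
}

// One instance per call: remembers whether this call opened a new connection.
class PoolReuseListener : EventListener() {
    private var openedNewConnection = false
    private var counted = false

    override fun connectStart(call: Call, inetSocketAddress: InetSocketAddress, proxy: Proxy) {
        // Only invoked when a new connection has to be established.
        openedNewConnection = true
    }

    override fun requestHeadersStart(call: Call) {
        // Count only the first request of the call (redirects and retries are ignored).
        if (counted) return
        counted = true
        if (openedNewConnection) PoolStats.newConnections.incrementAndGet()
        else PoolStats.reused.incrementAndGet()
    }
}
```

The ratio of reused to new connections gives a rough pool hit rate, which is useful when tuning the pool settings discussed later.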

Instrumentation of each step

To determine which step(s) have scope for improvement, we can instrument each step to measure exactly how much time it takes.

We can use OkHttp's EventListener class to collect metrics for each step. EventListener is an abstract class for metrics events that can be extended to monitor the quantity, size and duration of HTTP calls. Its callback methods should be lightweight and execute fast, and any I/O (such as writing metrics to a file or sending them over the network) should be performed asynchronously.

By overriding its callback methods, we can efficiently measure each step of a network call. Some of the important methods to override are:

Fig. Some important callback methods to override for metrics

Note: a few more callbacks are available for more detailed metric analysis.
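As a minimal sketch (assuming OkHttp 4.x; the phase names and the report callback are hypothetical), the listener below pairs start and end callbacks to measure the DNS, connect, TLS, time-to-first-byte and response-body phases of a single call. The clock is injectable so that the logic can be checked without a network.

```kotlin
import java.io.IOException
import java.net.InetAddress
import java.net.InetSocketAddress
import java.net.Proxy
import okhttp3.Call
import okhttp3.EventListener
import okhttp3.Handshake
import okhttp3.Protocol

// Hypothetical sketch: measures the phases of a single call.
// Create one instance per call so that the maps are never shared.
class PhaseTimingListener(
    // Receives phase durations (in ms) when the call ends or fails.
    private val report: (Map<String, Long>) -> Unit,
    // Monotonic clock in nanoseconds; injectable for testing.
    private val clock: () -> Long = System::nanoTime
) : EventListener() {
    private val startedAt = HashMap<String, Long>()
    private val durationsMs = LinkedHashMap<String, Long>()

    private fun start(phase: String) {
        startedAt[phase] = clock()
    }

    private fun end(phase: String) {
        // Ignore an end event whose start was never recorded.
        val start = startedAt.remove(phase) ?: return
        durationsMs[phase] = (clock() - start) / 1_000_000
    }

    private fun finish() {
        end("call")
        report(durationsMs.toMap())
    }

    override fun callStart(call: Call) = start("call")

    override fun dnsStart(call: Call, domainName: String) = start("dns")

    override fun dnsEnd(call: Call, domainName: String, inetAddressList: List<InetAddress>) = end("dns")

    override fun connectStart(call: Call, inetSocketAddress: InetSocketAddress, proxy: Proxy) =
        start("connect")

    override fun secureConnectStart(call: Call) = start("tls")

    override fun secureConnectEnd(call: Call, handshake: Handshake?) = end("tls")

    override fun connectEnd(
        call: Call, inetSocketAddress: InetSocketAddress, proxy: Proxy, protocol: Protocol?
    ) = end("connect")

    // Time to first byte: request sent until response headers start arriving.
    override fun requestHeadersStart(call: Call) = start("ttfb")

    override fun responseHeadersStart(call: Call) = end("ttfb")

    override fun responseBodyStart(call: Call) = start("responseBody")

    override fun responseBodyEnd(call: Call, byteCount: Long) = end("responseBody")

    override fun callEnd(call: Call) = finish()

    override fun callFailed(call: Call, ioe: IOException) = finish()
}
```

Phases that did not happen (for example dns and connect on a pooled connection) are simply absent from the reported map.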

Instrumenting in Production

Many factors affect network performance in production, so metrics gathered in a controlled environment may not reflect what users actually experience. Rather than testing network performance only with local tools, it can be a good idea to enable instrumentation for a sampled set of production users.

This gives metrics across various types of environments, such as:

  • Network operators / Wi-Fi
  • Geographical regions
  • Device manufacturers
  • Application state (Background/Foreground)
  • Android OS version, etc.
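A minimal sketch of such sampling, assuming OkHttp 4.x: an EventListener.Factory that returns a real listener only for a fixed percentage of users, and EventListener.NONE (OkHttp's no-op listener) for everyone else. The class name, the user-ID source and the percentage are hypothetical; deriving the bucket from a stable ID keeps a given user consistently in or out of the sample.

```kotlin
import okhttp3.Call
import okhttp3.EventListener

// Hypothetical sketch: enables detailed instrumentation for a fixed share of users.
class SampledEventListenerFactory(
    userId: String,
    // 0..100: percentage of users whose calls are instrumented.
    samplePercent: Int,
    private val createListener: (Call) -> EventListener
) : EventListener.Factory {
    // floorMod keeps the bucket in 0..99 even for negative hash codes,
    // and the same user always lands in the same bucket.
    private val sampled = Math.floorMod(userId.hashCode(), 100) < samplePercent

    override fun create(call: Call): EventListener =
        if (sampled) createListener(call) else EventListener.NONE
}
```

It would be registered through OkHttpClient.Builder().eventListenerFactory(...), so unsampled users pay almost nothing for the instrumentation.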

Calculation of network performance metrics

The time taken by each event, e.g. connectStart -> connectEnd or requestBodyStart -> requestBodyEnd, can be calculated as the difference between its start and end callbacks. Every start, connect and acquire event eventually receives a matching end or release event, either successful or failed.

A simplified example is shown below:

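import java.io.IOException
import okhttp3.Call
import okhttp3.EventListener

// Create one instance per call (via eventListenerFactory) so that
// callStartNanos is never shared between concurrent calls.
class NetworkCallEventListener : EventListener() {
    // A Kotlin var property needs an initial value; it is set in callStart.
    private var callStartNanos: Long = 0L

    private fun printEvent(eventName: String) {
        // nanoTime is monotonic, unlike currentTimeMillis, so it is safe for durations.
        val now = System.nanoTime()
        if (eventName == "callStart") {
            callStartNanos = now
        }
        val elapsedMillis = (now - callStartNanos) / 1_000_000
        println("$elapsedMillis ms : $eventName")
    }

    override fun callStart(call: Call) {
        printEvent("callStart")
    }

    override fun callEnd(call: Call) {
        printEvent("callEnd")
    }

    // callEnd is not invoked when a call fails, so log failures too.
    override fun callFailed(call: Call, ioe: IOException) {
        printEvent("callFailed")
    }
}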

The following gist can be used to log network performance metrics:

https://gist.github.com/okshrey/fbe6349b888eba4c36e73745f9c5d2e0

Usage

Create a NetworkCallEventListener in eventListenerFactory inside OkHttpClient.Builder(), as shown below. The factory is invoked once per call, so each call gets its own listener:

OkHttpClient.Builder()
    .apply {
        // Invoked for every new call; returns a fresh listener each time.
        eventListenerFactory { NetworkCallEventListener(analyticsProvider) }
        callTimeout(...)
        ...
    }
    .build()

Areas of improvement we identified

- Use a single instance of OkHttp throughout the app

In a multi-module Android application, a common mistake is to create a separate OkHttpClient instance in each module. This hurts network performance, since each instance has its own connection pool and cannot reuse connections pooled by another instance.
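One way to follow this advice, sketched under the assumption of OkHttp 4.x (HttpClientProvider is a hypothetical name): keep one base client for the whole app, and let modules that need different settings derive their clients from it with newBuilder(). OkHttp's documentation recommends this, because derived clients share the base client's connection pool, dispatcher and configuration.

```kotlin
import java.util.concurrent.TimeUnit
import okhttp3.OkHttpClient

// Hypothetical app-wide holder for the single OkHttpClient.
object HttpClientProvider {
    val baseClient: OkHttpClient by lazy { OkHttpClient() }

    // A module that needs different settings derives a client with newBuilder().
    // The derived client shares the base client's connection pool and dispatcher.
    fun clientFor(readTimeoutSeconds: Long): OkHttpClient =
        baseClient.newBuilder()
            .readTimeout(readTimeoutSeconds, TimeUnit.SECONDS)
            .build()
}
```

In a dependency-injection setup, the same idea applies: provide the base client as a singleton and derive module-specific clients from it.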

- Fine-tune OkHttp Connection Pool

class ConnectionPool constructor(
    maxIdleConnections: Int,
    keepAliveDuration: Long,
    timeUnit: TimeUnit
)

Tuning keepAliveDuration and maxIdleConnections lets more calls reuse existing connections, which can substantially improve network performance. For optimal results, keepAliveDuration should ideally be the same on the client and the server. Tune the pool with caution: values of maxIdleConnections or keepAliveDuration that are too high keep idle connections open and waste resources.
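A minimal sketch of plugging a tuned pool into the client, assuming OkHttp 4.x. OkHttp's default pool keeps up to 5 idle connections for 5 minutes; the values below (10 connections, 2 minutes) and the buildClient helper are hypothetical placeholders, not OkCredit's production settings, and should be chosen from measurements and the server's keep-alive configuration.

```kotlin
import java.util.concurrent.TimeUnit
import okhttp3.ConnectionPool
import okhttp3.OkHttpClient

// Hypothetical helper: builds a client with an explicitly sized connection pool.
fun buildClient(
    maxIdleConnections: Int = 10,
    // Ideally aligned with the server's keep-alive timeout.
    keepAliveMinutes: Long = 2
): OkHttpClient {
    val pool = ConnectionPool(maxIdleConnections, keepAliveMinutes, TimeUnit.MINUTES)
    return OkHttpClient.Builder()
        .connectionPool(pool)
        .build()
}
```

Since the pool belongs to the client, this should be done once for the app-wide client rather than per module.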

Since the connection step happens on the client, it cannot be instrumented on the server. This makes instrumenting and optimising it on the client crucial.

- Avoid I/O operations in Interceptors

Apps may want to perform many operations in OkHttp interceptors. While lightweight operations are fine, any kind of I/O operation should be avoided, because interceptors run on the path of every request. A common practice that can lead to poor network performance is attaching authentication tokens that are retrieved through an I/O operation (such as a disk read) to network requests.
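A minimal sketch of attaching a token without I/O on the call path, assuming OkHttp 4.x. The token is kept in memory by TokenCache (a hypothetical class) and refreshed elsewhere, for example after login, so the interceptor only reads a field. The "Bearer" header format is an assumption; use whatever your backend expects.

```kotlin
import java.util.concurrent.atomic.AtomicReference
import okhttp3.Interceptor
import okhttp3.Request
import okhttp3.Response

// Hypothetical in-memory token holder, populated off the network path
// (e.g. at login or app start) so that interceptors never touch disk.
class TokenCache {
    private val token = AtomicReference<String?>(null)
    fun update(newToken: String?) = token.set(newToken)
    fun current(): String? = token.get()
}

// Attaches the cached token without performing any I/O on the call path.
class AuthInterceptor(private val tokens: TokenCache) : Interceptor {
    override fun intercept(chain: Interceptor.Chain): Response =
        chain.proceed(authorize(chain.request()))

    // Pure transformation of the outgoing request: no disk or network access.
    fun authorize(request: Request): Request {
        val token = tokens.current() ?: return request
        return request.newBuilder()
            .header("Authorization", "Bearer $token")
            .build()
    }
}
```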

- Avoid having multiple API hosts

With a microservice architecture, different microservices often have different API host names, so the app has to call APIs on different hosts for different use cases. Each additional host needs its own DNS lookup and its own connections, so using multiple hosts can make the DNS step take longer overall. To optimise this, it is recommended to have a single API host.

- Auto-retry call on API failure

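To recover from transient API failures, such as timeouts and server errors, a recommended trick is to add an Interceptor that automatically retries failed calls. Retries should be limited to idempotent requests, since retrying a non-idempotent request (e.g. a POST that creates a record) could repeat its side effects. This can be further improved by adding special handling for specific types of failures and appropriate delays between retries.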
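A minimal sketch of such an interceptor, assuming OkHttp 4.x. It retries only idempotent HTTP methods, on I/O errors and 5xx responses, with exponential backoff (300 ms, then 600 ms, by default). RetryInterceptor, its parameters and the delay values are hypothetical. It has to be registered as an application interceptor (addInterceptor), because OkHttp only allows application interceptors to call proceed() more than once.

```kotlin
import java.io.IOException
import okhttp3.Interceptor
import okhttp3.Response

// Hypothetical sketch: retries idempotent requests on I/O errors and 5xx
// responses, waiting baseDelayMillis * 2^attempt between attempts.
class RetryInterceptor(
    private val maxRetries: Int = 2,
    private val baseDelayMillis: Long = 300,
    // Injectable so that tests need not actually sleep.
    private val sleep: (Long) -> Unit = { Thread.sleep(it) }
) : Interceptor {

    override fun intercept(chain: Interceptor.Chain): Response {
        val request = chain.request()
        // Retrying a non-idempotent request (e.g. POST) could repeat its side effects.
        if (request.method !in IDEMPOTENT_METHODS) return chain.proceed(request)

        repeat(maxRetries) { attempt ->
            try {
                val response = chain.proceed(request)
                if (response.code < 500) return response
                // Release the failed response before sending the request again.
                response.close()
            } catch (e: IOException) {
                // Do not retry calls that were canceled by the caller.
                if (chain.call().isCanceled()) throw e
            }
            sleep(backoffMillis(attempt))
        }
        // Final attempt: its response or exception goes back to the caller unchanged.
        return chain.proceed(request)
    }

    // 300 ms, 600 ms, 1200 ms, ... with the default base delay.
    fun backoffMillis(attempt: Int): Long = baseDelayMillis shl attempt

    private companion object {
        val IDEMPOTENT_METHODS = setOf("GET", "HEAD", "PUT", "DELETE", "OPTIONS")
    }
}
```

Sleeping blocks the calling thread (an OkHttp dispatcher thread for enqueued calls), so retry counts and delays should stay small.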

- Use QUIC

QUIC is a UDP-based, multiplexed and secure transport protocol. Because it combines the transport and TLS handshakes, using QUIC can shorten the connection acquisition step. A few apps, such as Uber and YouTube, have been using QUIC and have seen improved network performance. Unfortunately, at the time of writing, OkHttp does not support QUIC. An alternative that can be used is Cronet.

- Adjusting DNS TTLs

DNS TTL (Time To Live) is a setting that tells a DNS resolver how long to cache the result of a query before requesting a fresh one. The record is cached in the recursive or local resolver for the duration of the TTL before the resolver asks again for new or updated details. Fine-tune this value to balance two goals: a longer TTL minimises the frequency of lookups, while a shorter TTL propagates updates more quickly.
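The TTL itself is configured on the DNS records. On the client, a related but different technique is to cache lookups inside the app through OkHttp's Dns interface, which makes the same trade-off explicit. The sketch below assumes OkHttp 4.x; CachingDns and its fixed TTL are hypothetical, and unlike the system resolver it ignores the TTLs of the records themselves.

```kotlin
import java.net.InetAddress
import java.util.concurrent.ConcurrentHashMap
import okhttp3.Dns

// Hypothetical sketch: caches lookups from a delegate resolver for a fixed TTL.
// A longer TTL means fewer lookups; a shorter one picks up DNS changes sooner.
class CachingDns(
    private val ttlMillis: Long,
    private val delegate: Dns = Dns.SYSTEM,
    private val clock: () -> Long = System::currentTimeMillis
) : Dns {
    private class Entry(val addresses: List<InetAddress>, val expiresAt: Long)

    private val cache = ConcurrentHashMap<String, Entry>()

    override fun lookup(hostname: String): List<InetAddress> {
        val now = clock()
        // Serve from the cache while the entry has not expired.
        cache[hostname]?.let { if (now < it.expiresAt) return it.addresses }
        // Cache miss or expired entry: resolve again and store the fresh result.
        val addresses = delegate.lookup(hostname)
        cache[hostname] = Entry(addresses, now + ttlMillis)
        return addresses
    }
}
```

It would be installed with OkHttpClient.Builder().dns(CachingDns(ttlMillis = 60_000)), where 60 seconds is only an example value.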

Other potential action items:

Overall Impact

Fig. Before and after comparison of the average time taken by network calls

After implementing some of the action items mentioned before, we were able to see an overall improvement of about 30% in the average time taken by network calls, with a major improvement of 70% in the Connection Start to Connection Acquired step.

Most engineers consider network performance something that can only be optimised on the backend. However, there are plenty of optimisations that can be made on the client, and they can critically impact network performance.

Credits to the entire Android team at OkCredit for making the OkCredit app better every day: Nishant Shah, Rashanjyot Singh, Mohitesh, Harshitfit, Pratham Arora, Manas Yadav, Saket Dandawate, Anjal Saneen.
