A Queuerious Case of High Receive Queue and Groovy

Gaurav Kumar
Published in Airtel Digital
3 min read · Mar 3, 2023

At Airtel, we build technology that helps 300+ million consumers connect with the world and manage their services with ease. A large part of the service-management space is handled by the self-serve Airtel-Thanks app channel, which ships on both Android and iOS.

The vast backend ecosystem needed to power the apps, combined with the ambitious velocity at which we ship new and improved experiences to users, is a challenge that most large organizations face. We solve this with the BFF (backend-for-frontend) pattern, using a custom implementation called Guardian.

Guardian was introduced into the ecosystem around 3 years ago to increase the velocity at which we can ship new app experiences and to decouple the presentation layer from the backends at Airtel. Shipping a new app is a time- and resource-intensive process from both the developer’s and the consumer’s perspective. BEFE (hereafter referred to as Guardian) helps us keep the presentation logic configurable in the backend, and it ensures that the core microservices do not need to adapt their responses to cater to the consumer apps.

How it works:

BEFE is a server-driven UI architecture and works as shown below:

HLD:

Scale: 1 billion API calls a day, with a peak of 8,000 TPS.

The recent challenge: With the rising adoption of Groovy, we started to see an increase in response time in this layer. First-level analysis showed that the increase was limited to certain nodes rather than all nodes, and second-level analysis showed that certain threads were in deadlock.
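One way to check for this kind of condition from inside a running JVM is the standard `ThreadMXBean` API. The sketch below is an assumption about how such a check could look, not the tooling used in our investigation (which relied on thread-dump analysis); the class name `DeadlockProbe` is hypothetical. Note that `findDeadlockedThreads()` reports cycles over monitors and ownable synchronizers, so it may not catch every kind of class-loading stall.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarizes threads the JVM currently reports as deadlocked.
class DeadlockProbe {

    /** Returns one line per deadlocked thread, or an empty list if none are found. */
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.add(info.getThreadName()
                    + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }
}
```

A check like this could be run periodically or exposed on a health endpoint so that an affected node is flagged before its receive queue fills up.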

What caused the problem:

Groovy has its own runtime, which enables metaprogramming and many other features that dynamic languages provide. Because of this, executing certain lines of code triggers a ClassLoader call to create a new class. Occasionally an edge case is hit in which the ClassLoader deadlocks while multiple threads are loading classes that may depend on each other. When this happens, the Groovy runtime halts, causing high response times and full receive queues. A restart puts the system back in order. The behaviour, though rare, was enough to break our 99.99% guarantees and our response-time latency standards.

Short-Term Fix: Run all Groovy scripts in a synchronized fashion. Groovy scripts are ideally small pieces of code with very fast turnaround times, so we moved their execution into synchronized blocks. We did see a perceivable spike in the 99th-percentile response times, and the deadlock condition was avoided. However, this was not the best way to solve the problem, and the engineer’s itch remained: “How do we solve it once and for all?”
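As a rough illustration of the short-term fix, the sketch below serializes every script invocation behind a single lock. It is a minimal sketch under stated assumptions, not Guardian's actual code: the class name `SerializedScriptRunner` is hypothetical, and a plain `Callable` stands in for whatever call evaluates a Groovy script.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper: at most one script runs at any moment.
class SerializedScriptRunner {

    // Private lock object so no outside code can synchronize on it by accident.
    private final Object lock = new Object();

    /** Runs the given script while holding the lock and returns its result. */
    <T> T run(Callable<T> script) throws Exception {
        synchronized (lock) {
            // Only one thread can be inside a script here, so two threads
            // can never trigger class loading from scripts concurrently.
            return script.call();
        }
    }
}
```

The trade-off is visible in the design: every caller queues behind the same lock, so throughput is bounded by the execution time of one script at a time, which is why this only works while scripts stay small and fast.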

Proper Fix: In comes @CompileStatic. Groovy code always compiles to JVM bytecode, but by default its method calls are dispatched dynamically through the Groovy runtime. If you use @CompileStatic on your Groovy class, the Groovy compiler resolves those calls at compile time and generates bytecode that executes much as the equivalent Java would. You lose the dynamic programming capabilities (which we were not using), and you gain speed and resistance to the Groovy runtime deadlock.

Special Thanks: The Pagespace team at Airtel, including Kamal Sharma, Nitin Gupta, Ambika Chaudhary and Himanshu Luthra, for their continuous support and camaraderie.

References:

https://fastthread.io/

https://docs.groovy-lang.org/next/html/gapi/groovy/transform/CompileStatic.html
