Using in-house Pinot to generate job rankings at scale on apna

Nirmaljeet Singh
apna-technology-blog
Jul 26, 2022


At apna, we empower the rising workforce of India by connecting them to life-changing opportunities through our job marketplace and community. Customer obsession is one of our core values, and that drives everyone at apna to work hard to deliver an experience that delights our candidates and employers on the app. In this blog, we will share how we customize ranking of jobs for millions of users.

The Problem Statement

Candidates come to apna looking for job opportunities through which they can earn their livelihood. To surface relevant jobs, we built a data science model that suggests the jobs a user is most likely to want. The hard part is ranking those jobs, since many jobs may be suitable for a given user. Identifying the jobs a user has already seen several times without engaging, and is therefore probably not interested in, is a critical input to our feed design.

The Problem of Scale

With the exponential growth of apna’s active users (>22M) and active jobs (>200K) on the platform, discovering the right opportunities becomes a challenge. Serving a personalized, relevant job feed at that scale becomes a necessity.

What is Pinot?

Apache Pinot is a real-time distributed OLAP datastore, built to deliver scalable real-time analytics with low latency. It can ingest directly from streaming data sources — such as Apache Kafka and Amazon Kinesis — and make the events available for querying instantly. It can also ingest from batch data sources such as Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage.
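As a rough illustration of how Kafka ingestion is wired up, a Pinot realtime table is defined through a JSON table config submitted to the controller. The sketch below expresses such a config as a Python dict; the table name, topic name, and broker address are hypothetical, not our actual configuration:

```python
# Sketch of a Pinot REALTIME table config that consumes from Kafka.
# All names (job_impressions, clickstream-impressions, kafka:9092)
# are illustrative placeholders, not our production values.
import json

table_config = {
    "tableName": "job_impressions",       # hypothetical table name
    "tableType": "REALTIME",
    "segmentsConfig": {
        "timeColumnName": "event_time",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "15",       # e.g. 15-day retention
        "replication": "3",
    },
    "tableIndexConfig": {
        "loadMode": "MMAP",
        "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "clickstream-impressions",
            "stream.kafka.broker.list": "kafka:9092",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.decoder.class.name":
                "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        },
    },
    "tenants": {},
    "metadata": {},
}

# In practice this JSON is POSTed to the controller's REST API,
# after which realtime servers start consuming from the topic.
print(json.dumps(table_config, indent=2))
```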

At the heart of the system is a columnar store with several smart indexing and pre-aggregation techniques for low latency. This makes Pinot an ideal fit for user-facing real-time analytics.

Pinot Architecture

Picture Reference — Pinot Official documentation

Our Approach:

The clickstream data from users on our app flows through Kafka. Our Pinot cluster hosts a realtime table that consumes from Kafka topics, ingesting millions of impressions each second. Whenever a user opens the app or refreshes the feed, our job feed service makes a gRPC call to pinot-client. pinot-client forwards the query, along with the job IDs relevant to that user, to the Pinot broker. The broker runs the query on the Pinot servers and returns the response to the job feed service, which builds a fresh feed for the user. We run Pinot and pinot-client on self-hosted Kubernetes clusters on GCP.
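Our actual pinot-client is a gRPC micro-service, but the kind of query it forwards to the broker can be sketched in a few lines. The table and column names below (job_impressions, user_id, job_id) are hypothetical; the idea is to count how often a user has already seen each candidate job so that repeatedly ignored jobs can be demoted in the feed:

```python
# Sketch of the impression-count query behind the feed ranking.
# Table/column names are illustrative, not our real schema.

def build_impression_query(user_id: str, job_ids: list[str]) -> str:
    """Count how many times this user has already seen each candidate
    job, so the feed can demote jobs viewed several times without a click.
    NOTE: real code should use parameter binding, not string interpolation."""
    ids = ", ".join(f"'{j}'" for j in job_ids)
    return (
        "SELECT job_id, COUNT(*) AS impressions "
        "FROM job_impressions "
        f"WHERE user_id = '{user_id}' AND job_id IN ({ids}) "
        "GROUP BY job_id"
    )

query = build_impression_query("u123", ["j1", "j2", "j3"])

# Against a live broker this could be executed with the open-source
# pinotdb DB-API client, roughly:
#   from pinotdb import connect
#   conn = connect(host="pinot-broker", port=8099, path="/query/sql", scheme="http")
#   cur = conn.cursor()
#   cur.execute(query)
#   impressions = dict(cur.fetchall())
print(query)
```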

Custom values of components for our use case:

  1. Server — Servers store data in the form of segments, which are loaded into memory and queried against. Servers are memory-intensive, and latency spikes can happen if they are not given enough resources. We currently run 3 servers with cpu (min — 2, max — 4) and memory (min — 16 GB, max — 32 GB). The heap is kept at 16 GB, which should ideally be about 50% of the server’s memory.
  2. Broker — Brokers handle Pinot queries. They accept queries from clients and forward them to the right servers, then collect the results back from the servers and consolidate them into a single response to send back to the client. Our current broker count is 3, with cpu (min — 1, max — 4), memory (min — 2 GB, max — 4 GB), and heap memory of min — 1 GB and max — 2 GB.
  3. Controller — Controllers coordinate ingestion from the data source (Kafka in our case) and manage the cluster as new events keep arriving from the Kafka topic. Only one controller acts as leader at a time, but we run an odd number so that if one pod fails the others can elect a new leader and consumption continues. The current controller count is 3, with cpu (min — 1, max — 2), memory (min — 2 GB, max — 4 GB), and heap memory of min — 256 MB and max — 1 GB.
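The sizing described above can be summarised in the shape of a Helm-style values structure. This is shown as a Python dict purely for illustration; our actual values.yaml differs in detail, and the jvmOpts strings are an assumption about how the heap limits translate to JVM flags:

```python
# Sketch of the per-component resource settings described above,
# in the shape of a Pinot Helm chart's values (illustrative only).

pinot_resources = {
    "server": {
        "replicaCount": 3,
        "resources": {
            "requests": {"cpu": "2", "memory": "16Gi"},
            "limits":   {"cpu": "4", "memory": "32Gi"},
        },
        # Heap pinned at ~50% of the server's memory.
        "jvmOpts": "-Xms16g -Xmx16g",
    },
    "broker": {
        "replicaCount": 3,
        "resources": {
            "requests": {"cpu": "1", "memory": "2Gi"},
            "limits":   {"cpu": "4", "memory": "4Gi"},
        },
        "jvmOpts": "-Xms1g -Xmx2g",
    },
    "controller": {
        "replicaCount": 3,  # odd count so a leader can always be elected
        "resources": {
            "requests": {"cpu": "1", "memory": "2Gi"},
            "limits":   {"cpu": "2", "memory": "4Gi"},
        },
        "jvmOpts": "-Xms256m -Xmx1g",
    },
}
```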

Challenges Faced

  1. High latencies — In the early stages of our Pinot setup we observed high latencies on simple queries: close to 25 ms for basic read queries that ideally should not cross 4–5 ms.
    Resolution — We identified a memory crunch on the servers, which made queries slow to respond. We scaled vertically by increasing the pods’ memory and observed ~4 ms latency after that.
  2. Latency pattern — We observed latency oscillating in a defined pattern between 10 ms and 25 ms in production, whereas it should stay roughly flat.
    Resolution — We figured out that latency moved in this range because the server could not hold the complete data in heap memory. We increased the overall memory and set the heap to 16 GB, 50% of the memory given to the server.

Initial behaviour

Current behaviour

Number of segments and data retention — We were not sure how many segments we should create; it had to fit the volume of data we wanted to store. We started with approximately 530 million records over a 30-day period. Right now, we retain 15 days, about 320 million records.
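A quick back-of-envelope check on those retention numbers (illustrative arithmetic only) shows that halving retention roughly halved the retained data even though daily ingest volume grew:

```python
# Rough arithmetic behind the retention change described above.
records_30d = 530_000_000   # initial: ~530M records over 30 days
records_15d = 320_000_000   # current: ~320M records over 15 days

per_day_initial = records_30d / 30   # ~17.7M records/day
per_day_current = records_15d / 15   # ~21.3M records/day

# Retained data shrank from ~530M to ~320M records, while daily
# ingest grew from ~17.7M to ~21.3M records per day.
print(f"{per_day_initial:,.0f} -> {per_day_current:,.0f} records/day")
```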

How is Monitoring setup on Pinot?

We actively monitor two components: pinot-client (a micro-service) and the Pinot components, which we have set up via Helm.

  1. pinot-client — It runs as a micro-service on our staging and production clusters and is mainly responsible for connecting the job feed service with the Pinot components.

  2. Pinot components — The Pinot components are monitored via the Prometheus and Grafana stack, both deployed via Helm.

Applications and Impact:

Here are the various applications where we use Pinot to generate a fresh job feed:

  1. Job feed: By using Pinot to show relevant jobs to users, we have seen a significant lift in the conversion rate on the platform. Users now find the jobs they want faster, while employers find relevant candidates for their openings.

  2. Push notifications: We send millions of push notifications to our users daily across various campaigns. Pinot helps us identify the right jobs for each user and send them relevant notifications.

The way forward:

We plan to do the following in our next steps:

  1. Apply to other use cases: We will evaluate other use cases where we can leverage Pinot to solve real-time analytics problems and extract meaningful results from large datasets.
  2. Improve latency and optimise cost: Going ahead, we plan to optimize our queries and further improve latencies. We also plan to invest in our infrastructure to reduce cloud costs further.

Acknowledgements:

First of all, thank you for giving us your valuable time and reading this far. We hope it was helpful.

Next, we would like to acknowledge the people behind this work:

DevOps and SRE : Nirmaljeet Singh
Data: Vinod Adwani
Engineering: Sunil Chaurasia, Harish Srinivas, Ravi Singh, Vaishakh N R, Yatin Gupta
Leaders: Suresh Khemka, Ronak Shah, Puneet Kala, Shantanu Preetam

Let me know if you have any questions or suggestions,

Until Next time,
Nirmaljeet Singh
Email: nirmaljeet277@gmail.com
