Fano Labs
Published in

Fano Labs

How Call Analytics Pipeline works, Part 1: Kafka

Callinter — a call analytics system

Nowadays, AI technology has become ubiquitous in our daily life and made a significant contribution to multifarious industries. It has also stepped into the realm of call centres. Callinter is built for call centres. It is a comprehensive call analytics system targeted for quality assurance, compliance, and incident detection.

To handle thousands of hours of call each day, the performance and stability of the analytics pipeline play a crucial part in the success of the system. So, here we would start by introducing Kafka as one of the key technologies that we utilize and contribute to the stability of analytics pipeline performance.


Before we talk about Kafka, let’s have a brief overview of the Analytics Pipeline.

Analytics Pipeline is a series of processes consisting of some microservices such as the speech recognition, intent extraction, keyword detection, sentiment detection, business compliance check and scoring… etc. A master service will be in charge of controlling the flow of the pipeline. The pipeline serves as a foundation in our product, therefore making it efficient and stable is critical.

Here are some statistics about our analytics pipeline and the resources usage:

  • Handles up to 10,000 calls per day (Alternatively, 7 calls need to be processed per minute)
  • The average duration of each call is 5–10 minutes
  • The ratio of the call duration and the time required for speech recognition is about 1:2, meaning 1 min of call required 2 min of processing time (per CPU core)


At the early stages of development, the initial Analytics Pipeline’s design is as follows.

Figure 1: initial analytics pipeline

Each step of the process is synchronous. However, as the number of analytic requests increases, a critical problem arises. The issue is that memory consumption is proportional to the number of tasks running in each step. This is not a problem in general but in our case, the processing time takes long (minutes in contrast to milliseconds) in the speech recognition (SR) step that the SR workers’ memory usage could go up quickly. This is even more troublesome as the analytic workload varies throughout the day (e.g. usually peak at midnight), causing the SR worker to be overloaded.

One way of solving this problem is to scale the number of workers in the system at peak time. However, this is not feasible in a resource-constrained environment. Scaling the workers infinitely is simply not a choice in our case. Under the constraints, we opt to use a message broker to queue up the tasks. Therefore, the workers are allowed to pull the task from the message broker base on its workload, in another word, flattening out the load curve of the system (as shown in Figure 2).

Figure 2: the relationship between time and workload

What is Message Broker?

Message broker acts as a middleman, exchanging data between the services, where data could be transmitted through a standardized messaging protocol. It allows messages to be routed and delivered to the appropriate destination. To guarantee the data delivery, a message queue is often used to store and order the message until the target service could consume them. The message sender is usually referred to as a “Producer” where the message receiver is referred to as a “Consumer”.

In a system with a amessage broker, there is two way of distributing messages:

  1. Point to Point: Services utilize the broker to send a message in a one-to-one relationship. The message is sent by one producer and consumed only once.
  2. Publish / Subscribe: In this method of the message distribution, aka “Pub/Sub”, producer’s message gets published to a “topic” and all consumers subscribed to the topic will receive the message. (Figure 3) And this is the method we mainly used in our system.
Figure 3: message broker

Let’s have a look of the improved flow of the analytics pipeline.

Figure 4: improved analytics pipeline
  1. Publish a request for analytics to Queue 1.
  2. Pull messages from Queue 1 and verify whether the request of analytics is valid. Before doing speech recognition, we will check whether speech recognition is needed. If we did it before we could skip that step and use the previous result saved in the database directly in the next steps.
  3. If speech recognition is needed, the service will publish a request for speech recognition to Queue 2.
  4. Pull messages from Queue 2. When a message is received, the service will publish an acknowledgement for speech recognition to Queue 3.
  5. Pull messages from Queue 3 and update the status of analytics.
  6. When finishing speech recognition, the service will publish the result to Queue 4.
  7. Pull messages from Queue 4. When a message is received, the service is going to enrol voiceprint and do natural language processing, including intent, entity and sentiment detection.
  8. After finished NLP, the service will handle the business logic, such as word count, checking compliance and scoring, and summarise the result.
  9. Save the result in the database.

Next, we mainly introduce our message broker — Kafka

What is Kafka?

Kafka is a distributed message broker with high throughput, availability and scalability. There are 5 important concepts that you will need to learn about Kafka: 1) brokers, 2) topics, 3) partition, 4) producers, 5) consumers, 6) Zookeeper

Figure 5: Kafka structure
  • broker: the Kafka server node, which is responsible for creating topics and storing the message.
  • topic: the category of the message, which could be regarded as a queue logically. The same topic can be replicated in one or more brokers to achieve high availability
  • partition: in order to increase the throughput of Kafka linearly, a topic will be divided into one or more partitions (each topic has at least one partition.). Each partition physically corresponds to a folder under which all messages and index files of the partition are stored. The more partitions, the greater the throughput, but the more resources are needed.
  • offset: Each record in the partition will be assigned an ID number in a sequence, which is called offset. Offset is used to uniquely identify each record in the partition.
  • producer: responsible for publishing the message to a topic.
  • consumer: responsible for proactively pulling messages from one or more topics and consuming them. Normally, the messages are not deleted after consumption.
  • zookeeper: maintaining the state of the entire Kafka cluster and storing the information of each Kafka node.
Figure 6: Anatomy of a topic

Why chooses Kafka?

Nowadays, there are all kinds of messaging systems in the market with their own advantages. One of the popular message systems is RabbitMQ. Although the throughput of RabbitMQ is not as high as Kafka, it is still reliable and has good support for load balancing or data persistence. Moreover, it provides priority queuing functionality, which Kafka is lack of.

We have chosen Kafka over RabbitMQ for the following reasons. The advantages of using Kafka eventually outweighed that of the RabbitMQ:

  1. Excellent performance with very high throughput without dedicating many resources. This is important since we would like to allocate more resources to the analytics pipeline. (Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines))
  2. High availability, as Kafka is a distributed message system. Multiple copies of a message could be distributed across the Kafka cluster. As such, a partial server failure will not lead to data loss and server unavailability.
  3. Idempotent producer. Same messages will not be consumed more than once by the same consumer.
  4. Support of streaming like KSQL, which is important to implementing our future online real-time analytics
  5. Support of events replay.
  6. A strong ecosystem that development can take the advantage of numerous libraries without the need for reinventing the wheel.
  7. Good third-party Kafka web management interface.

However, Kafka itself also does have some pain points.

  1. Steep learning curve since there are many concepts and components needed to be understood.
  2. Externally stored metadata in Zookeeper.
  3. Rebalancing of the consumer will block message consumption in other consumers.

Note that it is expected that item 2 and 3 will be greatly improved in the future Kafka release.

How Kafka integrated into the analytics pipeline

We have a Kafka cluster with 4 topics configured.

Figure 7: the workflow of Kafka in our pipeline
  1. Callinter Service produces a message to request an analytics.
  2. Callinter Service consumes the message for analytics request. If the request is valid, the service will produce a message to request for speech recognition.
  3. Speech Service consumes the request for speech recognition and produces a message about acknowledgement for speech recognition.
  4. Callinter Service consumes the acknowledgement for speech recognition, which will update the status of analytics.
  5. When Speech Service finishes speech recognition, its producer produces a message containing the result of speech recognition.
  6. Once Callinter Service consumes the result of speech recognition. It will proceed to perform natural language processing, handle business logic, and finally save the result in the database.

That’s how we utilize Kafka working in Callinter’s analytics pipeline. We will continue to improve it according to the business requirement changes and will further update it with technology evolvement. We will definitely share our research with you in time by then. Please stay tuned and keep an eye on us.

About Fano Labs

Fano Labs is an AI company headquartered in Hong Kong specialised in Speech Recognition and Natural Language Processing technologies. Focusing in a variety of languages, dialects and mixed languages, specially Cantonese and languages in Southeast Asia, our solutions help enterprises from various sectors with customer service, compliance and other lines of business.

If you are looking for business or job opportunities, please visit:




A deep-tech start-up focusing on AI for non-mainstream and mixed languages.

Recommended from Medium

5 Steps to Simplify the Cost Incurred By Your Enterprise with AWS Cost Savings

SOLID Principle in Object-Oriented Design

Basics about JaCoCo Maven Plugin

What is a Binary Search Tree?

How to Installing Composer on Mac?

SOLID Networking: The Single Responsibility Principle

Some fictional moments

How to implement github login in laravel * DevRohit Think simplified

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Nicholas YI

Nicholas YI

Hello World

More from Medium

ML Platform User Experience: Research to Production

CMU Heinz Capstone Project — Building Sustainable Mobility Analytics Tool


Data Enrichment for ML Model Deployments