Designing a VoIP application on GKE

Mahmoud Sharif
6 min read · Oct 13, 2018


A call center application hosted on Google Cloud Platform (GCP), running on Google Kubernetes Engine (GKE)

One of the largest manufacturers of VoIP devices hosts their solution on GCP. They contacted us for help designing a new version of their application to be hosted in the cloud.

The application is a unified communications and contact center platform, a customer and team engagement solution designed mainly for call centers.

It offers:

  • Self-service IVR
  • Secure IP telephony over VoIP
  • Web conferencing
  • Integrated voice, email, and chat

The solution is sold to customers by distributors, partners, and resellers. The application is developed and built by the manufacturer but hosted on GCP. The customers are call center organizations; the users are call center agents.

QA testing, development, the DevOps process, and VoIP integration are run by the customer on premises.

Background

Version 1 of the application was already hosted on GCP. The challenge was that it had performance limitations, including slow response times and memory leaks. During the architectural design of Version 1, the scope was not to reduce downtime or outages; it was to save on cost, release quickly, deliver a stable QoS, and provide a highly efficient UI/UX experience.

Version 1 was built on a 3-tier architectural pattern, grouping functions into common areas of interest. The solution was divided into a presentation layer, a business layer, and a data layer.

Another architectural mistake in Version 1 was that the design went too deep. It read like a blueprint: the architect had specified low-level function calls without taking into account the input of other stakeholders, such as the lead developer. The previous architect acted as a programmer, detailing all of the implementation complexity while neglecting the genuinely complex part of the design: the interactions between components, layers, and systems.

Version 1: 3-Tier Architecture pattern

The result was difficult to troubleshoot. Objects, functions, and interfaces were all tightly coupled:

  • Presentation layer: JavaScript, HTML, and other front-end scripts were coupled together, mixed with function calls embedded in the client over HTTP.
  • Business layer: business logic in Python, web frameworks, and workflow systems, all intertwined.
  • Data layer: SQL queries, function calls, and REST API calls were mixed together rather than separated.

Version 1 therefore failed to realize the benefits of a 3-tier architecture: speed of development, scalability, performance, and availability.

Version 1 was hosted on GCP, with Virtual Machine (VM) instances managed in an instance group. This posed a challenge: with a limited amount of hardware, VMs are only practical when you have a small number of processes to isolate. Because of the overhead of VMs, you often end up grouping multiple applications into each VM, since you don't have enough resources to dedicate a whole VM to each app.

Performance, scalability, and reliability were overlooked, and there was no effective monitoring system.

Action

We participated in weekly meetings with the product team and captured the scope of the new requirements, which centered on reliability, security, scalability, adaptability, performance, an efficient and customizable logging system, a strict SLA, and the introduction of a monitoring system.

The customer reported the challenges and other issues they faced during development. The project timeline was set at 8 months; the deadline was tight.

We worked with the customer's engineering team: functional analysts, product specialists, VoIP telephony specialists, and Google developer engineers.

We worked hand in hand with deployment specialists to release code into production, with SREs to resolve bugs, and with product developers to write new code.

We decided the best solution was to revamp the backend only. We preserved the presentation layer because the requirement was to deploy the new version seamlessly: users would not need retraining, and there would be no learning curve. Furthermore, the Version 1 requirement had been a stable, effective, high-performing UI/UX; the frontend was well tested and performing well.

When you need a large number of isolated processes on the same machine, containers are the better choice because of their low overhead. Each VM runs its own set of system services, while containers don't, because they all share the host OS. Another advantage is that, unlike a VM, a container has nothing to boot: a process running in a container starts immediately.

In the new architecture, we redesigned the business layer and the data layer together around a microservice design. The backend was hosted on GKE. The microservices run in containers, with each container assigned exactly one service; the containers are isolated from one another by Linux namespaces, and their share of the host's resources is constrained by cgroups.
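As a sketch, a per-service Deployment in this style might look like the following. The service name (call-routing), image path, health endpoint, and resource figures are illustrative assumptions, not the customer's actual configuration:

```yaml
# Hypothetical Deployment for a single microservice.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: call-routing
spec:
  replicas: 3                 # redundant copies; failed pods are recreated
  selector:
    matchLabels:
      app: call-routing
  template:
    metadata:
      labels:
        app: call-routing
    spec:
      containers:
      - name: call-routing    # exactly one service per container
        image: gcr.io/example-project/call-routing:v2   # placeholder image
        ports:
        - containerPort: 8080
        resources:
          requests:           # enforced on the node through cgroups
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        readinessProbe:       # assumed health endpoint
          httpGet:
            path: /healthz
            port: 8080
        livenessProbe:        # lets Kubernetes restart a hung container
          httpGet:
            path: /healthz
            port: 8080
```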

We decided to use Kubernetes as a deployment automation system to manage containers in a distributed system. It was chosen because it simplifies common tasks such as auto-scaling, auto-healing, upgrades, deployment, management, monitoring, and logging.
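Auto-scaling, for instance, is a short declaration. A HorizontalPodAutoscaler for the Deployment sketched above might look like this (the replica bounds and CPU target are illustrative):

```yaml
# Hypothetical autoscaler for the call-routing Deployment.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: call-routing
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: call-routing
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 60   # scale out past 60% average CPU
```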

With Kubernetes, you can orchestrate a Docker image of the application built from a Dockerfile. Compared to VMs, containers are much more lightweight, which lets the solution run more software components on the same hardware; a VM has to run its own set of system processes, which requires compute resources on top of those consumed by the component's own process.

We designed the logging and monitoring architecture to track performance using metrics such as CPU usage, memory usage, and HTTP response latency. This is a best practice for troubleshooting.

We also set up health checks to probe the health of each load balancer's backends. We configured HTTP Ingress rules (L7) and a TCP/UDP load balancer (L4). The Ingress is Kubernetes' standard way to load-balance HTTP: it handles traffic on ports 80 and 443, exposing the pods that serve HTTP requests to the Internet.
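A minimal Ingress of that era might look like the sketch below (the Service name is a placeholder). On GKE, the HTTP load balancer's health checks are derived from the serving pods' readiness probes, like the one in the Deployment sketch above:

```yaml
# Hypothetical Ingress exposing the HTTP-serving pods through a
# GCP HTTP(S) load balancer. extensions/v1beta1 was the Ingress
# API group in use in 2018.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-frontend
spec:
  backend:
    serviceName: web-frontend   # placeholder Service fronting the HTTP pods
    servicePort: 80
```

Port 443 is served once a TLS certificate is attached through the Ingress tls section.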

The TCP/UDP load balancer handles the ports not covered by HTTP, exposing the pods that serve TCP and UDP traffic for VoIP calls, email, and chat.
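The L4 load balancer is declared as a Kubernetes Service of type LoadBalancer. Below is a minimal sketch for SIP signalling over UDP; the names are placeholders, and since a single Service could not mix TCP and UDP at the time, each protocol gets its own Service:

```yaml
# Hypothetical Service provisioning a GCP network (L4) load balancer
# for SIP signalling. A sibling Service would cover the TCP ports.
apiVersion: v1
kind: Service
metadata:
  name: sip-udp
spec:
  type: LoadBalancer        # GKE provisions a regional network load balancer
  selector:
    app: sip-proxy          # placeholder label for the VoIP-serving pods
  ports:
  - name: sip
    protocol: UDP
    port: 5060              # standard SIP signalling port
    targetPort: 5060
```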

We migrated the backend from the GCE infrastructure to GKE, which now hosts the entire backend of the solution.

The customer managed the entire DevOps lifecycle process on their premises.

Version 2: New backend architecture based on a microservice design

Result

In Version 1, response time was a persistent problem: each request had high latency, which hurt the overall experience of the solution. In Version 2, response latency was reduced by 90%, largely because we implemented a multi-regional GKE cluster.

We also implemented the TCP/UDP load balancer, which routes traffic to backends in close proximity to the user and spreads the load across them.

We implemented health checks, incident alerts, and monitoring. Incidents are now resolved within a tighter SLA: resolution time has improved by 60%. Alerts are raised if CPU utilization crosses a 60% threshold, if traffic cannot reach the backend, or if response time reaches 7 seconds.
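As an illustration, the 60% CPU alert could be expressed as a Cloud Monitoring alert policy along these lines. This is a sketch against the Monitoring API's AlertPolicy schema; the metric filter and values are assumptions rather than the actual production policy:

```yaml
# Hypothetical Cloud Monitoring alert policy (AlertPolicy schema).
displayName: "Backend CPU above 60%"
combiner: OR
conditions:
- displayName: "GKE container CPU utilization"
  conditionThreshold:
    # Assumed metric; the real policy may use a different metric type.
    filter: >-
      metric.type="kubernetes.io/container/cpu/limit_utilization"
      resource.type="k8s_container"
    comparison: COMPARISON_GT
    thresholdValue: 0.6     # 60% of the container's CPU limit
    duration: 300s          # sustained for five minutes before alerting
```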

The system is now scalable: it scales on real-time metrics such as CPU load, memory consumption, and queries per second. We introduced redundancy so that if a node fails it is automatically recreated, and if a pod fails it is automatically replaced; if something does fail, a backup exists to restore all the data. We also introduced a failover system and reduced switchover time by 90%.
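Much of this redundancy is declarative. The Deployment's replica count keeps failed pods replaced automatically, and a PodDisruptionBudget, sketched below with illustrative values, prevents maintenance events such as node drains from taking down too many replicas at once:

```yaml
# Hypothetical disruption budget for the call-routing Deployment above.
apiVersion: policy/v1beta1  # the PodDisruptionBudget API version of the time
kind: PodDisruptionBudget
metadata:
  name: call-routing-pdb
spec:
  minAvailable: 2           # never take the service below two ready pods
  selector:
    matchLabels:
      app: call-routing
```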

Version 1 was hosted on the GCE (Google Compute Engine) infrastructure; in Version 2, the backend was migrated to GKE, which proved far better suited to handling VoIP traffic.
