Working with a Physical VoIP Infrastructure as a Software Engineer

Edwin Tunggawan
Cermati Group Tech Blog
Nov 12, 2020

In this article, I’d like to share some things I’ve learned from working on Cermati’s call center system, from researching and choosing the system components to hacking together a few tools, all to help bring our call center capability to where it is now.

Background

Cermati uses two different IT infrastructure setups to support our company operations: the software infrastructure and the VoIP infrastructure.

An illustration of a data center, definitely not the one we’re managing.

Software Infrastructure

Our software infrastructure is cloud-based. It’s running on top of cloud-based VM engines and SaaS provided by various cloud platform providers. At the time this article’s written, the software infrastructure is mostly based on Alibaba Cloud due to Alibaba being the first major cloud provider to open their data center in Indonesia — and as a financial technology company, we’re obliged to keep the customer financial data within Indonesia’s country borders.

The software infrastructure is managed by engineers from our infrastructure engineering team, who oversee resource usage, cost optimization, and system security while also developing tools to streamline our software development process by leveraging the capabilities provided by our cloud infrastructure.

VoIP Infrastructure

Another infrastructure we’re maintaining is our VoIP infrastructure, which is using bare metal servers kept in the server room at our office sites. The servers provide VoIP services to the employees located at the same office as the servers. The VoIP infrastructure was conceived separately from the software infrastructure system, as at the beginning our software engineers were focusing on improving the software products.

The VoIP infrastructure was initially set up by a hired VoIP network consultant and then left to our office IT support team to keep the VoIP services running on their own. But as our call center grew in size and the need to improve its operations and performance arose, we came to expect more from the stability of our call center infrastructure and the availability of the data from our VoIP servers.

Getting Introduced to VoIP System

By a twist of fate, even though my background has nothing to do with VoIP, I’m now helping with both the software and VoIP infrastructure. There’s a bit of a story to that.

How exactly I got the job can be traced back to when I was interviewed for a software engineer position at Cermati in April 2017 by our CTO, Oby Sumampouw.

I think the conversation went somewhat like this.

CTO: So what are you expecting for your next job?
Me: I kinda want to try working on a lower-level layer regarding IT infrastructure since I don’t feel like I know much about it. I’m currently bored with working on web-based product development.
CTO: Oh, actually we have a call center VoIP system here that’s set up on physical infrastructure. We need to integrate it into our back-office system but we don’t have anybody working on that. What do you think?
Me: That’s something I know nothing about. Sounds interesting, what needs to be done?

Being the YOLO man that I was, I simply accepted the project for the thrill of getting into something I didn’t know much about.

So I got the job and the task of integrating the VoIP system with the back office. The part of the VoIP system I initially owned was only the code and configuration for WebRTC, but as the company grew I needed to help more with the system and work more closely with the call center IT ops.

Eventually I got promoted to lead and then to manager of the infrastructure platform team in our software engineering department. I think I’m the only one on our team who’s interested in working on the VoIP system, so it’s mostly stuff that I did by myself in collaboration with the call center IT ops team over the last few years. We also have a dedicated IT manager and a VoIP specialist to hold the fort on the call center IT ops now, so I don’t work on it as often nowadays.

In this article, I’d like to share some things I’ve learned from working on Cermati’s call center system.

PBX Machine Set-Up

We’re running the Elastix PBX suite on our call center PBX machines, with Asterisk as the VoIP server powering it. We chose Elastix because it was free and good enough for our needs. We actually compared different solutions from Genesys, Avaya, and other custom-built software based on Asterisk. But Elastix seemed to be the most commonly used, so we decided to go with it. Nowadays we’re trying out Issabel.

A screenshot of an older version of Elastix PBX’s admin dashboard (image from DistroWatch).

Before I joined, I was given access to online learning materials regarding the Asterisk VoIP server. The learning materials were video-based, and I was more of a book person so I got myself a copy of the book Asterisk: The Definitive Guide.

I expected to configure the Asterisk dial plan directly, but it turns out Elastix already does that. We only need to configure Elastix from the web UI it provides for PBX machine configuration and administration, and Elastix then compiles the settings we enter in the web UI into an Asterisk dial plan.

Sometimes the compilation result doesn’t match what we expect. In that case, we might need to modify Elastix a bit. But I prefer not to customize Elastix’s code, because it makes setting up a new machine more complex for the call center IT ops: the extra modifications have to be applied when the machine is first set up, otherwise it won’t work as we expect.

In general, our PBX machine life cycle can be summed up into these four points:

  • The call center IT ops team sets up Elastix PBX on a machine.
  • The Elastix PBX machine is configured with the SIP trunks for incoming and outgoing calls.
  • The PBX machine is configured with the extensions of the call center staff who need to make their calls through that server.
  • If the server fails, the call center IT ops team will try to troubleshoot the issue. If it’s not fixed, they’re going to reset the machine by setting up Elastix PBX on it again.

The SIP trunk and extension configurations are backed up regularly, so if a machine fails the call center IT ops team can simply wipe it clean and restore the backed-up configurations to the machine to have it running again.

Having a backup machine is recommended since hardware failure on one physical server might halt a good portion of our call center operation which translates to potential loss from halted telesales and KYC — know your customer — verification activities.

Having an extra machine that’s ready to be used in case of failure should help mitigate that, as it allows for faster recovery in the case where the primary machine takes some time to be fixed. Also, our PBX machines are nothing too fancy so we can spare some budget to keep an extra machine around.

SIP Trunk

A SIP trunk is an interface between a TCP/IP-based VoIP system and the PSTN, the conventional phone network. When I first joined, we had these OpenVox boxes around in our server room.

An OpenVox device (image from IP Phone Warehouse).

The OpenVox boxes are assigned IP addresses in our system and are registered to the Elastix PBX machines as SIP trunks. These boxes host several GSM SIM cards, which are used to make outbound calls from our Elastix PBX machines.

We can set up SIM cards from various GSM providers in the OpenVox machines and configure the Asterisk dial plan to use certain SIM cards to perform the call based on the destination number’s phone service provider. With this, we can optimize the cost of the phone calls by creating a dial plan rule to ensure each call destination will be routed via the SIM card with the cheapest rate for the respective destination number.
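The routing rule described above can be sketched in Python. This is only an illustration of the idea, not our actual Asterisk dial plan: the prefixes, operator names, and per-minute rates below are all made up, and the function names are hypothetical.

```python
from typing import List, Optional

# Hypothetical number prefixes mapped to made-up operator names.
PREFIX_TO_OPERATOR = {
    "0811": "operator_a",
    "0812": "operator_a",
    "0817": "operator_b",
    "0895": "operator_c",
}

# Hypothetical per-minute rates: RATES[sim_operator][destination_operator].
RATES = {
    "operator_a": {"operator_a": 100, "operator_b": 400, "operator_c": 450},
    "operator_b": {"operator_a": 420, "operator_b": 90, "operator_c": 430},
    "operator_c": {"operator_a": 410, "operator_b": 380, "operator_c": 80},
}


def destination_operator(number: str) -> Optional[str]:
    """Return the operator for a number, using its longest known prefix."""
    for length in range(len(number), 0, -1):
        operator = PREFIX_TO_OPERATOR.get(number[:length])
        if operator is not None:
            return operator
    return None


def cheapest_sim(number: str, available_sims: List[str]) -> Optional[str]:
    """Pick the SIM operator with the lowest rate to the destination number."""
    dest = destination_operator(number)
    if dest is None or not available_sims:
        return None  # Unknown destination or no SIM to route through.
    return min(available_sims, key=lambda sim: RATES[sim][dest])
```

For example, with these made-up rates, a call to a number starting with 0817 would go out through the operator_c SIM if only operator_a and operator_c SIMs are installed, since its rate to operator_b is the lower of the two.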

Managing these devices on our own was a bit cumbersome since we were using prepaid GSM plans, which are the common subscription plans for GSM in Indonesia, and we needed to keep track of how much credit each card had left. Our IT support department created a bash script that periodically connects to each OpenVox box using SSH and runs a set of commands to check the remaining credit available for each SIM card. If a SIM card runs out of credit, calls made through that SIM card will fail.
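The checking step of such a script might look roughly like the Python sketch below. The original was a bash script and the commands it ran on the OpenVox boxes aren’t covered here, so this sketch skips the SSH part entirely and only shows how balance replies could be parsed and compared against a threshold. The reply format ("Rp 12.500"), the SIM labels, and the function names are all assumptions for illustration.

```python
import re
from typing import Dict, List, Optional

# Assumed reply format: an amount such as "Rp 12.500", with dots as
# thousands separators. Real replies may look different.
BALANCE_PATTERN = re.compile(r"Rp\s*([\d.]+)")


def parse_balance(reply: str) -> Optional[int]:
    """Extract the balance from a reply, or None if it can't be read."""
    match = BALANCE_PATTERN.search(reply)
    if match is None:
        return None
    digits = match.group(1).replace(".", "")
    return int(digits) if digits else None


def sims_needing_top_up(replies: Dict[str, str], threshold: int) -> List[str]:
    """Return labels of SIMs whose balance is below threshold or unreadable."""
    flagged = []
    for sim, reply in replies.items():
        balance = parse_balance(reply)
        # Treat an unreadable reply as a problem too, so it gets looked at.
        if balance is None or balance < threshold:
            flagged.append(sim)
    return sorted(flagged)
```

Flagging unreadable replies together with low balances is a deliberate choice in this sketch: a SIM whose balance can’t be checked is as likely to cause failed calls as one that’s nearly empty.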

Another issue was that we needed to consider our geographic location when choosing the GSM operator for the SIM cards we use. We should choose an operator with very good reception around the call center area, and we have to consider physical barriers that might block the GSM radio signals. Also, we need to consider the capacity of the cell towers around the office location, because having too many devices connecting to the same tower and making calls concurrently might degrade the quality of not only our calls but also those of the people around the call center who are connected to the same cell tower.

A cell tower up close (image from GSMtowers).

Nowadays, we’re using third-party SIP trunk providers so our call center IT ops can work on something else instead of maintaining the SIM cards’ credit balances. We also don’t need to worry about the technical details regarding the GSM signals and cell tower capacity anymore.

Power Supply

Power transmission line (image from Omicron).

At Cermati, our software services that are supposed to be running 24/7 are all deployed on the cloud. So we don’t need to worry about supplying power to the machines running them. But it’s something that we need to address on our VoIP infrastructure.

Even though the VoIP services aren’t required to be up 24/7, they’re expected to be up and reliable during the call center’s operational hours, along with the office network infrastructure. We need to provide a UPS (Uninterruptible Power Supply) for all of our network devices to ensure operations can continue uninterrupted whenever there’s a power outage and the building is switched over to a secondary power line.

We put small-capacity UPS boxes on our network switches since the network switches have relatively small energy consumption needs. The server room needs something bigger since the machines consume more power. Our call center IT ops team picked APC Smart-UPS for supplying power to the PBX machines, but the job wasn’t finished there.

The data contained in the PBX machines is synchronized to our data analytics pipeline for various business analytics purposes. While the UPS set up in the server room can provide enough power for the machines to stay up for ten to twenty minutes until the backup power line is connected, there are times when electrical power is unavailable in the building and the machines need to be shut down gracefully. Otherwise, the data in the PBX machines might get corrupted and mess up our data processing pipeline. Fixing that is pretty inconvenient, especially during periods when the power lines are unstable, since the data corruption can then happen pretty often.

APC provides software called PowerChute that can be configured to automatically shut down machines with the PowerChute agent installed once a configured period of time has passed after the UPS stops receiving power from the building. The UPS will periodically send UDP packets to the agents containing information about the UPS’ state.

We initially tried to explore how PowerChute works and how to configure it, but we hit a dead end when we noticed that the type of APC UPS boxes we have needs an extra hardware module called the APC Network Management Card (NMC) in order to work with PowerChute. We checked and found that the NMC module is quite expensive, and we might need one for each of several UPS machines.

Since we had a Raspberry Pi lying around unused from an old software engineering intern project that has been deprecated, we decided to set up the Raspberry Pi as a beacon instead. We connected the Raspberry Pi to the server room’s network, but we connected it to the building power line directly instead of to one of the UPS boxes. This way, the Raspberry Pi will be down if the building power is down and come back up when the power is up again.

A Raspberry Pi device (image from Amazon).

The PBX machines are set up with a script that periodically checks whether the Raspberry Pi beacon is up. If the beacon is down for more than several consecutive minutes, the PBX machines will shut down gracefully, as long as the UPS still has enough power to keep them running until they finish shutting down.
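A minimal Python sketch of such a watchdog is shown below. It isn’t the script we actually run: the use of the Linux `ping` and `shutdown` commands, the one-minute interval, the failure threshold, and all function names are assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable


def beacon_is_up(host: str) -> bool:
    """Send one ICMP echo request using the Linux `ping` command."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def shut_down() -> None:
    """Ask the operating system for a graceful shutdown (needs root)."""
    subprocess.run(["shutdown", "-h", "now"], check=False)


def watch_beacon(
    probe: Callable[[], bool],
    on_power_loss: Callable[[], None],
    max_failures: int = 5,
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call on_power_loss once the probe fails max_failures times in a row."""
    failures = 0
    while True:
        if probe():
            failures = 0  # The beacon answered, so building power is back.
        else:
            failures += 1
            if failures >= max_failures:
                on_power_loss()
                return
        sleep(interval_seconds)
```

On a PBX machine this could be started with something like `watch_beacon(lambda: beacon_is_up("192.0.2.10"), shut_down)`, where 192.0.2.10 is a placeholder address for the beacon. The threshold multiplied by the interval, plus the time the machine needs to shut down, has to fit within the ten to twenty minutes the UPS can provide. Resetting the counter on every successful probe keeps a single dropped ping from triggering a shutdown.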

The issue with this setup is that the Raspberry Pi beacon won’t have graceful shutdowns, and from what we read after checking resources on Raspberry Pi usage, it seems that Raspberry Pi devices also don’t handle power cut-offs well and may be unable to boot afterwards due to data corruption. We handled this problem by purchasing more Raspberry Pi devices as backups for the beacon, so we’re prepared if at some point the beacon is bricked. A Raspberry Pi costs much less than an APC NMC anyway, and we only need to set up one Raspberry Pi beacon at a time.

Conclusion

Working with the call center infrastructure is honestly a unique experience for me. I got the opportunity to learn more about how traditional telephony systems work, as some of the concepts of the analog telephony system are adopted into the VoIP system and the VoIP infrastructure is also interacting with the analog telephony infrastructure via the SIP trunk.

I also learned some things about physical infrastructure management and got to experiment with actual physical hardware setups, which I had never really gotten my hands on before. I still don’t feel like I know that much about hardware, but I’ve gotten a better sense of it after revisiting some basic physics materials in my free time since 2017, around the time just before I joined Cermati.
