Is the Amazon Connect Audio Stream Suitable for Real-Time Applications?

Measuring end-to-end delay in Amazon Connect

Amazon Connect is a self-service, cloud-based contact center service that makes it easy for any business to deliver good customer service at lower costs. There are no up-front payments or long-term commitments and no infrastructure to manage; you can scale your Amazon Connect contact center up or down seamlessly. Amazon Connect also enables you to leverage the power of the whole AWS ecosystem, which offers a broad set of cloud-based products.

One of the key metrics of a satisfying call experience is the end-to-end one-way delay (sometimes referred to as “audio latency”), which is the time elapsed between the customer saying something and the agent hearing it.

Being unable to interact smoothly with an agent, an interactive voice response (IVR) system, or an intelligent virtual agent (IVA) makes for a poor customer experience, and we strive to avoid it.

According to ITU-T Recommendation G.114, users' perception of call quality deteriorates once the one-way delay exceeds 200 milliseconds. Beyond 350 milliseconds, holding a conversation becomes difficult and the delay becomes very annoying. The target for most IP telephony systems is therefore to keep the one-way delay under 200 milliseconds.
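To make these thresholds concrete, here is a minimal Java sketch that maps a one-way delay to the three perception bands quoted above. The class and enum names are hypothetical, and the two cut-offs are simply the figures cited in this article, not a full model of G.114.

```java
/** Hypothetical helper: maps a one-way delay to the perception bands quoted in this article. */
final class DelayBudget {
    // Cut-offs as cited in this article from ITU-T G.114 (one-way delay, in ms).
    static final double DETERIORATES_ABOVE_MS = 200;
    static final double DIFFICULT_ABOVE_MS = 350;

    enum Perception { ACCEPTABLE, DEGRADED, DIFFICULT }

    static Perception classify(double oneWayDelayMs) {
        if (oneWayDelayMs <= DETERIORATES_ABOVE_MS) {
            return Perception.ACCEPTABLE;   // within the usual IP telephony target
        }
        if (oneWayDelayMs <= DIFFICULT_ABOVE_MS) {
            return Perception.DEGRADED;     // perceived quality starts to deteriorate
        }
        return Perception.DIFFICULT;        // conversation becomes difficult and annoying
    }
}
```

For example, `classify(150)` returns `ACCEPTABLE`, `classify(250)` returns `DEGRADED`, and `classify(400)` returns `DIFFICULT`.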

In December 2018, Amazon introduced a new Amazon Connect feature that lets an external application consume the caller's audio stream via Amazon Kinesis Video Streams. This feature opens up new opportunities, such as running analytics on what the caller is saying during the call. However, we wanted to know whether it is usable for real-time processing, such as feeding an alternative speech recognition engine or an external IVR. That is why we designed this small experiment.

Setup

Since Amazon Connect is a telephony platform, measuring anything on the audio stream requires making a phone call. Agents must answer calls using the Contact Control Panel (CCP), a web-based softphone. These constraints make measuring the delay slightly more challenging.

Amazon Connect uses Contact Flows to define each step of the experience customers have when they interact with your contact center. For our test, we created a simple Contact Flow that starts the audio stream, starts the call recording, and transfers the caller directly to an agent. The associated call recording behavior was set to “Record Agent and Customer”.

Contact Flow used for our tests

We acquired a new phone number and associated it with our Contact Flow.

To run our first test, we needed a host for the CCP. We set up a new Linux EC2 instance in the same AWS Region as our Amazon Connect instance to minimize the impact of network latency, and we were good to go.

Measuring the End-to-End One-Way Delay

Our first experiment measured the one-way delay, or at least a close approximation of it. Each step of the experiment is numbered on the diagram and described below.

First experiment: end-to-end one way delay
  1. Start the Amazon Connect Contact Control Panel (CCP) on the EC2 instance, log in as an agent, and configure the audio interface on the EC2 instance to redirect the output device to the input device (i.e., we essentially loop the speaker output back into the microphone).
  2. Call the phone number associated with the Contact Flow.
  3. Answer the call in the CCP on the EC2 instance.
  4. Play an audio file into the call from the phone, then hang up.
  5. Download the call recording from S3.

We get one stereo audio file, in which the left channel (L) contains what Connect received from the caller (the softphone) and the right channel (R) contains what Connect received from the agent (via the CCP).
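Any analysis of the recording starts by separating the two channels. The following is a minimal Java sketch using only the JDK's `javax.sound.sampled` package. It assumes the recording is a WAV file containing 16-bit signed little-endian PCM (an assumption we did not verify for every Connect recording) and rejects anything else; the class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper: splits a stereo call recording into left (caller) and right (agent). */
final class StereoSplit {

    /** Reads a WAV file, assuming 16-bit signed little-endian stereo PCM. */
    static double[][] readStereoWav(File file) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat fmt = in.getFormat();
            if (fmt.getChannels() != 2 || fmt.getSampleSizeInBits() != 16 || fmt.isBigEndian()
                    || !AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())) {
                throw new IllegalArgumentException("Unexpected format: " + fmt);
            }
            return split(in.readAllBytes());
        }
    }

    /** Deinterleaves L,R,L,R... 16-bit little-endian frames into two arrays scaled to [-1, 1). */
    static double[][] split(byte[] pcm) {
        int frames = pcm.length / 4; // 2 channels x 2 bytes per sample
        double[] left = new double[frames];
        double[] right = new double[frames];
        for (int f = 0; f < frames; f++) {
            int base = f * 4;
            left[f] = toSample(pcm[base], pcm[base + 1]);
            right[f] = toSample(pcm[base + 2], pcm[base + 3]);
        }
        return new double[][] {left, right};
    }

    /** Combines a low and a high byte into a signed 16-bit sample, normalized by 32768. */
    private static double toSample(byte lo, byte hi) {
        short s = (short) ((lo & 0xFF) | (hi << 8));
        return s / 32768.0;
    }
}
```

The result is `{left, right}`, so index 0 holds what Connect received from the caller and index 1 what it received from the agent.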

Expected waveform of the recording
Actual waveform of the recording (because of echo cancellation)

As you can imagine, Amazon Connect performs some echo cancellation (fortunately for us), which is why it is hard to measure anything by looking at the waveform. If we look at the spectrogram instead, which shows how the frequency content varies over time, we can recognize the audio pattern in both channels and measure the delay between them.

Spectrogram of the recording
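We read the delay off the spectrogram by eye. As an automated alternative (not what we did), one can compute a short-time energy envelope for each channel and search for the shift that maximizes their cross-correlation. The Java sketch below does exactly that; the class name, the frame size, and the lag search range are assumptions. Because echo cancellation attenuates the echo in the right channel, a band-limited (spectral) envelope may be needed on real recordings.

```java
import java.util.Arrays;

/** Hypothetical helper: estimates how much one channel lags another using energy envelopes. */
final class EnvelopeDelay {

    /** RMS energy of consecutive, non-overlapping frames of frameSize samples. */
    static double[] envelope(double[] samples, int frameSize) {
        double[] env = new double[samples.length / frameSize];
        for (int f = 0; f < env.length; f++) {
            double sum = 0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                sum += samples[i] * samples[i];
            }
            env[f] = Math.sqrt(sum / frameSize);
        }
        return env;
    }

    /** Subtracts the mean so the constant background level does not bias the correlation. */
    static void removeMean(double[] x) {
        double mean = Arrays.stream(x).average().orElse(0);
        for (int i = 0; i < x.length; i++) {
            x[i] -= mean;
        }
    }

    /**
     * Returns the lag in ms (a multiple of frameMs, between 0 and maxLagMs) of `delayed`
     * relative to `reference` that maximizes the cross-correlation of their envelopes.
     */
    static double estimateDelayMs(double[] reference, double[] delayed,
                                  int sampleRate, int frameMs, int maxLagMs) {
        int frameSize = sampleRate * frameMs / 1000;
        double[] a = envelope(reference, frameSize);
        double[] b = envelope(delayed, frameSize);
        removeMean(a);
        removeMean(b);
        int bestLag = 0;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = 0; lag <= maxLagMs / frameMs; lag++) {
            double sum = 0;
            for (int i = 0; i < a.length && i + lag < b.length; i++) {
                sum += a[i] * b[i + lag];
            }
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return (double) bestLag * frameMs;
    }
}
```

With the channels of the recording, a call such as `estimateDelayMs(left, right, 8000, 10, 1000)` returns the lag in 10 ms steps; the 8 kHz sample rate is an assumption about the recording, and `maxLagMs` must exceed the delay you expect to find.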

Over a few test runs, we observed an average delay of 170 ms.

As you may have noticed, we did not strictly measure the one-way delay in this experiment; we measured the round trip between Amazon Connect and the agent (CCP). However, the delay from Amazon Connect to the agent should be roughly equal to the delay from the agent back to Amazon Connect. Furthermore, if the customer is near the Amazon Connect instance (say, in the same region), the delay from the customer to Amazon Connect should also be roughly the same. The delay we measured therefore approximates the one-way delay in the ideal case where both the agent and the customer are in the same region as the Amazon Connect instance.
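The argument can be written out explicitly. Using notation introduced here for illustration, let d_{X→Y} be the delay from X to Y, with U the caller, C Amazon Connect, and A the agent's CCP, and ignore the local loopback on the EC2 instance:

```latex
\begin{aligned}
\Delta_1 &= d_{C\to A} + d_{A\to C} && \text{(measured: left vs.\ right channel)}\\
d_{U\to A} &= d_{U\to C} + d_{C\to A} && \text{(one-way delay we want)}\\
d_{U\to C} \approx d_{A\to C} \;&\Longrightarrow\; d_{U\to A} \approx \Delta_1 \approx 170\ \text{ms}
\end{aligned}
```

Strictly speaking, only the approximation d_{U→C} ≈ d_{A→C} is needed for the equivalence; the symmetry between the two Connect-agent legs matters only if one wants to split the measured round trip into per-leg figures.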

Left: the delay we actually measured during the experiment. Right: the delay it approximates.

Bear in mind that these results may vary when the caller or the agent is in a different location, and even over the course of the day.

Measuring the End-to-End Delay of the Kinesis Video Stream

The second experiment measured the delay introduced by Kinesis Video Streams.

To measure this delay, we wrote a small Java program (inspired by this one) that plays the audio chunks from the Kinesis video stream out loud.

Second experiment: end-to-end delay of the Kinesis Video Stream
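The playback side of such a program can be sketched with the JDK's `javax.sound.sampled` API. This is not our actual program. It assumes the raw PCM chunks have already been extracted from the stream's MKV fragments (for example with Amazon's Kinesis Video Streams parser library for Java, not shown) and that the audio is 8 kHz, 16-bit signed little-endian mono PCM. The class name and buffer size are hypothetical. Because the consumer's own playback buffer is part of the measured path, the sketch requests a small one.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

/** Hypothetical player for raw PCM chunks taken from a Kinesis video stream. */
final class PcmPlayer implements AutoCloseable {
    // Assumed format: 8 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat CONNECT_PCM = new AudioFormat(8000f, 16, 1, true, false);

    private final SourceDataLine line;

    PcmPlayer(int bufferMs) throws LineUnavailableException {
        line = AudioSystem.getSourceDataLine(CONNECT_PCM);
        // Request a small buffer: every buffered millisecond adds to the measured delay.
        line.open(CONNECT_PCM, bufferBytes(CONNECT_PCM, bufferMs));
        line.start();
    }

    /** Number of bytes holding `ms` milliseconds of audio in the given format. */
    static int bufferBytes(AudioFormat fmt, int ms) {
        int bytesPerMs = (int) (fmt.getFrameRate() * fmt.getFrameSize() / 1000);
        return bytesPerMs * ms;
    }

    /** Writes one chunk to the audio device; blocks until the line accepts it. */
    void play(byte[] chunk) {
        line.write(chunk, 0, chunk.length);
    }

    @Override
    public void close() {
        line.drain();
        line.close();
    }
}
```

In use, each decoded chunk is passed to `play(...)` as it arrives. At the assumed format, a 50 ms buffer corresponds to 800 bytes.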

Here is how this experiment took place.

  1. Start the CCP on the EC2 instance, log in as an agent, and configure the audio interface on the EC2 instance to redirect the output device to the input device (i.e., loop the speaker output back into the microphone). Install Java and our stream-consuming program on the instance.
  2. Call the phone number associated with the Contact Flow.
  3. Answer the call in the CCP on the EC2 instance.
  4. Retrieve the ID of the call's Kinesis video stream.
  5. Start the program to play the audio stream.
  6. Play an audio file into the call from the phone, then hang up.
  7. Download the call recording from S3.

Just as in the previous experiment, we did not directly measure the delay between the caller and the stream consumer. We measured the round trip from Amazon Connect to the Kinesis video stream consumer (i.e., our Java program) and back to Amazon Connect through the CCP. As established before, when the customer is near our Amazon Connect instance, the delay from the agent to Amazon Connect should be similar to the delay from the caller to Amazon Connect. Hence our measurement is roughly equivalent to the delay between the caller and the consumer of the stream.

Actual measurement and how it approximates the one-way delay

Running this experiment several times gave us an average delay of 760 ms.
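The same reasoning can be written out for this experiment. Using notation introduced here for illustration, let d_{X→Y} be the delay from X to Y, with U the caller, C Amazon Connect, A the agent's CCP, and K the stream consumer running on the same EC2 instance as the CCP (local loopback ignored). The last line is a back-of-the-envelope estimate derived from the two averages under the additional assumption that the two Connect-agent legs are symmetric; it is not a separate measurement.

```latex
\begin{aligned}
\Delta_2 &= d_{C\to K} + d_{A\to C} && \text{(measured: consumer playback re-enters through the CCP)}\\
d_{U\to K} &= d_{U\to C} + d_{C\to K} && \text{(caller-to-consumer delay)}\\
d_{U\to C} \approx d_{A\to C} \;&\Longrightarrow\; d_{U\to K} \approx \Delta_2 \approx 760\ \text{ms}\\
d_{A\to C} \approx d_{C\to A} \approx \tfrac{170}{2} = 85\ \text{ms} \;&\Longrightarrow\; d_{C\to K} \approx 760 - 85 = 675\ \text{ms}
\end{aligned}
```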


With a one-way end-to-end delay averaging 170 ms, Amazon Connect should provide a high-quality experience for your customers.
However, according to our results, the audio stream delay through Kinesis Video Streams, measured at 760 ms, is far above the 200 ms target, which currently makes it unsuitable for real-time applications. Kinesis Video Streams with Amazon Connect is nonetheless a very good solution for less delay-sensitive applications, such as post-call processing, analytics, transcription, and sentiment analysis.

Amazon Connect and Kinesis Video Streams are both relatively new products, and although we don't know their roadmaps, we can expect Amazon to keep improving them. Amazon Connect shows us what the future of the contact center could look like, and having such powerful tools at our disposal opens up a ton of new use cases. Time will tell whether Amazon reduces this delay or offers another way to consume the audio stream.

Nu Echo now offers Amazon Connect customization services to enterprise clients across North America. As a conversational innovation solution provider, we ensure that when a customer calls your company, they are presented with a high-quality experience every step of the way.