How Hudl Delivers Game-Changing Support Experiences with Amazon Connect and the AWS Ecosystem

Elliott Bulling
Published in The Hudl, Sep 7, 2018

In August 2017, our support department answered 61% of calls in under three minutes and 78% in under five minutes. In August 2018, we answered 84% of calls in under three minutes and 98% in under five minutes. Even more impressively, we did this with higher call volume and the same number of support reps. That success is due in part to customizable tools we built on the Amazon Web Services (AWS) ecosystem.

At Hudl, we pride ourselves on putting customers first. Our support department’s mission statement says it all — we deliver game-changing support experiences for teams, athletes and fans. This takes many forms, like staying late to help a coach upload video, answering emails on your day off (not because you have to but because you want to), and never keeping coaches waiting on hold for long.

Many call centers use phone software with real-time reporting built in, but that reporting isn't customizable, and the software falls short in a lot of other areas. In my four years at Hudl we've used several different systems, and in July 2017 we switched to Amazon Connect.

We switched because of Connect's ever-expanding product offering, its customizable features and the cost savings it would provide us. Connect has options we'd never had before, and they improved our workflows and efficiency almost instantly. Being able to embed our contact control panel (CCP), the phone the agents see and use, to edit call flows from the web, and to create more customizable call flows and agent states has been huge.

On the other hand, there were some features from our old systems that our leads and agents enjoyed, but weren’t available in Connect. The main one was real-time reporting that helped us deliver on our mission. Game-changing experiences can’t happen if the team doesn’t know what’s happening in our call queue.

Real-time reporting is crucial to keeping the wheels turning. Without it, agents lose track of incoming calls, which leads to longer wait times, missed calls and a worse customer experience (as measured by our surveys). For instance, we previously had a live list of who was up next to receive a call, which let agents plan downtime and take quick breaks effectively.

Another missing feature was a quick way for leads to check who is and isn't available, why, and for how long. That knowledge gives them a chance to help anyone tied up with after-call tasks, so we can help the next customer in line even faster. We also wanted an alert to let escalation tiers know when wait times reach a certain point, since our goal is to answer every call in eight minutes or less. A notification would let us get “all hands on deck” to clear out the queue.

With Connect, we knew we could leverage the entire AWS ecosystem, so having these features on day one wasn't a requirement. Once we had the bones of the system in place, we took the time to dive in and get creative, and during that discovery we found ways to solve these problems. We used Connect, Kinesis Streams, Lambda, CloudWatch, DynamoDB, API Gateway and Slack to build simple tools that get our team the information they need.

Collecting Agent Status Data

After some initial research, we found an online game that used DynamoDB to report on how many people were active in each game level and display that data in the game map. By thinking about phone statuses as game levels, we could apply similar technology to our phone system.

To get near real-time data, you have to move out of the Connect console and its built-in reporting, which is at best 15 seconds behind and isn't organized in an easily digestible format or location. Our first step in getting the data out was to set up an agent event stream using a Kinesis stream, which can be configured in the data streaming settings of the Connect console.

We then took that stream of data and pulled out the important parts (agent name, current status, status start timestamp, current timestamp and routing profile) using JavaScript running in Lambda. We checked the type of each event in the stream, looking for heartbeat and status-change events. Because each of these event types has its own format, a simple if statement on the type decides how to read it. Once we know the type, we parse the relevant values out of the JSON event and save each one as a variable. (GitHub link: Kinesis Stream to Dynamo)
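As a minimal sketch of that parsing step (not the code in the linked repository), the Lambda below decodes each Kinesis record and extracts the five values. It assumes the field names documented for Amazon Connect's agent event stream (EventType, EventTimestamp, CurrentAgentSnapshot); in that documented format both heartbeat and state-change events carry a CurrentAgentSnapshot, so this sketch reads them the same way. If your records differ, branch on EventType as described above. Function names are hypothetical.

```javascript
// Minimal sketch, not the original function. Field names assume the Amazon
// Connect agent event stream format; check them against a real record.

// Kinesis delivers each record's payload base64-encoded.
function decodeRecord(record) {
  const json = Buffer.from(record.kinesis.data, "base64").toString("utf8");
  return JSON.parse(json);
}

// Event types the status board cares about; others (e.g. LOGIN, LOGOUT) are skipped.
const TRACKED_TYPES = new Set(["HEARTBEAT", "STATE_CHANGE"]);

// Pull out name, status, status start, event time and routing profile.
function extractAgentFields(event) {
  if (!TRACKED_TYPES.has(event.EventType)) {
    return null;
  }
  const snapshot = event.CurrentAgentSnapshot;
  const config = snapshot.Configuration;
  return {
    agentName: `${config.FirstName} ${config.LastName}`,
    status: snapshot.AgentStatus.Name,
    statusStart: snapshot.AgentStatus.StartTimestamp,
    eventTimestamp: event.EventTimestamp,
    routingProfile: config.RoutingProfile.Name,
  };
}

// Lambda handler for a Kinesis trigger; one invocation may receive a batch.
async function handler(kinesisEvent) {
  return kinesisEvent.Records
    .map(decodeRecord)
    .map(extractAgentFields)
    .filter((fields) => fields !== null);
}
```

Filtering out untracked event types early keeps the write path simple: everything that survives has the same shape.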

After pulling the info out, the same Lambda function saves it to a newly created DynamoDB table. The function runs asynchronously on each event that comes in from our agent event stream.

The Dynamo table we created has five columns (agent name, agent status, duration, routing profile and ttl). The name, status and routing profile are taken directly from the event in our stream. The duration column holds the status start time, which arrives in date-time format and is converted to epoch time; storing epoch turns the duration calculations we make later into simple subtraction. Our last column takes advantage of a nifty DynamoDB feature called Time to Live (TTL). Our TTL value is the epoch time after which a row is no longer needed and can be removed. Rows expire if they aren't updated for 12 hours, which ensures fresh data each day.
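A minimal sketch of turning the extracted values into a table item, assuming hypothetical attribute names (agentName, agentStatus, duration, routingProfile, ttl) for the five columns. It shows the date-time to epoch conversion and a TTL set 12 hours after the current time.

```javascript
// Minimal sketch; attribute names are hypothetical stand-ins for the five columns.
const TTL_SECONDS = 12 * 60 * 60; // rows not refreshed within 12 hours expire

// Convert an ISO-8601 date-time string to epoch seconds.
function toEpochSeconds(isoTimestamp) {
  const millis = Date.parse(isoTimestamp);
  if (Number.isNaN(millis)) {
    throw new Error(`Unparseable timestamp: ${isoTimestamp}`);
  }
  return Math.floor(millis / 1000);
}

// Shape extracted event fields into a table item.
function buildAgentItem(fields, nowEpochSeconds) {
  return {
    agentName: fields.agentName,                  // partition key
    agentStatus: fields.status,
    duration: toEpochSeconds(fields.statusStart), // status start time, stored as epoch seconds
    routingProfile: fields.routingProfile,
    ttl: nowEpochSeconds + TTL_SECONDS,           // epoch seconds after which DynamoDB may delete the row
  };
}
```

Because ttl is pushed forward on every event, only agents with no activity for 12 hours age out. DynamoDB deletes expired items in the background rather than at the exact second, so a reader that must never see stale rows can also skip rows whose ttl is already in the past.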

Every time we get an event from the stream, we look for a row in the table with the same agent name, which is the table's partition key. If a row with that name exists, it's updated; if not, a new row is added.
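One way to get that update-or-add behavior in a single call is DynamoDB's PutItem, which creates the item or replaces an existing item with the same partition key. This is a swapped-in technique, and the original function may look the row up first instead. The sketch below only builds the request parameters (the table name is hypothetical) and adds a tiny in-memory stand-in that mirrors PutItem's replace semantics, for illustration.

```javascript
// Minimal sketch. TABLE_NAME is hypothetical.
const TABLE_NAME = "AgentStatus";

// Parameters for a DocumentClient put / PutCommand: write or replace by key.
function buildPutParams(item) {
  if (typeof item.agentName !== "string" || item.agentName === "") {
    throw new Error("agentName (the partition key) is required");
  }
  return { TableName: TABLE_NAME, Item: item };
}

// In-memory stand-in for the table, keyed on agentName, to show the
// upsert behavior: a put replaces the whole item for an existing key.
function applyPut(table, params) {
  table.set(params.Item.agentName, params.Item);
  return table;
}
```

With the AWS SDK for JavaScript v2, the params can go to `new AWS.DynamoDB.DocumentClient().put(params).promise()`; with v3, to `send(new PutCommand(params))` from `@aws-sdk/lib-dynamodb`. Because PutItem replaces the whole item, every write needs to include all five attributes.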

We now had a continuously updating table of agents, their status and the time they started that status. The next step was to make that data easily accessible for our team.

Pushing the Data to Where Our Team Lives

We wanted the data to be where our agents and leads already live. Our team, like a lot of tech companies around the world, lives in Slack. To get information into Slack, we created a slash command that hits a custom API Gateway endpoint we set up in AWS. The gateway triggers a Lambda function that takes input from the slash command and returns the requested information. The function works out status durations by subtracting the status start time stored in the table from the epoch time at which the function was called. Subtracting epoch times is the easiest way to do date math, and DynamoDB has no native date-time type anyway (which is why we stored that first value in epoch). With this we can do many different things, such as find the oldest “available” status time to return who's up next in the queue, or count the current statuses of our reps.
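The duration math and the "who's up next" lookup can be sketched as below. This assumes the rows have already been read from the table and look like { agentName, agentStatus, duration, routingProfile }, with duration holding the status start in epoch seconds. The reply follows Slack's slash-command response shape (JSON with response_type and text) wrapped in an API Gateway Lambda proxy response; function names and the "Available" status string are illustrative assumptions.

```javascript
// Minimal sketch; rows look like { agentName, agentStatus, duration, routingProfile },
// where duration is the status start time in epoch seconds.

// Seconds spent in the current status: "now" minus the stored start epoch.
function secondsInStatus(row, nowEpochSeconds) {
  return nowEpochSeconds - row.duration;
}

// Up next = the Available agent with the oldest (smallest) start epoch.
function nextUp(rows) {
  const available = rows.filter((r) => r.agentStatus === "Available");
  if (available.length === 0) {
    return null;
  }
  return available.reduce((oldest, r) => (r.duration < oldest.duration ? r : oldest));
}

// Format seconds as m:ss, e.g. 125 -> "2:05".
function formatDuration(totalSeconds) {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Slack slash-command reply, wrapped for an API Gateway proxy integration.
function slashCommandResponse(rows, nowEpochSeconds) {
  const next = nextUp(rows);
  const text = next
    ? `Up next: ${next.agentName} (available ${formatDuration(secondsInStatus(next, nowEpochSeconds))})`
    : "Nobody is Available right now.";
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ response_type: "ephemeral", text }),
  };
}
```

An "ephemeral" reply is visible only to the person who ran the command; "in_channel" would post it for everyone.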

We can also show the next five people up for calls and the name or number of people in any individual status. We even bucketed some routing profiles together in our escalation tiers so we can see those status counts as well. (GitHub Link for Agent Status Board)
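A sketch of the next-five list, the per-status names and the tier buckets, assuming rows shaped like { agentName, agentStatus, duration, routingProfile } with duration holding the status start in epoch seconds (matching the table columns described earlier). The routing profile names and their mapping to tiers are invented for illustration.

```javascript
// Minimal sketch; the profile-to-tier mapping is hypothetical.
const TIER_BY_PROFILE = {
  "Support Reps": "Tier 1",
  "Technical Leads": "Tier 2",
  "Quality Leads": "Tier 2",
  "Management": "Tier 3",
};

// Names of the next n Available agents, longest-waiting first.
function nextUpList(rows, n = 5) {
  return rows
    .filter((r) => r.agentStatus === "Available")
    .sort((a, b) => a.duration - b.duration)
    .slice(0, n)
    .map((r) => r.agentName);
}

// Names of everyone currently in a given status.
function namesInStatus(rows, status) {
  return rows.filter((r) => r.agentStatus === status).map((r) => r.agentName);
}

// Status counts bucketed by tier, e.g. { "Tier 1": { Available: 2 } }.
function tierStatusCounts(rows) {
  const byTier = {};
  for (const r of rows) {
    const tier = TIER_BY_PROFILE[r.routingProfile] || "Other";
    byTier[tier] = byTier[tier] || {};
    byTier[tier][r.agentStatus] = (byTier[tier][r.agentStatus] || 0) + 1;
  }
  return byTier;
}
```

Keeping the tier mapping in one object means regrouping profiles is a one-line change rather than a change to the counting logic.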

Once it was all put together, you can see just how many AWS services were used.

Since we already had this table set up, we also wanted proactive alerts for when certain thresholds are reached. One alert we wanted was for support reps who have been in “follow-up work” for a long time, so our leads can reach out and see how they can help get the rep back on the phones. We set up a Lambda function that runs every 15 minutes and checks our Dynamo table for reps who have had this status for longer than 10 minutes.

The threshold is a simple variable at the beginning of the function that can be changed depending on the season. When a rep crosses it, the function pings a dedicated leads channel in Slack with the rep's name and how long they have been in “follow-up work”. The nature of our company means after-call work sometimes takes extra time to complete, so we view this notification both as a way for our leads to know what's happening and as a push to step in and help our reps with tricky situations.
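The check itself reduces to a filter and a message, sketched below under a few assumptions: rows look like { agentName, agentStatus, duration } with duration as the status start in epoch seconds, the status is literally named "Follow-up Work" (hypothetical; use your instance's name), and the alert is sent as a Slack incoming-webhook payload with a text field.

```javascript
// Minimal sketch. The status name is hypothetical; use the name your
// Connect instance shows for follow-up work.
const FOLLOW_UP_STATUS = "Follow-up Work";
// Threshold lives at the top so it can be tuned by season.
const FOLLOW_UP_THRESHOLD_MINUTES = 10;

// Reps who have been in follow-up work longer than the threshold.
function overdueFollowUps(rows, nowEpochSeconds, thresholdMinutes = FOLLOW_UP_THRESHOLD_MINUTES) {
  const limitSeconds = thresholdMinutes * 60;
  return rows
    .filter((r) => r.agentStatus === FOLLOW_UP_STATUS && nowEpochSeconds - r.duration > limitSeconds)
    .map((r) => ({
      agentName: r.agentName,
      minutes: Math.floor((nowEpochSeconds - r.duration) / 60),
    }));
}

// Slack incoming-webhook payload, or null when there is nothing to report.
function followUpAlertPayload(overdue) {
  if (overdue.length === 0) {
    return null;
  }
  const lines = overdue.map((o) => `${o.agentName}: ${o.minutes} min in ${FOLLOW_UP_STATUS}`);
  return { text: `Long follow-up work:\n${lines.join("\n")}` };
}
```

The every-15-minutes trigger could be a scheduled CloudWatch Events rule (now Amazon EventBridge) with the expression `rate(15 minutes)`, with the payload POSTed as JSON to a Slack incoming webhook URL. Returning null when nobody is overdue keeps the leads channel quiet between problems.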

The last tool we have built (so far) with AWS is a notification to different tiers of support when wait times creep up. Our support reps are always logged into our phone lines, while people in other roles (e.g., technical leads, quality leads, management and engineers) are often busy with other tasks. When wait times creep past a threshold, currently four minutes, these roles jump on the phones as well. (GitHub Link for Follow-Up Bot)

To keep our team in the know, we have an automated alarm set up in AWS CloudWatch that monitors queue wait time. Once the wait time hits the alarm level, an email is sent that triggers the Slack bot. This is one reason our team has been able to take more calls year-over-year with lower wait times and the same number of reps.
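The post doesn't say which metric the alarm watches. A plausible choice is Amazon Connect's LongestQueueWaitTime queue metric (reported in seconds under the AWS/Connect namespace). The sketch below builds CloudWatch PutMetricAlarm parameters for the four-minute threshold under that assumption; the instance ID, queue name and SNS topic ARN (whose email subscription would feed the Slack bot) are placeholders.

```javascript
// Minimal sketch: the metric choice and every identifier below are assumptions.
const WAIT_ALARM_SECONDS = 4 * 60; // current threshold: four minutes

// Build parameters for CloudWatch PutMetricAlarm on a Connect queue.
function buildWaitTimeAlarm({ instanceId, queueName, snsTopicArn }) {
  return {
    AlarmName: `${queueName}-longest-wait-over-${WAIT_ALARM_SECONDS}s`,
    Namespace: "AWS/Connect",
    MetricName: "LongestQueueWaitTime",
    Dimensions: [
      { Name: "InstanceId", Value: instanceId },
      { Name: "MetricGroup", Value: "Queue" },
      { Name: "QueueName", Value: queueName },
    ],
    Statistic: "Maximum",
    Period: 60,                       // evaluate one-minute datapoints
    EvaluationPeriods: 1,             // fire on the first breaching minute
    Threshold: WAIT_ALARM_SECONDS,
    ComparisonOperator: "GreaterThanOrEqualToThreshold",
    TreatMissingData: "notBreaching", // a missing datapoint counts as no breach
    AlarmActions: [snsTopicArn],      // SNS topic whose email subscription reaches the bot
  };
}
```

The result could be passed to `putMetricAlarm` in the AWS SDK for JavaScript v2, or to `PutMetricAlarmCommand` from `@aws-sdk/client-cloudwatch` in v3. Using Maximum over one-minute periods means a single long wait is enough to trip the alarm.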

Summary

Our implementation of these measures has allowed our team to get calls answered faster. In August 2017, our team answered 61% of calls in under three minutes, 78% in under five minutes and 91% in under eight minutes. In August 2018, after these features were put in place, we answered 84% in under three minutes, 98% in under five minutes and 100% of calls in under eight minutes. To make this even more impressive, our total number of calls answered in August went up year-over-year, from 20,907 to 21,374.

Real-time reporting is what makes call centers function more efficiently. Our switch to Amazon Connect has allowed us to create custom functionality we’ve never had before, which in turn helps us get information to our reps in the systems they’re already using. Our team is more informed and better prepared to deliver on our mission. Game-changing support experiences are possible when you have a world-class team paired with technology designed to give them the information they need in real time. Our support engineering team knows we’ve only scratched the surface of what’s possible, and we’re excited to continue pushing the boundaries.

Here is the full repository of all these tools: https://github.com/ebull53/AWS_Connect_Integrations

I am an engineer @Hudl, a high school football coach and an econ nerd. I am always trying to tie these things together, which can lead to surprising outcomes.