Explaining AI when there is no time for explanations
Corti has developed an artificial intelligence (AI) that detects Out-of-Hospital Cardiac Arrest (OHCA) during emergency calls. The technical feasibility and performance of the AI were validated in several clinical studies and trials, which led us to the question:
How do you present and explain a complex AI to end-users who are operating under extreme time pressure in a high-intensity work environment?
This was one of my main interests when taking on the task of redesigning an AI-powered alert that supports dispatchers in recognising OHCA during emergency calls. In this article, I summarise the learnings related to this research question, including reflections on the design process and a proposal for six overall guiding principles.
The six design principles
01 Leave the decision to the expert
02 Design follows model performance
03 Balance catching attention and disturbing
04 Make every second count
05 Enable instant decoding
06 Transparency to “backstage”
So, what’s the problem?
OHCA is a time-critical condition: survival chances decrease by 10% for every minute of delay from collapse to defibrillation. In the United States and Europe combined, for example, more than 600,000 people sustain an OHCA every year, and their overall survival rate is about 8%.
Performing cardiopulmonary resuscitation (CPR) or using an automated external defibrillator (AED) before the ambulance arrives is therefore critical for survival. In fact, dispatch-assisted CPR can increase 30-day survival rates by 50%, which makes fast recognition of the condition during the emergency call pivotal.
However, recognising OHCA is challenging, as only about 1% of all emergency calls involve OHCA. Approximately 25% of all OHCAs are overlooked by dispatchers, meaning that life-saving instructions aren’t provided to the caller.
The OHCA detection framework
To address this problem, Corti created a machine learning (ML) framework containing two models: (1) an automatic speech recognition (ASR) model that transcribes speech to text and (2) an OHCA detection model that predicts OHCA events from the transcribed speech in real time. The framework can be configured with different accuracy settings, meaning the ML can be more or less inclined to suspect OHCA.
The framework and its accuracy were tested and validated in several clinical studies and trials. The results clearly showed that an OHCA detection model with an optimally configured accuracy recognised a higher share of OHCA cases within the first minute than the dispatchers did. This led us to the next challenge: delivering the detections in a way that made dispatchers feel supported.
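The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not Corti's implementation: the function and class names (`detect_ohca`, `Detection`), the idea of a single numeric `threshold` standing in for the configurable accuracy setting, and the toy keyword scorer are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Detection:
    transcript: str  # everything transcribed so far in the call
    score: float     # detector output in [0, 1] (assumed scale)
    suspected: bool  # True when the score reaches the configured threshold


def detect_ohca(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    score_ohca: Callable[[str], float],
    threshold: float,
) -> Iterator[Detection]:
    """Stage 1 (ASR) feeds stage 2 (OHCA scoring) for every incoming chunk."""
    transcript = ""
    for chunk in audio_chunks:
        transcript = (transcript + " " + transcribe(chunk)).strip()
        score = score_ohca(transcript)
        yield Detection(transcript, score, score >= threshold)


# Toy stand-ins for the two models, purely illustrative.
def toy_transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")


def toy_score(transcript: str) -> float:
    cues = ("not breathing", "unconscious", "collapsed")
    hits = sum(cue in transcript.lower() for cue in cues)
    return hits / len(cues)


call = [b"He collapsed in the kitchen", b"he is not breathing"]
results = list(detect_ohca(call, toy_transcribe, toy_score, threshold=0.5))
```

Lowering `threshold` makes the sketch more inclined to suspect OHCA earlier in the call, and raising it makes it more conservative, which mirrors the configurable behaviour described in the text.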
6 Guiding Design Principles
In total, we conducted more than 100 hours of exploratory research listening to 9–1–1 calls, observing workflows and talking to dispatchers. This established an in-depth understanding of the dispatchers, their needs and the job-to-be-done, which enabled the team to start redesigning the OHCA alert. The creation of the UI design was an iterative process that included more than 50 different designs and 21 rounds of testing with end-users.
Besides delivering a new and improved OHCA alert to our customers, the work also resulted in 6 guiding principles for AI-powered products aimed at end-users working under time pressure.
1. Leave the decision to the expert
One of the first things we realised when talking to dispatchers was that they considered Corti’s OHCA AI a competitor. Dispatchers thought of it as man vs machine, a race to be the first to recognise the OHCA. This instinctively gave dispatchers a negative attitude towards the AI, which resulted in distrust and resistance to adoption. To overcome this, we needed to identify the key drivers for establishing the AI as a team player.
One trigger of resistance was giving dispatchers concrete “orders” to act. We ran several tests displaying clear instructions such as “Start CPR”, and they all had a negative effect on the dispatchers’ willingness to collaborate. By providing overly specific instructions, we took away agency from the dispatchers and hurt their professional pride. We also increased the risk of being perceived as incorrect or irrelevant: even when an OHCA detection was on point, the instructions could be off for the given emergency situation. We therefore refrained from telling dispatchers how to assist callers and instead provided a set of useful tools for them to navigate.
In relation to this, we also learned the importance of emphasising that the AI produces a condition suggestion, rather than a 100% certain result from a laboratory test. Adding the simple label, “Corti’s AI suggestion”, above the condition title proved useful in terms of establishing the alert as a proposal based on a mathematical probability.
2. Design follows model performance
Testing many different user interfaces made us aware of how closely the design is linked to the architecture and performance of the machine learning model. In one of the tests, we tried out flashier designs with bigger and more colourful alerts to catch the dispatcher’s attention. While this proved useful for confirmed OHCA, the dispatchers found the alerts disturbing and even provoking when they appeared for other call types. This forced us to rethink the design and, for example, make the alerts smaller and less noisy.
As mentioned earlier, the OHCA detection framework was built so that it could be configured with different accuracy settings. This proved valuable, as it gave us the opportunity to display different types of alerts and thereby affect not only the selection of ambulance response but also the conversation and the dispatcher’s behaviour. For the accuracy setting of 2.5%, we chose to display the OHCA detection as a form of pre-alert or warning to reflect the higher degree of uncertainty. We did this by colouring the alert orange and adding the label “Moderate degree of certainty”. The accuracy setting of 1.5% was kept as a high-alert state, indicated with a red colour and the label “High degree of certainty”. By doing this, we could also increase the overall performance of the ML model while avoiding being perceived as wrong.
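As a rough sketch of how two configurations could drive two alert levels: the colours and labels below come from the text, but the way the two settings are combined (the stricter one wins if both fire) and all names (`AlertStyle`, `choose_alert`, `strict_fired`, `lenient_fired`) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertStyle:
    colour: str
    certainty_label: str
    heading: str = "Corti's AI suggestion"  # label shown above the condition title


# 1.5% setting -> high alert; 2.5% setting -> pre-alert (as described in the text)
HIGH = AlertStyle("red", "High degree of certainty")
MODERATE = AlertStyle("orange", "Moderate degree of certainty")


def choose_alert(strict_fired: bool, lenient_fired: bool) -> Optional[AlertStyle]:
    """Pick the alert to display; assumes the stricter setting takes priority."""
    if strict_fired:
        return HIGH
    if lenient_fired:
        return MODERATE
    return None  # no detection, so no alert is shown
```

Keeping the visual style in a small lookup like this makes the coupling between model configuration and design explicit: changing a setting means revisiting the colour and label that go with it.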
3. Balance catching attention and disturbing
Handling 9–1–1 calls requires more than medical expertise. Dispatchers also have to master an extreme level of multitasking when dealing with emergency situations. Besides recognising the condition, the dispatchers are required to select an incident code, write documentation, confirm and echo information and dispatch an ambulance, while at the same time providing instructions to the caller and keeping everyone at the scene calm.
It was important for us to design an OHCA alert that was noticeable but didn’t disturb the dispatcher’s primary focus and workflow. We considered it unethical to negatively affect the conversation between the dispatcher and the caller, as emergency calls concern urgent medical assistance for acute, life-threatening illness or injury.
After testing many different sizes, formats and behaviours, we concluded that the alert had to be placed on the primary screen to be within the dispatcher’s main area of focus. Because the screen setup was already busy, this required the alert to be rather small. We also kept motion design and animations to a bare minimum, because effects such as flashes to highlight changes in the alert state or fast movement of the AI disturbed the dispatcher’s concentration.
4. Make every second count
Besides system stability, building for emergencies requires an extraordinary degree of interface reliability. The dispatchers are under extreme pressure, which makes it crucial for them to know (1) exactly where things are, (2) how to decode them and (3) what happens when they interact with them.
This made us design the alert so that it always overlaid other open windows and reopened in the same position, which makes it visible at all times and fast to locate. During the tests, we tried out windows that moved or expanded when a detection was triggered. However, we quickly dropped this design, as we risked covering or blocking critical functions and information. Furthermore, we enabled the dispatcher to click the alert and access the instructions independently of the detections, to make sure we did not block any workflow. It was also important for us not to delay the step from recognising OHCA to reaching the instructions, which led us to ditch all ideas about cool animations for, e.g., opening the instruction window.
5. Enable instant decoding
Question-driven explanation is a well-known technique for defining explainability in terms of the questions a user might ask about the AI. When we used this approach, the main question from dispatchers concerned the words or combinations of words that triggered the detections. Though the results from simple prototype tests were positive, studies of real-time emergency calls showed that dispatchers did not remember anything other than the colour and title of the alert. This pushed us to test many different colours to identify the best indicator for the two levels of accuracy.
To further accommodate a context that left no room for long interpretation times or failures, we tested everything from microcopy to calls to action and visuals. The goal was to enable the dispatcher to instantly decode and understand everything without hesitation.
6. Transparency to “backstage”
Transparency proved crucial for dispatchers to understand and trust the OHCA alert. While testing the alert, we learned that differentiating between states such as awaiting call, analysing call, disabled and system error is important for the dispatcher to understand what is going on and act accordingly. For example, changing the alert text from “Awaiting call” to “Analysing call” served as proof that the software was running. When dealing with emergency calls, the dispatchers have no time to figure out what is going on or troubleshoot errors.
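The states named above could be modelled as a small, explicit set so that the alert always shows exactly one of them. The sketch below is an assumption-laden illustration: only “Awaiting call” and “Analysing call” are quoted in the text as alert wording, so the other two display strings, the input flags and the priority order are hypothetical.

```python
from enum import Enum


class AlertState(Enum):
    AWAITING_CALL = "Awaiting call"
    ANALYSING_CALL = "Analysing call"
    DISABLED = "Disabled"          # display text assumed
    SYSTEM_ERROR = "System error"  # display text assumed


def alert_state(enabled: bool, healthy: bool, call_active: bool) -> AlertState:
    """Resolve the single state to show; the priority order is an assumption."""
    if not enabled:
        return AlertState.DISABLED
    if not healthy:
        return AlertState.SYSTEM_ERROR
    if call_active:
        return AlertState.ANALYSING_CALL
    return AlertState.AWAITING_CALL
```

For instance, `alert_state(True, True, call_active=True).value` gives the text that signals to the dispatcher that the software is running and analysing the current call.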
Lastly, we wanted to make sure the dispatchers understood that the alert was based on output from an AI. We tested several different visuals to identify the best way to reflect AI-powered audio analytics. Like many others, we concluded that an orb was suited for this purpose. In addition, we added a subtle movement to the orb during emergency calls to indicate that the AI was actively listening in on the conversation.
Corti is a HealthTech company specialised in voice-based AI for medical consultations to help healthcare professionals make faster and better decisions.