CARE: A platform for upskilling workers, at scale.

Part 1 — User Research Case Study on Upskilling for Baristas

Alexandria Pabst
Labs Notebook
11 min read · Jan 4, 2023


Co-authored by Mirjana Spasojevic

Imagine a world where instead of clicking on a video link to learn how to change your car’s oil, you put on a pair of AR glasses and a smart coach guides you in real time through every step of that task at your own pace. Through motion tracking, cameras, and non-invasive sensors in your garage, it analyzes how close you are to the car and which tools you’re holding in your hands. It tells you that you’re holding the correct filter wrench for the model of car you’re working on — and that you’re holding it correctly. It highlights the car jack, prompts you when you’ve lifted the car high enough, and shows you exactly where the drain plug is. Here at Digital Experiences in Accenture Labs, we’re working to make that vision a reality, and to develop the technologies and experiences that make learning new skills accessible for everyone.

We’re not just thinking about maintenance skills you can learn for tasks at home, but centering our strategy around targeted upskilling and reskilling across entire organizations. We’re at a pivotal moment for talent retention, recruitment, and upskilling, and talent is a major priority for organizations that need to innovate quickly. The Great Resignation has impacted the talent pool of almost every major industry — finding and training new talent to account for mass shifts in employment is incredibly expensive and time consuming. According to the 2021 Training Report, companies spend on average approximately $1,071 per person on training, with government and military institutions investing the most ($1,483 per person), followed closely by manufacturing and distribution ($1,373 per person). These costs add up for companies when factoring in turnover rates — Americans are quitting their jobs in record numbers with no signs of slowing down, with an estimated 4.16 million Americans quitting in August 2022.

Our solution? We’re proposing investment in technologies that can reduce training costs for companies and produce better long-term outcomes for learners — specifically smart coaches enabled by metaverse technologies like AR and VR. These technologies can simulate a wide variety of training scenarios, covering social, technical, and physical skills, and provide a safe, waste-free environment to practice in. Talespin is one company that offers metaverse training solutions with its easy no-code content creation suite and real-time data analytics of learners who use its Skills Insights Platform.

Male worker wearing VR headset and orange safety vest holding hands out while interacting with robot operating in back of warehouse.
Metaverse training is starting to become a reality. We envision this will be a transformational technology in enabling people to learn skills more efficiently from anywhere in the world.

In immersive metaverse experiences, we can analyze a wide array of information to understand how a person is behaving during training: how they use their hand controllers, how they move their head, and where their eyes are looking. But not every task is suited to metaverse training alone. In the same 2021 Training Report, only 10% of learning solutions and technology investments across all surveyed companies were made in augmented and virtual reality. In Digital Experiences, we’re investigating technologies that can capture a user’s actions in real time through non-invasive sensors, coupled with AI systems, such as:

  • Body position and motion tracking using cameras throughout the person’s work or learning environment,
  • Microphones so we can hear how the person is working — and answer their questions,
  • IoT-augmented tools and environments.

We’re experimenting with different industry use cases and instructional delivery techniques to understand how to train people effectively on new physical tasks through a digital smart coach that integrates this information and understands where you are in the learning process.
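As a rough illustration of the camera-based piece of this puzzle, here is a minimal sketch using MediaPipe’s pose estimation (one of the open-source tools we mention below). The webcam loop and the wrist read-out are illustrative assumptions, not our CARE pipeline:

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Illustrative only: watch a camera feed and report wrist positions,
# the kind of low-level signal a smart coach could feed into an
# activity-recognition model to infer where a trainee is in a task.
def track_wrists(camera_index: int = 0) -> None:
    cap = cv2.VideoCapture(camera_index)
    with mp_pose.Pose(min_detection_confidence=0.5) as pose:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV captures BGR.
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                lm = results.pose_landmarks.landmark
                left = lm[mp_pose.PoseLandmark.LEFT_WRIST]
                right = lm[mp_pose.PoseLandmark.RIGHT_WRIST]
                # Coordinates are normalized to the frame (0..1).
                print(f"left wrist ({left.x:.2f}, {left.y:.2f}), "
                      f"right wrist ({right.x:.2f}, {right.y:.2f})")
    cap.release()

if __name__ == "__main__":
    track_wrists()
```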

The problem? Digital smart coaches aren’t a thing yet…

There are a few challenges in fully realizing our vision:

1) Detecting the right objects needed during a training session,

2) Detecting the trainee and how they interact with those objects,

3) Determining the trainee’s intent during interactions,

4) Supplying appropriate feedback at the right time to guide a trainee through the training process.

Previously, our team focused on the technology aspect of developing an intelligent training coach. They used openly available computer vision programs like MediaPipe, Detectron2, and Grounded Situation Recognition to create what we call a contextual activity recognition engine (CARE). By combining the capabilities of these open-source platforms, we can address the first two challenges fairly easily; the third needs to be addressed through other means, like neural networks that can interpret human behaviors and actions (look here to see how Zhang and colleagues at the University of Cincinnati are approaching this problem). This blog won’t go into the details of that pipeline.

This blog focuses on our team’s work to address the fourth challenge, where we conducted user research to develop the foundational principles of creating our own smart coach. Let’s dig into the details…

Study Goals

We recruited 8 people with no barista experience to learn physical tasks of medium complexity. The tasks needed to be complex enough to involve a series of steps and a few points of no return: if part of the task was incomplete or done incorrectly, the user would need to begin again. Our first use case began with training baristas to make a new drink.

Young female barista holding tablet while on the phone in a coffeeshop setting.
Baristas often have to juggle multiple tasks at the same time. Remembering complicated drink orders while completing other tasks is a challenge that we hope to solve through augmented smart coaches for context-aware adaptive training.

Why baristas?

This is a relatively generalizable task. A barista’s experience can generalize to many different workplace situations — a variety of customer service tasks require multi-step interactions with different tools to accomplish a goal. For example, housekeeping services in a hotel involve multiple stages of cleaning and organizing to prepare a room for a guest. In manufacturing, assembly lines require a specific set of steps for a machine part to be correctly assembled.

Baristas are also relatable — their workplaces, and many of their activities, are familiar to many people, making it easy to map onto other industry use cases.

Hypothesis

Our primary research hypothesis: situation-aware adaptive training is better than video training. We predicted that trainees would be more effective and have a better experience when learning a task using situationally-aware adaptive training through a digital smart coach compared to traditional video training.

The Experiment

We wanted to collect ecologically valid data to understand the lived experience of AI-assisted training. Hence, we conducted our study with novices who had no experience using espresso machines, and who would truly be learning new skills. Without having representative end users work with our prototypes up front, we risk pushing a technology that is not useful or usable by our intended audience.

We simulated AI-assisted training using a user-interface (UI) mockup and a human-in-the-loop to approximate the automated system we are working toward. The remote study was conducted via Zoom and situated in participants’ kitchens. We compared two conditions:

  • In one, the trainees watched video recipes for coffee drinks made by a famous cafe chain (our control condition);
  • In the other, they were trained by our simulated AI coaching assistant.

We used a within-subjects approach, meaning that each participant completed both kinds of training (simulated coach vs. video training) for two kinds of coffee drinks (caramel macchiato and flat white), and the order of conditions and drinks across participants was controlled through pseudo-randomization.
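For the curious, here’s a small sketch of what a counterbalanced, pseudo-randomized assignment like this can look like in code. The exact scheme below is an illustrative assumption, not our study’s actual randomization protocol:

```python
import itertools
import random

CONDITIONS = ["smart_coach", "video"]
DRINKS = ["caramel_macchiato", "flat_white"]

# Illustrative counterbalancing: each of the 4 possible
# (condition-order, drink-order) combinations is used equally often
# across 8 participants, and combinations are shuffled across people.
def assign_sessions(n_participants: int = 8, seed: int = 42) -> list:
    rng = random.Random(seed)
    orders = list(itertools.product(
        itertools.permutations(CONDITIONS),
        itertools.permutations(DRINKS),
    ))  # 2 x 2 = 4 orderings; assumes n_participants is a multiple of 4
    slots = orders * (n_participants // len(orders))
    rng.shuffle(slots)  # pseudo-random assignment of orderings to people
    schedule = []
    for pid, (cond_order, drink_order) in enumerate(slots, start=1):
        schedule.append([
            {"participant": pid, "condition": c, "drink": d}
            for c, d in zip(cond_order, drink_order)
        ])
    return schedule

for sessions in assign_sessions():
    print(sessions)
```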

The simulated AI coach was introduced as a prototype device called “Smart Barista” that joined the Zoom call. Smart Barista provided three types of visual and auditory cues after each step, plus written suggestions when the recipe was not followed (see image below). The transitions of the UI mockup were controlled by a researcher observing the activity and following a script, in a classic “Wizard of Oz” user research paradigm. Eight participants (4M, 4F, ages 19–50) were debriefed after making each drink and were asked to compare the two types of training at the end.

Three icons with a speaker icon underneath each. The left-most icon is a green check mark, the middle icon is a yellow circle containing a lowercase “i”, and the right-most icon is a red “X” symbol.
These are examples of the cues given to participants during our user study. The green checkmark was given when a participant completed a step correctly, the “i” cue was provided when participants needed some extra support and provided additional details in text format to help, and the “X” symbol was provided when participants completed a task incorrectly and needed to start that portion over. Each of these symbols had an associated audio cue.
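Behind the scenes, the wizard’s job boiled down to firing one of these three cues at the right moment. Here’s a hypothetical sketch of what such a wizard console might look like; the names and structure below are our illustration, not the actual mockup code:

```python
from dataclasses import dataclass
from enum import Enum

class Cue(Enum):
    CORRECT = "green check"  # step completed correctly
    SUPPORT = "yellow info"  # extra written guidance needed
    RESTART = "red x"        # step failed; start this portion over

@dataclass
class CueEvent:
    cue: Cue
    step: str
    detail: str = ""  # optional written suggestion shown to the trainee

# A hypothetical wizard console: a researcher watching the Zoom feed
# fires cues by hand, simulating the automated coach for the trainee.
def fire_cue(event: CueEvent) -> None:
    print(f"[{event.cue.value}] step={event.step!r} {event.detail}")
    # In the study, each cue also played an associated audio tone.

fire_cue(CueEvent(Cue.SUPPORT, "pour milk",
                  "Hold on! You didn't pour in the right amount of milk."))
```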

Importantly, we positioned Smart Barista as a prototype of a smart coach. We explained at the beginning of the study that the device uses computer vision techniques and voice analysis to try to understand a user’s intent, and that it could make mistakes. We told participants that the training session would use webcam and microphone data to train the device and improve its feedback process. All the while, a researcher behind the scenes controlled how the prototype interacted with the user. If the user felt that the feedback provided by Smart Barista was inaccurate, we prompted them to describe in detail what they were doing so that, behind the scenes, we could provide the right guidance.

On the left, a white box with black text contains the yellow icon with a lowercase “i” in it, indicating that the participant needs support. The text reads “Hold on! You didn’t pour in the right amount of milk. TIP: 8oz is usually appropriate for one cup of coffee. Add more or pour some out as needed.” An image of a young woman in a white shirt next to an espresso machine and its tools is on the right side.
An example of supportive feedback during the training process. We asked participants to position their webcams so that we could monitor their activities and had Smart Barista provide feedback at the right times to make sure they completed their task successfully.

Our Findings

The participants in our study were asked to report on their confidence and self-perceived effectiveness in replicating two coffee drinks. We recorded these measures before, during, and after interacting with both training programs to understand a user’s learning process over time. Participants also reflected on their experience with Smart Barista, providing implications for future design choices for such a system. In particular, they discussed topics related to learning about its capabilities, developing trust in its purpose, and, in general, building a relationship with this type of coaching system.

1. Smart Barista increases confidence and (self-assessed) effectiveness

Two line graphs side by side, titled “Ratings of Confidence and Effectiveness by Training Type”. In both the confidence graph (left) and the effectiveness graph (right), the purple line representing the AI condition sits above the gray line representing the video condition across all timepoints.
Ratings of confidence and effectiveness by training type across all participants. Each measure was aggregated across three different timepoints: before starting the training, during the training, and after the training exercises. Confidence was defined as how confident participants felt in making their drink, while effectiveness was defined as a user’s perception of their actual performance.
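As a sketch of how such before/during/after self-reports can be summarized, here’s an illustration with made-up numbers; the 1–7 scale and every value below are hypothetical, not our study data:

```python
import pandas as pd

# Hypothetical 1-7 Likert ratings for one participant in both conditions.
ratings = pd.DataFrame({
    "participant":   [1, 1, 1, 1, 1, 1],
    "condition":     ["ai", "ai", "ai", "video", "video", "video"],
    "timepoint":     ["before", "during", "after"] * 2,
    "confidence":    [4, 5, 6, 3, 4, 5],
    "effectiveness": [4, 6, 6, 3, 4, 4],
})

# Order the timepoints so the summary reads before -> during -> after.
ratings["timepoint"] = pd.Categorical(
    ratings["timepoint"], ["before", "during", "after"], ordered=True)

# Mean rating per condition and timepoint, mirroring the plot above.
summary = (ratings
           .groupby(["condition", "timepoint"], observed=True)
           [["confidence", "effectiveness"]]
           .mean())
print(summary)
```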

Smart Barista was preferred over the training video by 6 out of 8 participants, even with its limited functionality and early-stage UI.

“It took the guesswork about remembering which step was next. . .you get affirmative confirmation, so you don’t have to wait until the end.” [P1]

Interestingly, many participants mentioned that when first interacting with Smart Barista, they weren’t sure how the interaction with the smart coach would go. Because they had no prior experience with similar technologies, they had trouble setting expectations for Smart Barista.

“Right away I was panicking, rather than paying attention.” [P2]

Overall, participants did prefer smart assistance, both during the training experience and after taking time to reflect. Once they understood Smart Barista’s capabilities, they felt much more confident and effective in their task.

“I felt like she was the saving grace. She was like ‘don’t worry, it’s not that bad’, and I was like ‘oh, okay!’” [P2]

2. Timing of feedback matters

Continuous feedback increases confidence with unfamiliar tasks. The majority of our participants preferred feedback delivered during the training exercise, as opposed to all the information being delivered up front as in the video. They also expressed a desire for potential mistakes to be called out right before or after making them:

“I also liked the little warning sound… I didn’t mind that it waited until I did it wrong.” [P3]

“If the sound played as soon as I did it right, I would find that kind of annoying . . . But I like that there was a bit of a pause” [P1]

“I think it’s useful to have confirmation whenever you’re done with a step… It’s useful to have affirmation.” [P5]

3. Delivering instruction without distracting from execution is a key design challenge

In discussing the training experience, participants stated a preference for cues that don’t break task flow. The auditory cues were perceived as the most helpful interactions because they were peripheral to the task. The exception was situations where the richness of a visualization is truly needed for clarity:

“I think the tones … were really important because when you’re focusing on not burning yourself, or watching what happens so something doesn’t overflow, if you only had a visual, you wouldn’t know that you made a mistake. . .I liked that it waited for me, that it was actually watching me.” [P3]

4. Intelligent coaching systems need to build trust about both their capabilities and their purpose

We observed, and our participants reported, initial hesitancy surrounding interactions with Smart Barista:

“ … as I started to make the coffee, I lost my confidence, like “Do I know what to do first?” [P3]

“How bad do I have to screw up before it [intervenes]? It felt kind of like taking a test. It’s like “make the milk”, no instructions, just figure it out. It would have been better if it was like “Make the milk” and had some steps on there, or handy hints or tips.” [P1]

The important design implication for training systems is to address learners’ need to be reassured that the system is there to support them rather than assess them. We believe that mixed-initiative interactions may help calibrate the amount and type of coaching to deliver the most effective training experience.

5. Learners vary regarding the preferred mode and amount of coaching

We asked our participants about their preferred modalities and environments for delivering training. They suggested a wide range of learning environments and modes for Smart Barista:

“Augmented reality. With a heads-up display… as you’re pouring have a little fill meter so that you know when to stop pouring” [P1]

“I think it would help if it talked to me, but I can see how that would be distracting if it was like “no, don’t pour the milk like that,” like a nagging mother or spouse or something.” [P4]

The participants also expressed desire for the coach to adjust to both implicit and explicit cues from them:

“If [Smart Barista] could assess or detect some uncertainty or distress within me it could either reassure or caution …” [P5]

6. Most subjects are already comfortable with conversational assistants on consumer devices

Despite Smart Barista being presented as an autonomous system rather than a person, some participants in this study quickly developed a relationship with it. They assigned anthropomorphic qualities to the imaginary device and created mental models of its behavior:

“I feel like, at the beginning especially it seemed like I was saying something complicated and she — I’m assuming she’s a girl, I guess — she’s like ‘yeah, you’re done, let’s go, you’ve got it.’” [P2]

“The most disorienting experience for me was me asking a question that was outside of the step and Smart Barista just not interacting. . .and to be honest, Smart Barista’s lack of response was enough feedback to tell me I need to do this first.” [P5]

“I’m honestly really surprised how intuitive it was, how it knew what I was doing.” [P3]

Overall, we believe that participants’ prior experiences with consumer digital coaches have a bearing on the design of training systems, and may make the time right for this kind of coaching in the workplace.

Where do we go from here?

From our initial user study, we learned about people’s expectations surrounding an intelligent coaching system. Our findings demonstrate that, when learning new physical tasks, users would prefer a smart coach over traditional video trainings, which are often used for teaching skills at scale across industries. But the findings also point to some interesting problems we need to tackle: how to customize training delivery, how to provide the right kind of feedback at the right time, and how a smart coach can develop a meaningful and trusting relationship with trainees. These insights are supporting our next phases of research, where we’re building more robust versions of a coaching assistant and testing new use cases in different domains.

If you’d like to learn more about the ongoing work at Labs, check out our Accenture Labs Notebook Medium publication, where you can read about up-and-coming research on different technology applications. In the meantime, stay tuned! In our next blog, we’ll discuss a use case surrounding home healthcare, as we expand our user research into new industries to tackle unforeseen challenges in training workforces at scale.
