The elevator pitch for Rilla goes something like this: “Rilla builds speech analytics for conversations between store associates and shoppers in physical stores. Associates clip on a mic connected to their phone, and they talk to customers like normal. Our AI listens and captures analytics from their conversations, like who the customer is, what they care about, and how their experience was. It’s like Google Analytics for offline commerce.”
Simple. But that’s on purpose.
The reality is a bit more complicated. And not for the reason you might think.
The part where we take all these conversations between associates and shoppers and use AI to turn them into numbers? That’s actually the easy part.
The hard part is capturing the conversations in the first place.
I know. You’re probably like “huh? What do you mean? Isn’t that just pressing record on the phone and calling it a day?” Well, yes, but not really. The reason? Gossip.
You see, when you have people working in a store for hours and hours on end, you’re bound to get them talking with each other about their private lives. What we refer to in the common vernacular as gossip.
And that’s a good thing. It’s a human thing. It’s only natural when you have people spending so much time with each other. But this, this is the kind of thing that makes our engineers scratch their heads.
Because you see, we don’t want to process these conversations. These are private conversations between employees that are not relevant at all for improving the business. The point of our technology is to understand how to give your shoppers a better experience, not to spy on the private conversations between store associates.
So it poses a very peculiar technical challenge: how do we capture the relevant conversations between shoppers and store associates without processing the private conversations store associates have with each other?
Naturally, our first idea was just to tell the store associates to start and stop the recording any time they talked with a customer. But associates are people, and people forget. They don’t think about it all the time, especially when they’re busy stocking shelves, dealing with complaints, and trying to help customers. The last thing you want on your mind is remembering to pull your phone out any time someone talks to you.
And what if a shopper just asks a quick question about where a particular product is? Is the store associate supposed to say “hold on, let me turn this thing on… can you repeat that again please?”
No, that’s annoying. That’s making the customer experience worse, not better. So we thought of another thing.
“What if we just process all the audio, transcribe it, and then use natural language processing to find where the conversations with shoppers are, and then we throw out the rest?”
That’s more reasonable, but still not good enough.
Sometimes the NLP will be wrong, and some of the store associate conversations will still remain in the system. That’s no bueno.
What about using wake words, like Alexa does? Instead of listening for “Hey Alexa”, the app would only activate when the store associate says “Hey welcome” or “Do you need any help?”
That’s much better, but still not good enough. Conversations with shoppers don’t always start the same, so you’re bound to lose many important conversations. Then there’s also the risk that the store associate mentions a wake word when they’re talking to their colleagues and the app starts recording by accident. No bueno.
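To make those two failure modes concrete, here is a minimal sketch of a wake-phrase trigger. It is an illustration only, not Rilla's code: the function name `starts_recording` is hypothetical, and plain substring matching on text stands in for a real audio wake-word detector.

```python
# Hypothetical wake phrases, taken from the examples in the text above.
WAKE_PHRASES = ("hey welcome", "do you need any help")


def starts_recording(utterance: str) -> bool:
    """Return True if the utterance contains any wake phrase (case-insensitive)."""
    text = utterance.lower()
    return any(phrase in text for phrase in WAKE_PHRASES)


# Failure mode 1: a shopper conversation that opens differently is never recorded.
missed = starts_recording("Excuse me, where are the running shoes?")

# Failure mode 2: an associate talking to a colleague triggers recording by accident.
false_trigger = starts_recording("Do you need any help closing up tonight?")
```

Here `missed` is False and `false_trigger` is True, so the same rule both drops a real shopper conversation and records a private one.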
That’s when we hit on the right method.
The Anti-Gossip Algorithm.
One of the things we’ve always been really good at is identifying people’s individual voices. Like, we’re really good at that. Better than Google and Amazon good.
So we thought: “Why don’t we just use our fantastic voice ID model for this?” And that’s what we did.
The first time a store associate signs in to the app, they have to speak for about a minute. That gives us their voice print.
Then, when store associates talk with shoppers, we can detect whenever a voice print shows up that doesn’t match anyone on staff. So we have identifiable voice prints (store associates) and unidentifiable voice prints (shoppers, whose identity we don’t know).
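One common way to do this kind of matching is to compare a voice print against each enrolled print and accept the closest one only if it clears a similarity threshold. The sketch below assumes voice prints are plain lists of numbers compared by cosine similarity; the names `identify_speaker` and `enrolled`, the 0.8 threshold, and the toy vectors are all made up for illustration and say nothing about how Rilla's voice ID model actually works.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(voice_print, enrolled, threshold=0.8):
    """Return the best-matching enrolled name, or None if nothing reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(voice_print, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy enrolled voice prints for two associates (assumed 3-dimensional for readability).
enrolled = {"ana": [0.9, 0.1, 0.0], "ben": [0.1, 0.9, 0.1]}

associate = identify_speaker([0.88, 0.12, 0.02], enrolled)  # close to ana's print
shopper = identify_speaker([0.0, 0.1, 0.95], enrolled)      # matches nobody on staff
```

A speaker who comes back as None is what the text calls an unidentifiable voice print: someone who is not on staff.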
Once we’ve signed up all the store staff into the app, it becomes a very simple equation.
“If unidentifiable voice print not in conversation, throw out conversation.”
Any time store associates are speaking to each other without a shopper in the conversation, it will be thrown out.
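The rule itself fits in a few lines. This is a sketch under assumptions, not the production logic: each conversation is represented as a list of speaker labels, with None standing for a voice that matched no enrolled associate, and `keep_conversation` is a hypothetical name.

```python
def keep_conversation(speakers):
    """Keep a conversation only if at least one speaker is unidentified (None)."""
    return any(speaker is None for speaker in speakers)


conversations = [
    ["ana", "ben"],        # two associates only: thrown out
    ["ana", None],         # associate plus an unknown voice: kept
    ["ben", None, "ana"],  # two associates helping one shopper: kept
]

kept = [c for c in conversations if keep_conversation(c)]
```

Only the last two conversations survive. The associate-only one is dropped before anything downstream sees it.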
With this as the foundation, we then use natural language processing to make sure the remaining conversations are relevant to customer experience. Together, that’s a surefire way to keep the private conversations store associates have with each other from ever being processed, stored, or analyzed by our system.
In practical terms, this means store associates can start their shift, turn on the Rillavoice mobile app, and go about their day without having to worry about turning the app off at any point. Their private conversations with each other will never be captured, processed, or stored.