The Single Feature Killing AI Conversations
I’ve long asked, "Where’s my Jarvis?" I mean, I’m seriously envious of the AI Tony Stark has in the Marvel movies: it can do tons of things, hold a normal conversation, and actually be helpful!
Today, it still feels like we’re a tad off that picture. From my perspective, the gap is mainly down to a design flaw by humans, not by the AI.
One feature in AI speech models like ChatGPT might be holding back the adoption of talking to AI: it doesn’t know how to wait during a conversation, and that leads to cognitive overload and stress.
It’s subtle and straightforward but may have a significant effect on human adoption.
If you’ve ever tried speaking with ChatGPT or any AI speech-to-speech model, you’ll know what I’m talking about. These models rely on pauses to understand when to respond. Seems simple, right? But that’s where things get messy.
Natural conversations are full of pauses: those breaks to think, breathe, and reflect. The problem is that AI doesn’t get that. It listens for silence, and the moment it senses you’ve stopped talking, bam, it jumps in. There’s no time to finish your thoughts, gather what you want to say next, or pause to consider.
The result? You’re rushing your sentences, finishing thoughts quickly, and actively trying to avoid pauses. Otherwise, the AI will cut in before you’re ready.
Stress You Might Not Notice — But It’s Real
This creates subtle but real pressure. Even if you don’t consciously feel stressed, your brain is working overtime to keep up. The result is cognitive overload: you’re constantly trying to stay ahead of the AI just to keep control of the conversation.
And sure, ChatGPT has a feature that lets you interrupt it if you need to. But let’s be honest — that’s not a solution. The pressure shifts to you to stop the AI mid-sentence, which still doesn’t make for a smooth or natural exchange.
It’s like talking to that friend who’s just waiting for you to take a breath so they can jump in. You find yourself speaking faster, thinking on your feet, and scrambling to fill any silence.
Feels More Like a Race Than a Conversation
Good conversation has a rhythm — a back-and-forth, with room to breathe. But when AI is too quick on the draw, it feels like a race. And that’s not the race people want to run, no matter how fancy or advanced the tech is.
From a psychological and neurophysiological perspective, this doesn’t just feel “off” — it’s disruptive to our normal flow of conversation. And when a conversation doesn’t feel good, we naturally avoid it.
The Challenge of Real-Time Understanding
Perhaps this comes down to the limits of AI, or the computational resources we can devote to it during a conversation. Human conversation isn’t just about talking; it’s about predicting and understanding each other’s intent and meaning. That’s what makes our exchanges feel natural.
However, real-time language processing is a monumental challenge for AI models, requiring them to understand context and intent at lightning speed.
To avoid overwhelming computational resources, AI models often fall back on hardcoded timing rules, typically a fixed silence threshold, to decide when to speak. This mechanical turn-taking is what makes conversations with AI feel off.
So basically, it’s a human design choice: a trade-off that sacrifices natural conversational flow to practical constraints. And IMHO, one with dire consequences if left unchecked.
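To make that trade-off concrete, here is a minimal sketch of how such a hardcoded timing rule typically works: treat any pause longer than a fixed threshold as the end of the speaker’s turn. The threshold, frame length, and simulated speech below are my own illustrative assumptions, not the actual parameters of ChatGPT or any specific system.

```python
# Illustrative sketch of a fixed-silence endpointing rule.
# All numbers here are assumed for demonstration, not real vendor settings.

SILENCE_THRESHOLD = 0.1   # energy below this counts as "silence" (assumed)
END_OF_TURN_SECS = 0.7    # pause length that triggers the AI to respond (assumed)
FRAME_SECS = 0.02         # 20 ms audio frames

def detect_end_of_turn(frame_energies):
    """Return the time (in seconds) at which a fixed-silence rule would cut in."""
    silent_frames = 0
    for i, energy in enumerate(frame_energies):
        if energy < SILENCE_THRESHOLD:
            silent_frames += 1
        else:
            silent_frames = 0  # speech resumed, reset the silence timer
        if silent_frames * FRAME_SECS >= END_OF_TURN_SECS:
            return (i + 1) * FRAME_SECS  # the AI decides the turn is over here
    return None  # the speaker never paused long enough

# Simulated speaker: 1 s of speech, a 1 s thinking pause, then more speech.
speech = [0.8] * 50
thinking_pause = [0.02] * 50
more_speech = [0.8] * 50
frames = speech + thinking_pause + more_speech

cutoff = detect_end_of_turn(frames)
print(f"AI interrupts at {cutoff:.2f}s, mid-thought")  # fires during the pause
```

With a rule like this, a one-second pause to think is indistinguishable from the end of your turn, which is exactly the "bam, it jumps in" behavior described above.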
What My Research Showed: Uncanny Valley and Cognitive Load
Let’s get into some of the science. In a DARPA-funded study on human-robot interactions that I coauthored, we examined how timing in conversations affects our experience with AI and robots. One of the key takeaways? Robots — and by extension, AI speech models — need to listen to be effective conversationalists.
We looked at neurophysiological data, like EEG and ECG measures, to see how people responded to different types of interactions. In human-to-robot exchanges, there was heightened frontal theta activity — a sign that your brain is working harder to bridge the gap in understanding. And unlike in human-to-human conversations, there was less mu suppression, indicating reduced empathy and social connection. Essentially, your brain has to work harder to make sense of AI’s awkward timing and responses.
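For readers curious what those measures actually are: "frontal theta" and "mu" are simply power in specific EEG frequency bands, roughly 4–8 Hz and 8–13 Hz respectively. The toy sketch below shows how such band power is commonly computed; the signal is simulated noise and the sampling rate is an assumption, not data from our study.

```python
import numpy as np

# Minimal sketch of band-power computation for EEG markers like theta and mu.
# The "EEG" here is simulated noise; 256 Hz sampling rate is an assumed value.

FS = 256  # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
eeg = rng.normal(size=FS * 10)  # 10 s of fake single-channel EEG

def band_power(signal, fs, low, high):
    """Average spectral power within [low, high] Hz."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].mean()

theta = band_power(eeg, FS, 4, 8)   # frontal theta: a workload marker
mu = band_power(eeg, FS, 8, 13)     # mu rhythm: its suppression tracks social/sensorimotor engagement
print(f"theta power: {theta:.3f}, mu power: {mu:.3f}")
```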
And here’s where it gets interesting: these findings parallel what we know about the uncanny valley effect — where humans feel discomfort or eeriness toward robots that seem almost, but not quite, human.
When robots don’t align with our expectations (whether visually or conversationally), our cognitive load increases and our stress responses are triggered. Dislike and avoidance are then just around the corner!
In fact, our research won the 2017 HCI International Best Poster Extended Abstract Award, showing that timing, context, and understanding in conversations aren’t just “nice-to-haves” — they’re essential.
My claim here is not that people will necessarily develop an “uncanny” response to ChatGPT’s voice feature. But I still believe people find the conversation strained, challenging, stressful, and quite “off.”
Why It Matters for AI Conversations
Here’s the deal. If AI can’t get the timing right, it’s a problem. Just like a stand-up comedian needs perfect timing to land a punchline, human-AI interactions require that delicate balance between speaking and listening. When the balance is off, we instinctively feel awkwardness and even a bit of unease.
The takeaway is simple: if AI can’t communicate well, it won’t be adopted well. Stress and cognitive overload lead to avoidance, and if AI can’t make our lives easier or conversations more enjoyable, we’ll hesitate to use it.