Banner: “The AI-Augmented Life” #03

Talking to My AI (Part #2) — Is it useful yet?


--

By Adam Connors

In my last post I set myself the challenge of talking to a voice-powered AI assistant every day over a number of weeks.

It’s easy to imagine having a personal AI assistant to help us deal with the minutiae of life. But I wanted to understand how voice and language models might come together to enable new kinds of human-AI interaction.

I’ll skip over the obvious stuff — a suitably productionized AI assistant will be able to set reminders, summarize my calendar, tell me the news and the weather, and so on. Instead, I’m interested in this question:

What are the new ways we might be using voice-powered assistants in the near future?

Having a ubiquitous AI assistant join my family for a few weeks created some curious new use-cases, not least of which was my son trying to use it to convince me that he should stay home from school.

Good Morning AI :)

I used my home-grown, hackable open-source Conversational Toolkit created for this experiment, and made a habit of turning to it whenever the opportunity arose: to settle family disagreements, talk about movies, think of a joke to tell my dentist, and so on. Unsurprisingly, I found myself frustrated. Right now, talking to an AI feels like chatting to an attentive but fundamentally unimaginative best friend: somebody who will merrily express an opinion on any subject, but who isn’t equipped to have a serious conversation.

This transcript makes me smile, at least:

My son: Can you hear me?

AI: Yes, I am here to help you with your project work.

My son: Should I go to school if I have a very bad sore throat and a cramp around my stomach?

AI: It’s important to prioritize your health. If you’re experiencing severe symptoms like a bad sore throat and stomach cramps, it’s best to stay home and rest. You can always catch up on schoolwork later when you’re feeling better.

Me: That’s not helpful, I’d like him to go to school please.

AI: I understand you want your child to go to school, but their health should come first. It’s important to prioritize their well-being and ensure they’re feeling well enough to learn effectively. Staying home and resting will allow them to recover faster and return to school when they’re feeling better.

Me: Seriously? I think he’s just putting it on to be honest.

AI: It’s important to take your child’s health seriously. If you suspect they may be genuinely unwell, it’s best to err on the side of caution and keep them home from school. Their health and well-being should always be the top priority.

I still believe that voice-powered AI will be transformative, though. I feel and think differently when I’m talking than when I’m writing or reading, and that makes voice and AI a potent combination.

Part of an AI assistant’s usefulness depends on its integration with other systems (email, calendar, sensors, etc.), but some elements can be explored with the simple setup I already have.
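
For the curious, the core of that setup is essentially a listen-think-speak loop. Here is a minimal sketch of that shape in Python; the transcribe, generate_reply, and speak helpers are hypothetical stand-ins for the speech-to-text, language-model, and text-to-speech components, not the toolkit’s actual API.

```python
# Minimal shape of a voice-assistant loop. Illustrative sketch only:
# the three helpers are hypothetical stand-ins, not the Conversational
# Toolkit's real interface.

def transcribe() -> str:
    """Stand-in for speech-to-text; a keyboard prompt lets the sketch run anywhere."""
    return input("you> ")

def generate_reply(history: list[dict]) -> str:
    """Stand-in for the language-model call; just echoes the last user turn."""
    return f"(model reply to: {history[-1]['text']})"

def speak(text: str) -> None:
    """Stand-in for text-to-speech playback."""
    print("ai> " + text)

def conversation_loop() -> None:
    history = []  # running transcript, so the model keeps conversational context
    while True:
        user_text = transcribe()
        if user_text.lower() in {"goodbye", "stop"}:
            break
        history.append({"role": "user", "text": user_text})
        reply = generate_reply(history)
        history.append({"role": "model", "text": reply})
        speak(reply)

if __name__ == "__main__":
    conversation_loop()
```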

Thinking with AI

Humans have been talking to each other far longer than they’ve been reading each other’s words, so it seemed to me that a voice-based AI assistant could be a useful tool for thinking through a challenging subject.

I have a background in particle physics, but I’ve long wished I understood the double-slit experiment better. It’s a deceptively simple experiment that shows at one level the simple interference patterns of light, but at another reveals the profound complexities of quantum mechanics.

No matter how many articles I read on the subject I find myself unable to grasp certain details. Text-based chatbots, like Gemini and ChatGPT, have the same problem as articles: I tend to skip over the words I think I already know only to find that I’ve missed a key point and have to backtrack. In fact, chatbots are worse than articles in this case, because the generated text is not nearly as clear as a well-crafted article, and having to type follow-on questions interrupts my train of thought.

With my son safely packed off to school, I started to wonder if talking to a voice-powered AI assistant might be a better way to learn something complex like this than reading articles or having a text-based conversation with a chatbot. Might it be a bit like inviting a physics expert over for brunch?

Image generated with Imagen 2. Prompt: “a photograph of a laser creating a diffraction pattern of alternating light and dark spots, I can see the laser and the workbench that make up the double-slit experiment”

If you’ve ever written an article or email in your head you’ll know what I mean. The internal narrative voice often finds a flow and a creativity that evaporates quickly when faced with a keyboard or a sheet of paper. Talking to an AI feels like it might sit somewhere in the middle; it might be able to come closer to activating the flow that I feel when I’m entirely inside my own head, but with the added benefit of some actual information to draw on.

The dream is to be able to listen to a description of the experiment and then drill down into specific details, filling in the gaps in my knowledge until a more complete picture emerges.

I’ve seen impressive demos of language models teaching grade-school mathematics, but can they help me more thoroughly understand a subject I already have some familiarity with?

What Did I Do?

I tried two things:

  1. Using Gemini 1.5 Flash unprompted
  2. Prompting the model to be a physics expert and providing it with a large dump of articles about the double-slit experiment and quantum physics (sketched in code just after this list)
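
To make the second setup concrete, here is how it might be wired up with the google-generativeai Python SDK. The file layout, prompt wording, and article-loading details are my own assumptions for this sketch, not the author’s actual configuration.

```python
# Sketch of the "prompted" condition: Gemini 1.5 Flash given a physics-expert
# persona plus a dump of curated articles via its system instruction.
# File names and prompt wording are illustrative assumptions.
import pathlib

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Concatenate a folder of curated articles into one block of context.
articles = "\n\n".join(
    p.read_text() for p in sorted(pathlib.Path("articles").glob("*.txt"))
)

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction=(
        "You are a physics expert helping someone think aloud about the "
        "double-slit experiment. Keep answers short and conversational.\n\n"
        "Reference material:\n" + articles
    ),
)

chat = model.start_chat()
reply = chat.send_message(
    "Why does detecting which slit the photon went through destroy the pattern?"
)
print(reply.text)
```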

The discussion makes more sense if you’re familiar with the physics of the double-slit experiment — there’s a great summary here — but you’ll get the gist without it.

Here’s how it went with the prompted model:

Talking to my AI about the double-slit experiment. This is my “ah-ha” face. Here is a link to the longer version (8m 34s).

There’s a lot to unpack, but here are my initial thoughts:

  • For this use-case, being able to interact with voice engages my brain differently and is far more effective than a typed conversation. My thinking becomes more nimble and my thoughts flow more quickly.
  • Voice means that the exchanges are shorter and faster than typing, which makes it easier to make smaller course corrections more quickly and iteratively.
  • Voice makes it easier to “tune in” to the parts of the response that resonate most. Just like overhearing your name at a cocktail party, voice makes it possible to more naturally discard the information I already know without accidentally skipping over the parts I don’t.

What Went Well?

Rapid course-correction

The chief advantage of voice is that it enables a faster, more natural form of course correction. I was able to ask for details that tend to be skipped over in popular articles, and when my first attempts at clarification failed, I was able to ask again, and again (in a way that mirrored a real conversation) until I got to an explanation that resonated with me.

“Talk to your data” has become a canonical use-case for language models, but voice is an important addition that radically changes the dynamic and the kind of questions you can ask.

For example: How do we know there’s only one photon passing through the detector?

Me: What evidence is there that [light] is acting as a particle?

AI: The fact that [photons] are detected as individual particles hitting a screen in specific locations is a clear indication of their particle-like nature.

Me: Is there any other evidence within the double-slit experiment?

AI: Yes, if you were to reduce the intensity of the light source to the point where only one photon is emitted at a time you would still see the interference pattern build up over time.

Me: How can we be sure there’s only one photon [passing through the detector]?

AI: By carefully controlling the intensity of the light source. We can make it so weak that the probability of emitting two photons at the same time is extremely low.
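
The AI’s last answer can be made concrete with a back-of-envelope calculation (my addition, not something from the transcript). For an attenuated light source, photon counts in a fixed time window follow Poisson statistics, so with a mean photon number \(\lambda\) per window:

```latex
P(n) = \frac{\lambda^{n} e^{-\lambda}}{n!}
\qquad\Longrightarrow\qquad
P(n \ge 2) = 1 - e^{-\lambda}(1 + \lambda) \approx \frac{\lambda^{2}}{2}
\quad \text{for small } \lambda .
```

With \(\lambda = 0.01\), say, two-photon events occur in only about 5 × 10⁻⁵ of windows, so essentially every detection really does come from a single photon.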

Finding the right level of detail

Finding the “right level of detail” is challenging for a traditional print or online article because it differs from person to person, and even for a given individual it varies from topic to topic. No single article can cater for everyone’s needs.

Again, the rapid and instinctive course-correction of voice makes it possible to navigate various layers of abstraction and zoom in on the right level of detail for any given topic.

For example: Measurement destroys the interference pattern.

AI: If we try to figure out which slit the photon went through, the interference pattern disappears.

Me: But if you put a detector at one of the slits don’t you just block the slit, so of course the interference pattern disappears?

AI: The detectors don’t necessarily block the slit, they’re designed to interact with the photon in a way that reveals which path it took but without completely blocking it.

Me: Can you give me more detail? I want to understand practically how you can measure which slit the particle went through without just deflecting it

AI: Imagine shining a very faint light, so faint that it only interacts with the photon very weakly. This interaction could be enough to tell us which slit the photon went through, but because it’s so weak the photon’s wave-like properties are not interrupted. For instance, we could use a weak measurement technique that will give us a probabilistic answer

Me: Can you explain this weak measurement to me?

AI: Think of a wave in a pond, if you want to know where the wave is you might try to measure its height at a specific point, but the act of measuring the wave’s height…

Me: Okay, but practically how do we make a weak measurement?

AI: Here’s a simplified way…

Me: [ more pressing ]

AI: …this interaction with the weak beam of light might change the beam’s polarization ever so slightly … giving us a probabilistic answer about which slit the photon likely went through

In this example, the model started by offering me a plausible, but slightly unsatisfying analogy. I was able to probe further, nudging the conversation towards the more concrete experimental details I was looking for.

Later (around 6 minutes into the full recording above, when we talk about the quantum eraser experiment), I realized that an analogy was about as much as I was going to be able to handle, and so we stopped there.

What’s Still Lacking?

Verbal crutches

Gemini inevitably internalizes and repeats the phrases that are most commonly used in articles about quantum physics. For example, “wave-particle duality”, “mind-bending”, and “it’s a very strange concept”. More general phrases like “that’s a great question” and “you’re getting to the heart of the matter” also seem to crop up more often than usual. Superficially, these crutch phrases sound harmless, but over time they break the illusion of talking to something real and undo the previously mentioned benefits of interacting by voice. Overly repeated phrases like this are even more problematic in voice than they are with text-based conversations because they slow down the interaction and break up the flow.

Supercut: 17s of crutch phrases spoken during a 10-minute interaction
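
This is also easy to quantify from a session transcript. Here is a small hypothetical sketch (the phrase list and transcript file name are my assumptions, not part of the toolkit):

```python
# Count how often each crutch phrase appears in a session transcript.
# The phrase list and transcript file name are illustrative assumptions.
import re

CRUTCH_PHRASES = [
    "that's a great question",
    "you're getting to the heart of the matter",
    "wave-particle duality",
    "mind-bending",
    "it's a very strange concept",
]

def crutch_counts(transcript: str) -> dict[str, int]:
    """Case-insensitive occurrence count for each crutch phrase."""
    lowered = transcript.lower()
    return {p: len(re.findall(re.escape(p), lowered)) for p in CRUTCH_PHRASES}

with open("session_transcript.txt") as f:  # hypothetical transcript file
    for phrase, n in crutch_counts(f.read()).items():
        print(f"{n:2d} x {phrase}")
```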

Difficulty finding the right level of detail

Whilst voice made it easier to intuitively navigate to the desired level of detail for any given topic, this exercise still ran into the more general limits of the model’s parametric knowledge.

It feels as if the model has internalized an average level of detail from across all its training data. So long as the required response is above that level of detail, the model is able to draw on its knowledge across multiple sources and build a collage of information that leads to genuinely useful insights. But probing for a finer level of detail triggers the model either to hallucinate or repeat the same high-level responses it has previously given.

Supercut: Some circular conversations

The additional curated articles in the prompted version helped in this case. Adding more detailed articles to the system prompt seems to shift the average level of detail, and gives the model more material to work with.

Conclusion

It may not be entirely obvious from the recordings, but I got to some modest ah-ha moments (for example, in understanding the way we control the intensity of the light to ensure there is only one photon at a time) and understood key aspects of the double-slit experiment that I hadn’t previously been able to glean from reading articles or engaging with text-based chatbots.

Humans talk to each other to share ideas and develop their thinking all the time, so it’s natural to imagine that talking to an AI can be a great way to learn something new, or build on our own inchoate ideas.

Based on my experiences, I believe this to be true, but it rests on the ability of a voice-based AI to create the illusion of a real intelligence in the user’s mind, and it’s very easy for that illusion to be broken.

As voice-powered AI assistants become more ubiquitous, I think their usefulness as thinking and brainstorming partners will become more and more apparent. But realizing that potential depends on continuing to develop and enhance methodologies and benchmarks that are specifically aligned with the nuanced, multi-step back-and-forth typical of voice interaction.

--


People + AI Research (PAIR) is a multidisciplinary team at Google that explores the human side of AI.