Virtual Agents Going Sentient?

First came text input, and ChatGPT (powered by GPT-3.5) gave wonderful results.

Pratibha
Operations Research Bit
3 min read · May 15, 2024


Photo by cottonbro studio

Fast forward to 2024, and we witness the sophisticated world of advanced virtual assistants whose potential applications and functionalities are nothing less than mind-boggling. Voice intonation, quick responses — even when interrupted — and their ability to understand emotions make them seem more human by the day.

Google’s Leap

Project Astra, Google’s AI assistant, has advanced seeing and talking responsiveness.

When run on a Pixel phone and on a glasses device, in a continuous take, the assistant recognized objects and described them flawlessly. Even crazier — it came up with the phrase “Schrödinger’s cat” based on a visual clue of two cats and a box. That level of deduction is simply amazing.

The awesomeness does not stop there: it can also remember what it sees. Watch it accurately describe the location of the glasses it had scanned earlier. Even better, it can identify code, suggest code solutions, explain code, and even suggest names for a dog. Input can be voice, audio, or text.

Soon to be available with advanced voice command functionality and integrated with Google’s suite of applications, the new AI assistant also sounds more natural. Users can choose between different voices.

In Google DeepMind CEO Demis Hassabis’s words, “To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand context and take action. It also needs to be proactive, teachable and personal, so users can talk to it naturally and without lag or delay.”

Now that’s almost the “sentient” level. Except that it is trained and cannot feel naturally, a fact that is somewhat reassuring.

Screenshot by Author

GPT-4o Can Laugh

OpenAI’s newest flagship model, GPT-4o, is revolutionizing user interactions. With text, audio, and video input, its responses are intuitive and seamless.

This virtual agent can laugh at your jokes, sing in bass or soprano for you, and even give its take on your appearance, suggesting how you should present yourself for an interview.

It can instantly translate about 50 languages in real time and sound natural in all of them. Detecting, analyzing, and understanding human emotions, as well as providing real-time coding assistance, is a breeze for this agent.

One might wonder whether the agent has opinions on political parties or comments on war. However, OpenAI has programmed these agents to steer clear of politics, malware creation, and direct comments on ongoing wars. In other words, it will only reply based on the information it has been given.

The Future

While these human-like interactive technologies show great promise in making life easier for humans and enabling better solutions to the world’s biggest problems, they also raise concerns: ethical and regulatory challenges, the impact on employment, and privacy.

Hopefully, we will leverage them to help craft thoughtful regulations and frameworks that ensure the well-being of society.

Uncle Ben would have said, “With unprecedented technology comes huge responsibility and accountability.”

Accountability. That is the question. Maybe the answer, too.
