Artificial Intelligence

An AI Avatar Walks Into A Talkshow

Integrating an LLM in Live Prime Time Television

Luca Zug
BR Next

--

Integrating AI systems into journalistic work and television production is becoming an increasingly powerful way to showcase current state-of-the-art technology to viewers. Here’s how we brought a responsive AI avatar onto live television at the German public broadcaster Bayerischer Rundfunk.

The avatar answered questions from editor-in-chief Christian Nitsche and other guests alike (left: Prof. Alena Buyx).

The rapid development of AI technology is outpacing the ability of everyday people to keep up. Yet, experts agree that AI will impact almost everyone, directly or indirectly. As a public broadcaster, it’s our mission to report on these challenges and raise awareness about how AI could change lives.

Last year, the editorial team of the talk show Münchner Runde introduced the humanoid robot ‘Pepper’ to their viewers. This year, to illustrate the accelerated pace of AI development and showcase the diverse capabilities of AI systems, the team decided to bring in an AI avatar as a guest.

The avatar was to be present in the studio and debate the topic of the show with the host and the other guests. What started as an exciting idea quickly revealed itself to be a complex and demanding task.

Münchner Runde is the weekly political talk show of Bayerischer Rundfunk, the Bavarian Public Broadcaster, which is part of the German Public Service Broadcasting network ARD. Every week editor-in-chief Christian Nitsche and his guests debate the most controversial issues of the time. For our avatar, the focus was on the implications of AI for the workplace.

An Ethics First Approach

Putting this AI system into practice ethically was at the forefront of our minds as we embarked on this experimental project, especially since the avatar was being deployed on public service television.

Designed to enhance AI literacy among viewers rather than replace talk show guests or the presenter, the avatar required us to adhere to rigorous ethical standards: ensuring non-discrimination, maintaining editorial control, and aligning the avatar’s output with democratic values and journalistic principles.

We had to find answers to fundamental questions: What should the avatar look like? How can we promote diversity and avoid reinforcing gender stereotypes and ageism?

Moreover, we had to decide whether to give the avatar a name, weighing the potential for anthropomorphising the avatar and giving the impression of a real human guest.

Ultimately, adhering to our standards, guidelines and principles meant not humanising the avatar and clearly marking it as AI-generated (see our BR AI Ethics Guidelines).

We opted to frequently display a lower third marking the avatar as AI-generated whenever its answers were shown (transl.: Avatar — Operated with Artificial Intelligence).

Handling factually incorrect hallucinations of the LLM (Large Language Model) behind the avatar is a critical challenge that requires careful consideration, as it pertains to the core journalistic standards of our public service institution.

Thus, the editorial team opted to make hallucinations part of the debate by asking the non-internet-connected LLM questions about current events, to which it gave plausible-sounding but hallucinated answers.

We also briefed the presenter on the high likelihood of hallucinations. We argue that demonstrating these shortfalls of current AI chatbots illuminates the issues inherent in this technology, thereby educating a wider public.

Implementing a (Very) Special Use Case

Deep fakes and avatar imagery are everywhere, but we needed a highly specialised and secure product — a live, interactive AI avatar that could be used for up to an hour at a time.

This turned out to be a challenging task. After reaching out to avatar companies, we faced various challenges, including legal hurdles like copyright issues and the inability of many providers to deliver a product fit for live broadcast.

Companies that provide AI avatars mainly focus on customer service applications and marketing videos, some of which have already started showing up in social media ads.

Since the industry has mainly developed live and streaming solutions for large enterprises, prices can quickly reach into the tens of thousands of euros. This was far beyond our budget, and after extensive research, we still couldn’t find a comprehensive solution that met our needs.

In the end, we decided on HeyGen. They offer an avatar Streaming API, albeit without integrated voice recording and LLM responses, and only as a demo version for lower-tier user accounts. Consequently, we decided to develop the remaining components ourselves.

How We Did It

Our main goal was to deliver a low-latency conversation experience with the avatar. Given that we didn’t have a ready-made solution, one of our major challenges was figuring out how the host, the guests, and the avatar would interact.

Drawing from their experience with the humanoid robot ‘Pepper’ on a previous show, the editorial team were aware that addressing the robot had been a major issue, as it often failed to react.

We decided on a hands-on approach: manually starting and stopping the recording of questions and statements for the avatar to respond to. This method, we believed, would allow us to be the most agile and responsive in real-time.

However, the approach had downsides: anticipating when the presenter or guests would address the avatar during a live program was difficult, and pressing the record button on time took practice.
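For illustration, a minimal push-to-record loop could look like the sketch below. It is not our production code; the library choice (sounddevice/soundfile) and the Enter-key handling are assumptions for the example.

```python
# Illustrative push-to-record sketch: Enter starts and stops recording
# one question into a temporary WAV file for later transcription.
import queue

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 16_000  # mono 16 kHz is sufficient for speech-to-text


def record_question(path: str = "question.wav") -> str:
    """Record one question into a temporary WAV file and return its path."""
    chunks: queue.Queue = queue.Queue()

    def callback(indata, frames, time, status):
        chunks.put(indata.copy())  # buffer audio blocks from the microphone

    input("Press Enter to START recording the question...")
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=callback):
        input("Recording... press Enter to STOP.")

    with sf.SoundFile(path, mode="w", samplerate=SAMPLE_RATE, channels=1) as wav:
        while not chunks.empty():
            wav.write(chunks.get())
    return path
```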

Our pipeline was structured to record the question from the host or a guest in a temporary audio file and use OpenAI’s Whisper model for speech-to-text transcription.
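A sketch of this transcription step with the OpenAI Python SDK (the file name and language hint are illustrative):

```python
# Transcription sketch using OpenAI's Whisper API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def transcribe(audio_path: str) -> str:
    """Turn the recorded question into text."""
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language="de",  # the show is in German
        )
    return transcript.text
```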

The text was then sent to OpenAI’s GPT-4o to answer the transcribed question — at the time of development, GPT-4o did not yet publicly support audio input, which would have reduced latency.
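The answer-generation step might look roughly like this; the system prompt below is our own illustration, not the context-aware prompt used on air:

```python
# Answer-generation sketch with GPT-4o (system prompt is illustrative).
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are an AI avatar appearing as a guest on the talk show Münchner Runde. "
    "Answer in German, in two to three short sentences suitable for being spoken aloud."
)


def generate_answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        max_tokens=200,
    )
    return response.choices[0].message.content
```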

As a backup in case of outages, we used AssemblyAI’s transcription model and its LLM framework LeMUR. The models were set up with a context-aware system prompt. Finally, the LLM’s answer was sent to HeyGen’s Streaming API and spoken by the avatar on screen.
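Glued together with the fallback and the avatar, the flow might look like the hedged sketch below. The AssemblyAI calls follow its published SDK; the HeyGen endpoint name, header, and payload fields are assumptions based on its Streaming API and may differ from the actual interface.

```python
# Glue sketch: try the OpenAI path first, fall back to AssemblyAI + LeMUR,
# then have the HeyGen avatar speak the answer (HeyGen details are assumed).
import os

import assemblyai as aai
import requests

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
HEYGEN_TASK_URL = "https://api.heygen.com/v1/streaming.task"  # assumed endpoint


def answer_with_fallback(audio_path: str) -> str:
    try:
        # Primary path: Whisper + GPT-4o (see the sketches above).
        return generate_answer(transcribe(audio_path))
    except Exception:
        # Backup path: AssemblyAI transcription plus its LeMUR framework.
        transcript = aai.Transcriber().transcribe(audio_path)
        result = transcript.lemur.task(
            "Answer this talk-show question in two short German sentences."
        )
        return result.response


def speak_on_avatar(session_id: str, text: str) -> None:
    """Send the generated answer to the avatar's streaming session."""
    requests.post(
        HEYGEN_TASK_URL,
        headers={"X-Api-Key": os.environ["HEYGEN_API_KEY"]},
        json={"session_id": session_id, "text": text},
        timeout=15,
    ).raise_for_status()
```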

The hardest challenge was stabilising the server connection to the HeyGen Streaming API, as it wasn’t built for this purpose. We developed elaborate workarounds and error handling to keep the connection alive, sending periodic requests and reconnecting automatically to circumvent the automatic disconnect after five minutes.

Nevertheless, the connection stability was beyond our control.
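The keep-alive logic amounted to periodically sending a lightweight request and reopening the session when it dropped. A simplified sketch; the interval and the injected callables are our assumptions, not the exact production logic:

```python
# Keep-alive sketch: ping the streaming session regularly and reconnect on
# failure, to work around the ~5 minute automatic disconnect.
import threading
import time

KEEPALIVE_INTERVAL = 60  # seconds, comfortably below the 5 minute limit


def keep_session_alive(session: dict, ping_session, create_session) -> None:
    """Run a background thread that keeps the avatar session usable."""

    def loop():
        while True:
            time.sleep(KEEPALIVE_INTERVAL)
            try:
                ping_session(session["id"])  # lightweight request to stay connected
            except Exception:
                session["id"] = create_session()  # reconnect with a fresh session

    threading.Thread(target=loop, daemon=True).start()
```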

Live from the control room: broadcast technicians and staff were highly involved in bringing the avatar on television screens. (Photo: BR/Reinhard Weber)

For behind-the-scenes footage, be sure to watch this Abendschau TV-report (in German) about our work.

What We Learned

Expectation management and open communication is key

Glossy videos, such as OpenAI’s recent release promotions for GPT-4o, suggest a state of the art that raises high expectations among colleagues and viewers alike.

These videos often serve as promotional content for tools, but it is difficult to verify whether the technology truly delivers on its promises.

Thus, it’s pertinent to maintain open and honest communication among all partners regarding their expectations and what is achievable in the short time available.

We emphasised the various stability issues of the pipeline and streaming connection — due to multiple API calls per iteration — for a live TV broadcast.

Additionally, the studio’s technical infrastructure had to be upgraded to meet the project’s exceptionally high demands. A stable (wired) internet connection was crucial; hence, a port had to be set up and an ethernet cable laid to the production control room as the primary connection, with a Starlink satellite internet connection providing a reliable backup.

While this dual approach ensured continuous connectivity, it required extensive planning and coordination with network and broadcast studio technicians.

Moreover, our manual recording implementation required proper briefing of the host and all guests. It was crucial to make clear how they should address a question to the avatar, enabling them to include the AI at any point in the show and allowing us to start the recording in time.

Our highly specialised use case

Avatar providers are not equipped to handle this specialised use case for television, as we found in our numerous inquiries. This underscores the early stage of AI (avatar) assistant development, with many applications still narrowly defined and specialised.

A key takeaway from our experience is that no AI company could provide a live, photorealistic, stable, and fluent conversational avatar, highlighting a gap between public perception and the industry standard in this space.

Recording the questions manually is not feasible for most use cases. However, it was necessary to reduce latency and make the conversation possible.

In our experience, LLMs cannot — as of today — continuously follow a conversation between several people and participate seamlessly (with the exception of GPT-4o’s then-unavailable audio capabilities).

A question directed explicitly at the avatar was required to properly integrate it into the discussion, both from a technical and a dramaturgical perspective.

Error handling, backups and redundancy systems are key

Reliability was one of the main goals of our avatar implementation for this live television broadcast. To that end, we invested heavily in effective error handling for all API connections, including the HeyGen Streaming API.

This involved robust protocols to detect and resolve errors such as API call timeouts, server connection issues and local studio-equipment related failures promptly, minimising downtime and disruptions.
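In practice, much of this boiled down to per-call timeouts, retries, and switching to the backup provider. A minimal illustrative wrapper (the function and helper names are ours, not from the production code):

```python
# Minimal retry-with-fallback sketch for API calls (illustrative only).
import time


def call_with_retry(primary, fallback, attempts: int = 2, delay: float = 0.5):
    """Try the primary callable a few times, then switch to the backup."""
    for _ in range(attempts):
        try:
            return primary()
        except Exception:
            time.sleep(delay)
    return fallback()


# Hypothetical usage: answer = call_with_retry(
#     lambda: generate_answer(question), lambda: lemur_answer(question))
```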

From the first minutes of the show, it was clear that we were correct in our assessment, as server connection issues emerged at the start of the program. However, the ongoing problems, which partially interrupted the show, indicate that further efforts are necessary to achieve a reliable system.

The implementation of a backup image loop in the control room added an extra layer of security in case of major errors. This redundancy ensured that even in case of primary system failures, which did occur, the broadcast continued almost seamlessly, underscoring the importance of comprehensive error handling and backup strategies.

Ethics by design and human oversight played a decisive role in integrating the avatar into Münchner Runde and will likely continue to do so for journalistic use cases in the future. We aimed to showcase (somewhat) state-of-the-art technology and to strike a balance between providing information and highlighting both opportunities and risks, without being too promotional about or demonising this exciting technology.

Please direct questions about the project to luca.zug@br.de or verena.steinacher@br.de.

Who We Are

This project was developed by a team within the AI+Automation Lab of Bavarian Broadcasting, a German Public Service Broadcaster in the ARD Network. We aim to combine the work of journalists, software developers, machine learning experts and product designers. The team produces automated texts, graphics (such as avatars) and audio news briefings and joins investigations with statistical knowledge and machine learning skills. We closely work with our data and investigative journalism teams BR Data & Recherche on investigations and product development (here’s how we’re aiming to work together). We’re looking at both sides of AI and automation, asking: How can this technology be useful for journalism? How is it used in harmful ways that should be investigated and discussed by society?

Project Work: Luca Zug, Verena Steinacher, Uli Köppen
Editorial Team: Sebastian Kemnitzer, Silvia Renauer, Manuel Mehlhorn, Reinhard Weber
Broadcast Production: Frank Sommer, Tanja Schröder, Andreas Feyrer, Boris Gubeljic
