String

Discovery and tech adoption for public officers, starting with Education Officers


Technical Architecture for Digital Avatar Conversational Systems

AI Talking Avatars Day 2: Linly-Talker, CosyVoice2 and bonus DanteAI (not-talking-head)

Kahhow
4 min read · Apr 8, 2025


Image courtesy of https://github.com/Kedreamix/Linly-Talker

Taking a deep dive into business use cases and a proposed technical architecture for AI talking avatars, namely Linly-Talker and CosyVoice2, with a trial of DanteAI.

I must confess, today’s series of talking head videos made me fairly uncomfortable:

Made using Linly-Talker; original video here, hosted on Bilibili, in Mandarin

The design philosophy of Linly-Talker:

create a new form of human-computer interaction that goes beyond simple Q&A. By integrating advanced technologies, it offers an intelligent digital human capable of understanding, responding to, and simulating human communication.

Breaking down talking heads, we have:

  1. Real-time interaction: accepting user voice input, with real-time speech recognition and video captioning
  2. Parsing/transcribing, with memory to accommodate multi-turn dialogue
  3. Rendering the mouth/facial expressions and syncing them with the audio output via libraries like Wav2Lip/Wav2Lipv2/…
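
To make the three stages concrete, here is a minimal Python sketch of how they might be wired together. Everything in it is my own illustration: `AvatarPipeline`, the `fake_*` stubs and the `max_turns` setting are hypothetical names, not Linly-Talker or CosyVoice2 APIs. I am also assuming a text-to-speech step sits between the dialogue model and the lip-sync renderer. In a real system each stub would be replaced by an actual speech recognition, language, speech synthesis or lip-sync model.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple

# NOTE: every name here is a hypothetical placeholder for illustration only.
Turn = Tuple[str, str]  # (role, text)


@dataclass
class AvatarPipeline:
    transcribe: Callable[[bytes], str]       # stage 1: speech recognition
    reply: Callable[[List[Turn]], str]       # stage 2: dialogue model
    synthesize: Callable[[str], bytes]       # assumed text-to-speech step
    lip_sync: Callable[[bytes], List[str]]   # stage 3: audio -> video frames
    max_turns: int = 6                       # how many turns to remember
    history: Deque[Turn] = field(init=False)

    def __post_init__(self) -> None:
        # Bounded memory: the oldest turns drop off once the deque is full.
        self.history = deque(maxlen=self.max_turns)

    def step(self, audio_in: bytes) -> dict:
        text = self.transcribe(audio_in)            # also usable as a caption
        self.history.append(("user", text))
        answer = self.reply(list(self.history))     # sees earlier turns too
        self.history.append(("assistant", answer))
        audio_out = self.synthesize(answer)
        frames = self.lip_sync(audio_out)           # frames follow the audio
        return {"caption": text, "answer": answer,
                "audio": audio_out, "frames": frames}


# Stub stages so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(history: List[Turn]) -> str:
    user_turns = [text for role, text in history if role == "user"]
    return f"Turn {len(user_turns)}: you said '{user_turns[-1]}'"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def fake_lip_sync(audio: bytes) -> List[str]:
    # Pretend that every 10 bytes of audio yields one video frame.
    return [f"frame-{i}" for i in range(len(audio) // 10 + 1)]


pipeline = AvatarPipeline(fake_transcribe, fake_reply,
                          fake_synthesize, fake_lip_sync)
first = pipeline.step(b"hello")
second = pipeline.step(b"how are you")
print(first["answer"])
print(second["answer"])
```

The point of the bounded `history` is that the dialogue step sees earlier turns, which is what makes the conversation multi-turn, while the prompt does not grow without limit.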


Published in String


Written by Kahhow

Educator interested in data science, dance and full stack development
