Tracer Newsletter #41 (20/01/20)-Carnegie Mellon researchers publish a new technique for transforming a live speaker’s voice into the style of another person

Henry Ajder
Published in Sensity
Jan 20, 2020

Welcome to Tracer, the newsletter tracking the key developments surrounding deepfakes/synthetic media, disinformation, and emerging cybersecurity threats.

Want to receive new editions of Tracer direct to your inbox? Subscribe via email here!

Carnegie Mellon researchers publish a new technique for transforming a live speaker’s voice into the style of another person

Researchers from Carnegie Mellon and Peking University introduced a new “any to many audiovisual synthesis technique” for transforming a speaker’s voice into a potentially infinite number of other styles.

How does it work?

The paper introduces a method of synthesizing voice audio by training a model to convert any spoken audio input into an output modelled on the style of a distinct speaker. This transformed speech can be accompanied by a synchronised “talking head” style video of the target speaker in question. In effect, the researchers claim the technique makes it possible for anyone to speak into a microphone and have their words synthetically “spoken” by a potentially infinite number of people. The approach is surprisingly simple: an auto-encoder is trained on an audiovisual stream of the target speaker whose voice is to be replicated, and arbitrary live voice audio is then passed through this auto-encoder, producing synthetic voice and video of the target speaker as output.
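To make the general idea concrete, below is a minimal, hypothetical sketch of this auto-encoder setup in PyTorch. It is not the authors’ implementation; the use of mel-spectrogram frames, the layer sizes, and the training loop are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): an auto-encoder trained only on a
# target speaker's audio features. At inference time, features from any live
# speaker are passed through the same auto-encoder, so the reconstruction is
# biased toward the target speaker's style. Feature type and layer sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

N_MELS = 80  # assumed mel-spectrogram feature size per frame


class SpeakerAutoEncoder(nn.Module):
    def __init__(self, n_mels: int = N_MELS, bottleneck: int = 32):
        super().__init__()
        # Encoder compresses each frame into a narrow bottleneck,
        # discarding much of the input speaker's own vocal detail.
        self.encoder = nn.Sequential(
            nn.Linear(n_mels, 256), nn.ReLU(),
            nn.Linear(256, bottleneck),
        )
        # Decoder reconstructs frames in the style of the speaker it
        # was trained on (the target speaker).
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 256), nn.ReLU(),
            nn.Linear(256, n_mels),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(frames))


def train_on_target(model, target_frames, epochs=10, lr=1e-3):
    """Train the auto-encoder to reconstruct the target speaker only."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(target_frames), target_frames)
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Stand-in tensors for mel-spectrogram frames (time x mels).
    target_frames = torch.randn(500, N_MELS)   # target speaker's recordings
    live_frames = torch.randn(100, N_MELS)     # arbitrary live input speech

    model = train_on_target(SpeakerAutoEncoder(), target_frames)
    with torch.no_grad():
        converted = model(live_frames)  # frames restyled toward the target
    print(converted.shape)  # torch.Size([100, 80])
```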

Why is this research important?

Previous research on translating audio from one speaker to another typically required training data from both the original speaker and the intended target to achieve realistic results. This technique removes the requirement for any original-speaker training data, with models trained only on data from the target speakers. This significantly reduces the processing needed to develop models for new targets, and means that trained models can be easily shared and used widely without modification.

SenseTime researchers present a new approach for generating realistic “audio to video translation”

Researchers from the Chinese AI company SenseTime proposed a new technique for realistically generating a video whose mouth movements are synchronised to match a given audio file.

How does it work?

The technique tackles the challenging task of “many to many” audio-to-video translation, in which the method does not assume a single identity for either the source audio or the target video. To achieve this, the researchers map “expression parameters” extracted from the subject’s facial features in the video prior to training, with expressions providing a strong semantic basis for accurately translating audio into the corresponding lip movements in video. These expression parameters are then combined with geometry and pose parameters derived from the target person to create an accurate 3D face mesh, but with lip movements that match the given spoken audio. The process is identity-agnostic with respect to the source audio, making the technique robust to variations in different people’s voices.
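As a rough illustration of how audio-driven expression parameters might be combined with a target person’s geometry and pose, below is a hypothetical, blendshape-style sketch in PyTorch. It is not SenseTime’s implementation; the feature dimensions, the linear face model, and the audio-to-expression network are all assumptions.

```python
# Illustrative sketch (not SenseTime's code): audio features are mapped to
# expression parameters, which are combined with the target person's identity
# (geometry) and pose parameters in a simple blendshape-style 3D face model.
# All dimensions and the linear face model are assumptions.
import torch
import torch.nn as nn

N_AUDIO = 64    # assumed audio feature size per frame
N_EXPR = 10     # assumed number of expression parameters
N_VERTS = 5000  # assumed number of mesh vertices


class AudioToExpression(nn.Module):
    """Predicts expression parameters (lip movement etc.) from audio
    features, independently of who is speaking."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AUDIO, 128), nn.ReLU(),
            nn.Linear(128, N_EXPR),
        )

    def forward(self, audio_feats):
        return self.net(audio_feats)


def build_face_mesh(mean_shape, identity_offset, expr_basis, expr_params, pose):
    """Blendshape-style mesh: target identity plus audio-driven expression,
    followed by a rigid pose transform. Shapes: mean_shape/identity_offset
    (V, 3), expr_basis (E, V, 3), expr_params (E,), pose = (3x3 rotation,
    3-vector translation)."""
    rotation, translation = pose
    shape = mean_shape + identity_offset                       # target geometry
    expression = torch.einsum("e,evc->vc", expr_params, expr_basis)
    return (shape + expression) @ rotation.T + translation


if __name__ == "__main__":
    audio_feats = torch.randn(1, N_AUDIO)              # one frame of source audio
    expr_params = AudioToExpression()(audio_feats)[0]  # lips follow the audio

    # Target-person parameters (placeholders for values fitted from video).
    mean_shape = torch.zeros(N_VERTS, 3)
    identity_offset = torch.randn(N_VERTS, 3) * 0.01
    expr_basis = torch.randn(N_EXPR, N_VERTS, 3) * 0.01
    pose = (torch.eye(3), torch.zeros(3))

    mesh = build_face_mesh(mean_shape, identity_offset, expr_basis,
                           expr_params, pose)
    print(mesh.shape)  # torch.Size([5000, 3])
```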

Why is this research important?

SenseTime’s researchers argue their technique presents a notable improvement over similar techniques that synthetically manipulate facial features in images, due to its improved ability to handle previously unseen media. The generated outputs are also impressively realistic: in a user study conducted by the researchers, 100 volunteers identified the synthetic videos as real 55% of the time, compared to 70.1% of the time for authentic audio-visual recordings.

This week’s developments

1) A Pakistani environmental advocacy group launched a campaign that uses deepfakes to depict various world leaders in the year 2032 apologising for their inaction on climate change. (Apologia)

2) University of Rochester and Viscovery researchers released TailorGAN, a new generative technique for photo-realistically transferring design attributes from one piece of clothing to another. (arXiv)

3) Ukrainian company RefaceAI released Doublicat, an Android and iOS app that allows users to “faceswap” themselves into popular gifs and memes based on a single selfie. (Doublicat)

4) Baidu researchers published AdvBox, an open-source “robustness testing” toolbox for generating adversarial examples that trick neural networks commonly found in popular AI frameworks. (Github)

5) Digital artist Shardcore created Celebreedy, a social media bot that uses StyleGAN to publish synthetically blended images of two well-known celebrities’ faces. (Shardcore)

6) Duke University researchers published a draft of a proposed standard for fact-checking visual media that provides journalists with a common, machine-digestible format for laying out findings. (Nieman Lab)

7) Instagram reversed a fact checker’s decision to label a digital artist’s photoshopped image as false after accusations that the move represented censoring art on the platform. (Daily Beast)

Opinions and analysis

A comparison of Twitter and Facebook’s deepfake policies

Lindsey Gorman and Amber Frankland present a comparative analysis of Twitter and Facebook’s new deepfake policies, including key takeaways on each policy’s scope and efficacy.

Why altered videos of politicians will keep going viral in 2020

Drew Harwell outlines the recent history of “shallowfake” manipulated media in US politics and argues that the emotional potency of this form of disinformation will ensure its continued usage in 2020.

Deepfake pornography harms adult performers, too

Lux Alptraum reports on the “eviscerating” impact non-consensual deepfake pornography has on adult performers whose bodies are weaponised against other women.

Setting the record straight on Samsung’s “Neon” artificial humans

Shara Tibken interviews the CEO of Samsung’s “Next-generation AI chatbot” company Neon, and answers key questions about the synthetic avatars’ functionality, limitations, and future implementation.

Want to receive new editions of Tracer direct to your inbox? Subscribe via email here!

Working on something interesting in the Tracer space? Let us know at info@deeptracelabs.com

To learn more about Deeptrace’s technology and research, check out our website
