Converting text news into video stories using Deep Learning

Published in The Startup, Jun 18, 2020

Recent advances in Deep Learning, a specialized branch of Artificial Intelligence, are driving the next wave in synthetic media generation.

Extracted from Wikipedia:
Synthetic media as a process of automated art dates back to the automata of ancient Greek civilization, where inventors such as Daedalus and Hero of Alexandria designed machines capable of writing text, generating sounds, and playing music.

In recent years, “synthetic media” has become a general term used to describe video, image, text, and voice that computers generate. With these advances, we are about to see a major paradigm shift in media creation.

Companies like Rosebud AI and Humen are disrupting the multimedia creation space by synthesizing videos and images, potentially saving creative agencies and studios millions of dollars in asset creation.

Rosebud AI

Rosebud AI is a San Francisco-based synthetic media company that lets users choose the skin color and ethnicity of the person in an image.

Another AI face generator? Yes, another one. But this time, users can upload any face, and the system places it onto another person's body in a stock image.

Check out their product demo: https://www.youtube.com/watch?v=wt5Y6L6ebR4

Humen

Humen is an AI for personalized interactive content creation.

Transferring dance moves from Bruno Mars's video “That’s What I Like”

Humen, also based in San Francisco, lets anyone create mass-market entertainment, social storytelling, and high-quality rendered graphics without hundreds of professionals or a multimillion-dollar budget.

What these companies have in common is that both combine cutting-edge deep learning techniques, such as generative adversarial networks (GANs), facial landmark detection, and body pose estimation, with cloud infrastructure to synthesize human faces and videos that are difficult to distinguish from real footage.

Today, I want to introduce and deconstruct NIUS.TV, a new product in the synthetic media category that we have been developing over the last few months.

What is NIUS.TV?

Catching up on news on mobile is still painful. Reading while commuting, exercising, or waiting is difficult.

We can do better!

NIUS.TV is a next-generation mobile-first news aggregator that converts text news on topics you love into video stories narrated by an AI anchor.

Our short video stories (typically 30–40 seconds) are perfect for those in-between moments we all have during the day.

We want to improve the news consumption experience so that people can catch up on what’s happening quickly, without the noise generated by social networks.

We believe in building innovative technology that feels familiar, natural, and human.

NIUS.TV is free and ad-free.

How can I join?

After registering on the NIUS.TV website, you will receive short stories about technology, science, and space in your email every weekday morning, Monday to Friday.

Watch NIUS.TV while waiting for public transportation

You can also find us on Twitter at:

We will be launching our mobile app soon. Stay tuned!

What’s next?

So far we have produced over a hundred video stories using our end-to-end system.

We are just getting started and we are very excited about the journey ahead.

Over the last few months, we built all the necessary tools and systems and worked through the product details. Unfortunately, this post is too short to cover all the pain points, the lessons learned, and the details of each step in the video generation pipeline.

Deep learning is a fast-moving target, and we believe our architecture is flexible enough to accommodate new advancements.

I look forward to writing a follow-up post about our infrastructure for training deep learning models.