Let there be voice: 10x Audio Production

Published in

Deepsync Technologies

4 min read · Aug 24, 2019

Imagine living before the year 1440, when books were scarce and producing them was slow and very expensive because of the manual labor involved. People at the time used to painstakingly copy the content of one book by hand to produce a new one, which meant reproducing the diagrams, the calligraphy and the text itself as precisely as possible. A book created before 1440 could take months or even years to get into circulation and was available only to a select few. No wonder education and content spread so little, and mass media was out of the question. At that time, we lived in an age of quiet desperation.

But then, boom! Gutenberg brought a 10x revolution to print production, a true zero-to-one invention that could produce high-quality books at a fraction of the time and cost: his printing press. It was built on centuries of incremental and varied innovations that he finally brought together. The result was an influx of written content on a scale never seen before, eventually leading to the Enlightenment era, when people could share their findings quickly and accurately. Needless to say, everything changed. Some people call it the most important innovation between writing and computers.

But times are changing, and new modes of information call for new innovations.

Today, a new wave is rising: voice-based content. The goal is to create high-quality voice content quickly and accurately. Gutenberg is no longer here to solve this, but thankfully, Deep Learning is.

Why do we need 10x in Audio Production?

Unless you have been living under a rock for the last few years, you can see that the nature of content is shifting. People are looking for high-quality, personalized, customized and interactive content, and audio is one of the most promising forms for it. Audio is not new by any standard, but it has gained enormous popularity in the last few years thanks to audiobooks, podcasting, and voice-activated assistants. It is growing even faster in developing countries like India, where people are skipping over text and going straight to voice-first interaction. Yet audio production today, like book production in 1440, remains costly and time-consuming. This is how it works, in a nutshell:

Voice Artist -> Expensive studio -> Time-consuming recording -> Post-production / Copyright: Deepsync.co

See what’s slow and inefficient here? Basically everything.

Introducing Deepsync: Augment your voice

Deepsync is an Augmented Intelligence that learns the way you speak. That is correct! It creates a digital model of your voice and learns hundreds of features, from your accent to the subtle ways you express yourself. It does this using advanced forms of Deep Learning, an approach loosely modeled on the human brain.

Once you sync your voice with Deepsync, it becomes an extension of yourself and records content for you: for the 80–90% of your work where it is confident, you don't need to record at all!

For the remaining 10–15% or so, it asks you to record. This can include highly expressive passages or extreme cases like speaking in a sarcastic tone. But don't worry, it keeps improving in the background and will eventually learn those too. Before we go ahead, let's hear an example:

This is Neil’s original voice before syncing / Copyright: Deepsync.co

This is Neil’s synced voice. The music is added by our AI itself / Copyright: Deepsync.co

As you can hear, the synced voice is a near-perfect replica and produces high-quality audio at a fraction of the time and cost. The best part: there is no background noise, which cuts down post-production time considerably. Of course, you are free to download the audio and add your own magic to it.

This means that if you were to record 1 hour of content the normal way, it would take at least 3–4 hours, including post-production, to get it right. With Deepsync, it would take you a fraction of that time.
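The split described above, where the synced voice handles the parts it is confident about and you record the rest, can be pictured as a simple confidence threshold. The sketch below is only an illustration and not Deepsync's actual implementation: the `Segment` class, the `route_segments` function, the confidence scores and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the article does not say how confidence is measured.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Segment:
    text: str
    confidence: float  # assumed score in [0, 1] produced by a voice model


def route_segments(segments, threshold=CONFIDENCE_THRESHOLD):
    """Split a script into segments to synthesize and segments to record by hand."""
    synthesize, record = [], []
    for seg in segments:
        if seg.confidence >= threshold:
            synthesize.append(seg)  # the model is confident: generate the audio
        else:
            record.append(seg)      # e.g. sarcasm: ask the speaker to record it
    return synthesize, record


script = [
    Segment("Welcome to the show.", 0.97),
    Segment("Oh, great. Another Monday.", 0.42),  # sarcastic line, low confidence
    Segment("Today we talk about audio.", 0.91),
]
auto, manual = route_segments(script)
print(f"{len(auto)} segments synthesized, {len(manual)} left to record")
```

In a setup like this, lowering the threshold over time would correspond to the model "learning" the harder cases and asking you to record less.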

Is this going to take away my job?

What?! No. On the contrary, it is going to free you from the recording work wherever it is confident it can speak in your voice, giving you plenty of time to pursue your creativity, whether in creating content or in editing to add your own magic.

This is the philosophy that differentiates us from other AI companies that are looking to automate you. We are looking to augment you. We deeply believe that only by creating a Human-AI interface can we go boldly into the new world that awaits us.

You are welcome to learn more at https://deepsync.co, and to reach out to me at ishan@deepsync.co if you are interested in exploring the potential. We currently work with companies and will add support for individuals in the coming months.

Let there be voice!

Best,

Update: Deepsync was named among the top 10 AI startups in India. Link: http://bit.ly/top10AI_india

Written by Ishan Sharma