The Network Effect of Voice

In today’s voice ecosystem, the large platform players (Amazon, Google, Apple, and Baidu) control the entire voice stack, from the hardware to the software. How, if ever, will startups be able to compete?

Published in

Startup Grind

Jul 9, 2017

Having researched this space and mapped the voice market landscape, I’d like to share my thoughts about where the greatest value creation opportunities lie for voice-positioned startups.

In short, I believe that companies that focus on disrupting one or more of the three key areas of data, software, and automation within the “computational speech” value chain will thrive in this competitive ecosystem.

What is the “computational speech” value chain?

If you’ve seen my market map, you’ll see that this value chain closely maps to the “platforms / core technologies stack” there. This horizontal layer enables voice computing across a vast array of problem types.

The idea is to find players who are disrupting this value chain by focusing on one or more of those three key areas: data, software, and automation. Combined most effectively, the three create a positive feedback loop.

Collecting data improves the software, which in turn makes it possible to automate the learning process itself. As a result, systems continue to learn and improve from experience.

By identifying disruptive feedback loops within the speech value chain, we find voice applications with the potential to become big businesses. With this in mind, let’s look at why data, software, and automation are so key to AI startups positioned for voice.

Data | Access to unique or proprietary datasets

Time and time again, we’ve heard that “data is the new oil.”

As the AI talent pool grows and new tools enable more people to engage with the technology, access to unique and proprietary datasets becomes key to building differentiated products.

According to InsideBigData, the size of the digital universe will double every two years, a 50-fold growth from 2010 to 2020.

Human and sensor (i.e., machine-generated) data combined is experiencing an overall 10x faster growth rate than traditional business data, with sensor data increasing even more rapidly at 50x the business data growth rate.

InsideBigData, The Exponential Growth of Data, February 16, 2017
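The two quoted figures can be checked against each other with a little arithmetic. Doubling every two years compounds to 2^(10/2) = 32-fold over a decade, so a 50-fold increase in ten years implies a doubling time closer to 1.8 years. This sketch assumes the 2010 to 2020 span is exactly ten years. If the source counts a slightly longer window, the gap narrows.

```python
import math

def growth_factor(years: float, doubling_time: float) -> float:
    """Total growth after `years` when a quantity doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)

def implied_doubling_time(years: float, factor: float) -> float:
    """Doubling time required to reach `factor`-fold growth within `years` years."""
    return years * math.log(2) / math.log(factor)

SPAN_YEARS = 10  # assumption: "2010 to 2020" treated as exactly ten years

print(growth_factor(SPAN_YEARS, 2))                     # 32.0
print(round(implied_doubling_time(SPAN_YEARS, 50), 2))  # 1.77
```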

It’s worth noting that the benefit of more data may saturate, because the learning capacity of practically deployable models is limited. For this reason, application-specific data can be more valuable than generic data.
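Saturation can be illustrated with a toy learning curve. Error falls as a power law in the amount of training data, but it never drops below a floor set by the model’s capacity. The curve shape and every parameter (`a`, `b`, `floor`) are hypothetical and not measured from any real speech system.

```python
def error_rate(n_examples: int, a: float = 0.5, b: float = 0.3, floor: float = 0.05) -> float:
    """Toy learning curve: error shrinks as a power law in data size but is
    bounded below by `floor`, standing in for limited model capacity.
    All parameters are hypothetical, chosen only for illustration."""
    return floor + a * n_examples ** (-b)

previous = None
for n in [10**3, 10**4, 10**5, 10**6, 10**7]:
    err = error_rate(n)
    gain = "" if previous is None else f" (improvement: {previous - err:.4f})"
    print(f"{n:>10,} examples -> error {err:.4f}{gain}")
    previous = err
```

With these assumed parameters, each tenfold increase in data buys about half the improvement of the previous one, because 10^-0.3 is roughly 0.5. The error never falls below the 5% floor.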

If you consider the big problem of converting arbitrary audio into arbitrary text, it’s very hard for a startup to compete, because doing so requires a lot of data, compute power, and engineering resources.

To compete against the big tech companies, startups must pick a limited domain as their beachhead. They must find solutions to problems that do not require a lot of data or computing resources, in industries that the tech giants have so far overlooked.

Take Digital Genius. This company is automating conversations between big brands and their consumers using artificial intelligence (AI) and natural language processing (NLP).

Working with numerous Fortune 500 companies as clients, Digital Genius has exclusive access to millions of historical chat records. These records train its deep learning algorithms to learn the most appropriate responses, shortening the training period before the system has to interact with new, live customers.

Having access to a unique dataset, and leveraging AI and deep learning techniques, enables Digital Genius to enter a niche where the likes of Amazon and Google are not yet investing: working smarter, not harder.
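Digital Genius trains deep learning models on its chat history. The sketch below swaps in a much simpler technique to show how such a log can be reused. A new customer message is matched to the most similar past message by bag-of-words cosine similarity, and the agent’s reply to that past message is suggested. The chat log, the function names, and the similarity threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical historical chat log: (customer message, agent reply) pairs.
HISTORY = [
    ("where is my order", "Let me check the tracking details for your order."),
    ("i want to return my shoes", "I can start a return for you. What is your order number?"),
    ("how do i reset my password", "You can reset it from the login page via 'Forgot password'."),
]

def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def suggest_reply(message: str, min_similarity: float = 0.3) -> Optional[str]:
    """Return the agent reply to the most similar past customer message,
    or None when nothing is similar enough (e.g., hand off to a human)."""
    query = tokenize(message)
    score, reply = max((cosine(query, tokenize(past)), reply) for past, reply in HISTORY)
    return reply if score >= min_similarity else None

print(suggest_reply("Where's my order?"))
print(suggest_reply("Do you sell gift cards?"))
```

The first query matches the stored order-tracking reply. The second matches nothing closely enough and returns `None`. A deep model generalizes far better than word overlap, but the dependency is the same: the more past conversations a company holds, the more new messages it can answer.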

Software | Differentiated approaches to machine learning

To make sense of the data, you need to have the right software tools.

Advances in deep learning have made it possible to work with unlabeled data, using unsupervised learning to discover its basic structure (i.e., to learn features automatically).

You can then apply traditional backpropagation on the labeled data, propagating prediction errors back through the network’s weights so that the model learns to classify and predict your labels.

(Lee, Largman, Pham & Ng, NIPS 2009) (Lee, Grosse, Ranganath & Ng, ICML 2009)
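The cited work used convolutional deep belief networks. The sketch below swaps in much simpler pieces to show the same two-stage pattern:

- k-means clustering learns a feature representation from plentiful unlabeled data.
- A logistic-regression output unit is then trained by gradient descent (backpropagation through a single layer) on only ten labeled examples.

The synthetic 2-D data stands in for audio features, and all sizes and learning rates are hypothetical.

```python
import math
import random

random.seed(0)

def make_blob(cx, cy, n):
    """Sample n points around (cx, cy); a stand-in for real feature vectors."""
    return [(random.gauss(cx, 0.5), random.gauss(cy, 0.5)) for _ in range(n)]

# Plenty of unlabeled data, but only a handful of labeled examples.
unlabeled = make_blob(0, 0, 200) + make_blob(4, 4, 200)
labeled = [(p, 0) for p in make_blob(0, 0, 5)] + [(p, 1) for p in make_blob(4, 4, 5)]

def kmeans(points, k, iters=20):
    """Unsupervised step: find k cluster centroids without using any labels."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

def features(p, centroids):
    """Learned representation: similarity to each centroid, plus a bias term."""
    return [math.exp(-math.dist(p, c) ** 2) for c in centroids] + [1.0]

def train_logistic(data, centroids, lr=0.5, epochs=200):
    """Supervised step: gradient descent on the log loss, i.e. the prediction
    error is propagated back into the weights of a single output unit."""
    w = [0.0] * (len(centroids) + 1)
    for _ in range(epochs):
        for p, y in data:
            x = features(p, centroids)
            pred = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            error = pred - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w

def predict(p, centroids, w):
    x = features(p, centroids)
    return int(sum(wi * xi for wi, xi in zip(w, x)) > 0)

centroids = kmeans(unlabeled, k=2)
w = train_logistic(labeled, centroids)
print(predict((0.2, -0.1), centroids, w), predict((3.8, 4.1), centroids, w))
```

Most of the structure comes from the 400 unlabeled points. The ten labels only have to tell the model which cluster means which class.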

Startups that find new ways to undermine the competitive value of large datasets through innovative machine learning concepts will provide value in this market.

For instance, a company in Paris called Snips.AI is using a method called training set synthesis (TSS) to generate artificial training data to train machine learning models, instead of relying on real user data.

Much as GANs can generate synthetic datasets when there isn’t enough primary data, Snips.AI is showing that these artificial training datasets can be used to train other discriminative models.

This approach has potential in applications where audio data is scarce (e.g., in specific domains like healthcare), where user flexibility is desired (e.g., by enabling specific keywords), and where enhancing the dataset would improve performance.

The problem is challenging, though, as the generated datasets must be high quality and diverse.
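Snips.AI’s actual TSS pipeline isn’t described here. As a hypothetical sketch of the general idea, the code below expands a few hand-written utterance templates with slot values to synthesize labeled examples. It then trains a small multinomial naive Bayes intent classifier purely on that synthetic data. The templates, slot values, and intent names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical templates per intent; each {slot} is filled from SLOT_VALUES.
TEMPLATES = {
    "play_music": ["play {artist}", "put on some {artist}", "i want to hear {artist}"],
    "set_timer": ["set a timer for {duration}", "start a {duration} timer", "remind me in {duration}"],
}
SLOT_VALUES = {
    "artist": ["miles davis", "daft punk", "adele"],
    "duration": ["five minutes", "one hour", "ten seconds"],
}

def synthesize():
    """Expand every template with every value of its slot into labeled utterances."""
    data = []
    for intent, templates in TEMPLATES.items():
        for template in templates:
            slot = template[template.index("{") + 1 : template.index("}")]
            for value in SLOT_VALUES[slot]:
                data.append((template.format(**{slot: value}), intent))
    return data

def train_naive_bayes(data):
    """Count words per intent; these counts are the whole model."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in data:
        intent_counts[intent] += 1
        word_counts[intent].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, intent_counts, vocab

def classify(text, model):
    """Pick the intent with the highest log-probability (Laplace smoothing)."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())

    def log_prob(intent):
        counts = word_counts[intent]
        denom = sum(counts.values()) + len(vocab)
        return math.log(intent_counts[intent] / total) + sum(
            math.log((counts[w] + 1) / denom) for w in text.split()
        )

    return max(intent_counts, key=log_prob)

training_data = synthesize()
model = train_naive_bayes(training_data)
print(len(training_data))  # 18: six templates x three slot values
print(classify("play some adele", model), classify("timer for one hour", model))
```

No real user utterance was collected, yet the classifier handles phrasings that appear in none of the templates. That also shows the difficulty noted above: the model can only be as varied as the templates that generated its data.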

Automation | Continuous incremental improvement

The success of an AI solution depends on continuous learning and tuning of the AI model.

Various AI-related technologies, such as natural language processing (NLP) and speech recognition, have progressed substantially over the years and are coalescing into systems that learn, unlearn, and relearn.

Especially for voice interfaces, user personalization is key. Each person has a unique voice with individual characteristics.

Identifying a user by their voice enables a company to provide a personalized experience by knowing the user’s frequently used services, the music they prefer, their past purchases, and more.
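One common way to identify a speaker is to compare a fixed-length voice embedding (a “voiceprint”) against embeddings enrolled for each known user. This sketch assumes the embeddings already exist; real systems compute them with a trained speaker-embedding model, which is not shown. The three-dimensional vectors, user names, and 0.8 threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def enroll(samples: List[List[float]]) -> List[float]:
    """Average several embeddings of the same speaker into one voiceprint."""
    return [sum(dim) / len(samples) for dim in zip(*samples)]

def identify(embedding: List[float], voiceprints: Dict[str, List[float]],
             threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching enrolled user, or None for an unknown speaker."""
    best_user, best_score = None, threshold
    for user, voiceprint in voiceprints.items():
        score = cosine(embedding, voiceprint)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user

voiceprints = {
    "alice": enroll([[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]]),
    "bob": enroll([[0.1, 0.9, 0.3], [0.2, 0.8, 0.4]]),
}
# The identified user ID can then key into that user's preferences and history.
print(identify([0.85, 0.15, 0.12], voiceprints))  # alice
print(identify([0.1, 0.2, 0.95], voiceprints))    # None: unknown speaker
```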

Personalization requires technologies that learn these individual behaviors. As a result, startups working on true personalization will need to continuously learn from past and present behavioral data.

Not only does behavior differ among individuals, but individual behavior changes over time. Systems that rely only on large sets of historical data cannot effectively characterize these changes.
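The difference can be sketched with a hypothetical user whose listening habits shift from jazz to electronic music. A static profile built from all historical plays keeps favoring jazz long after the shift. An exponentially weighted profile, updated after every interaction, tracks the change. The genres, play history, and decay factor are assumptions chosen for illustration.

```python
from collections import Counter

def static_profile(history):
    """Favorite genre by raw counts: every past play counts equally, forever."""
    return Counter(history).most_common(1)[0][0]

def adaptive_profile(history, decay=0.8):
    """Exponentially weighted favorite: before each new play, all older
    evidence is multiplied by `decay`, so recent behavior dominates."""
    scores = Counter()
    for genre in history:
        for g in scores:
            scores[g] *= decay
        scores[genre] += 1.0
    return scores.most_common(1)[0][0]

# Hypothetical history: 30 jazz plays, then a recent switch to 10 electronic plays.
history = ["jazz"] * 30 + ["electronic"] * 10
print(static_profile(history))    # jazz
print(adaptive_profile(history))  # electronic
```

The decay factor trades stability for responsiveness. A value closer to 1 remembers more history, while a smaller value reacts faster to change but is noisier.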

The success of an AI solution largely depends on the continuous learning that follows every voice interaction.

I can imagine a system that combines machine learning with human operators, who ensure that accurate information is available and fed into the AI-based solution.

For example, Seattle-based definedCrowd is automating and accelerating enterprise data training and modeling with this blend of human and machine, combining crowdsourcing and machine learning for speech and NLP technology.

With people in 53 countries across 42 languages, they are able to help data scientists and developers build natural language applications faster and with higher quality, without requiring deep knowledge of speech science or natural language understanding.
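One common way to blend human and machine is confidence-based routing. This is a hypothetical sketch, not definedCrowd’s actual pipeline:

- Predictions the model is confident about are accepted automatically.
- Low-confidence predictions go to a human reviewer.
- Each human-verified label is kept as a new training example for the next model update.

The `model` and `ask_human` callables, their toy implementations, and the 0.9 threshold are stand-ins.

```python
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (label, confidence)

def process_batch(
    utterances: List[str],
    model: Callable[[str], Prediction],
    ask_human: Callable[[str], str],
    threshold: float = 0.9,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Accept confident predictions; route the rest to a human and keep
    the human-verified labels as new training examples."""
    results, new_training_data = [], []
    for text in utterances:
        label, confidence = model(text)
        if confidence < threshold:
            label = ask_human(text)                  # human corrects or confirms
            new_training_data.append((text, label))  # feeds the next retraining round
        results.append((text, label))
    return results, new_training_data

# Hypothetical stand-ins for a trained model and a crowd worker.
def toy_model(text: str) -> Prediction:
    return ("weather", 0.95) if "weather" in text else ("unknown", 0.4)

def toy_human(text: str) -> str:
    return "play_music" if "play" in text else "other"

results, new_data = process_batch(["what's the weather", "play some jazz"], toy_model, toy_human)
print(results)
print(new_data)
```

Humans only see the cases the model is unsure about, so their effort goes where it improves the training data most.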

A true AI system needs to be able to learn continuously. It needs to combine high quality training data with state-of-the-art machine learning algorithms and automation in order to deliver on the enormous potential of personalization, faster and at scale.

Holding an Advantage Over the Tech Giants

Nimble startups working on detecting and producing vast amounts of data, using AI techniques to represent data as knowledge, or automating knowledge discovery and data mining along the speech value chain, will hold an advantage over the tech giants.

In theory, finding this positive feedback loop in any AI niche presents an opportunity for a startup to thrive. In machine-learning-enabled companies, product and data are joined at the hip.

The main driver of quality in a machine-learning-enabled product is the accuracy of the underlying algorithms that perform any given task. And that largely comes back to the quantity, quality, and uniqueness of the data on which those algorithms have been trained.

This is what people describe as data network effects.
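A toy simulation can make the loop concrete: users generate data, data improves accuracy, and accuracy drives user growth. Every parameter here is a hypothetical assumption, not an estimate for any real product, including:

- the learning-curve shape,
- the accuracy ceiling,
- the rule that monthly growth equals the amount by which accuracy exceeds a baseline.

```python
def accuracy(total_examples: float, ceiling: float = 0.95) -> float:
    """Toy learning curve: accuracy rises with data but stays below `ceiling`."""
    return ceiling * (1 - (1 + total_examples) ** -0.2)

def simulate(months: int, users: float = 100.0, examples_per_user: float = 1.0,
             baseline: float = 0.5):
    """Each month: users generate data -> model improves -> a better product
    attracts more users. Monthly growth is assumed to equal how far accuracy
    exceeds `baseline` (a hypothetical growth response)."""
    data = 0.0
    history = []
    for _ in range(months):
        data += users * examples_per_user
        acc = accuracy(data)
        growth = max(0.0, acc - baseline)
        users *= 1 + growth
        history.append((round(users), acc, growth))
    return history

for month, (users, acc, growth) in enumerate(simulate(12), start=1):
    print(f"month {month:2d}: {users:>6,} users, accuracy {acc:.3f}, growth {growth:.1%}")
```

Growth accelerates while accuracy climbs, then levels off as accuracy approaches its ceiling. This echoes the earlier point that the benefit of generic data can saturate.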

However, because I’ve been so interested in the voice-enabled ecosystem, I’m applying this value proposition to “computational speech” first.

In my next post, I’ll primarily pick apart the first step of the value chain (i.e., voice pickup) and highlight some areas that I’m excited about.

In particular, I’m watching startups disrupting this domain, powered by proprietary data, machine learning models, and continuous learning.

If you’re building a company focused on one of these three key areas within the industry, I’d love to chat.