Opportunities for B2B Software Startups in Voice Tech

Published in Point Nine Land · Nov 16, 2017 · 7 min read

This post is the second part of our Voice Tech Series (see also Part 1 and Part 3, "Voice Tech Landscape: 150+ Startups Mapped and Analysed").

In the previous post of this Voice series, I identified the major improvements in speech recognition systems over the last decades and the development of the voice technology infrastructure as the two main drivers for voice applications to emerge.

In this article, I would like to share three main opportunities I see for B2B software startups (our focus at Point Nine) in the voice space.

I) Build a vertical voice application

It is probably not the best startup idea to build a generic speech recognition engine and try to compete with the big players, who have already collected massive amounts of real-life situational data and have the economies of scale to drive prices down. However, a startup that specializes in a specific vertical or domain and collects data specific to it can indeed achieve better results than the generic engines of Google, Amazon, etc.

A good example is Chorus.ai, which provides conversation intelligence for sales teams and is building a sales-specific speech recognition system. Micha, co-founder at Chorus.ai, said that empirically, given high-quality, domain-specific data, it is possible to achieve at least a 15% improvement over generic engines!
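To make a figure like this concrete: the post does not say how the improvement was measured, but speech recognition accuracy is commonly reported as word error rate (WER), the word-level edit distance between a reference transcript and the engine output, divided by the number of reference words. The Python sketch below assumes that metric. The function names and the sample sentences are invented for illustration and are not Chorus.ai data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error removed by the new engine."""
    return (baseline_wer - new_wer) / baseline_wer


# Invented sample data: one sales-call sentence and two hypothetical outputs.
reference = "please send the renewal quote to the procurement team"
generic_output = "please send the renew all quote to the procurement team"
domain_output = "please send the renewal quote to the procurement teams"

generic_wer = word_error_rate(reference, generic_output)
domain_wer = word_error_rate(reference, domain_output)
print(f"generic WER: {generic_wer:.3f}")
print(f"domain WER:  {domain_wer:.3f}")
print(f"relative improvement: {relative_improvement(generic_wer, domain_wer):.0%}")
```

In this toy example the generic output splits a domain term into two words while the domain-tuned output makes a single mistake, so the relative improvement is half of the baseline error. A real comparison needs a large, representative test set.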

A domain-specific speech recognition engine can give an edge at the beginning. However, this is probably not enough to build long-term defensibility, as the technical challenges of speech recognition are more or less solved already and Amazon & Co will get more data, expanding to more and more use cases over time.

As Sarah Guo from Greylock pointed out here, the technical challenges that come with NLU are far from solved today. Building a solid NLU model is a good way to build additional moats, and starting with a narrow domain can help reduce complexity and achieve good results.

Next to that, having unique access to data, building a sticky product with a long product roadmap and having network effects are also factors that contribute to long-term defensibility.

I tried to summarize below the main blocks behind a voice application and the potential sources of defensibility in the short, mid and long term for each of these blocks:

The key to success here will be identifying a domain and a problem to solve:

i. for which a voice application makes sense…

Alon Bonder from Venrock explains this very well in this post by listing four of voice’s native advantages and potential applications for startups. I listed three of these below and added an example of a startup for each:

In addition to finding a use case where voice is better than other interfaces, designing a great user experience around it is critical. For example, interrupting a conversation to invoke some commands or having to say private things out loud can lead to an awkward situation that you want to avoid. Here are some voice design best practices published by Amazon and Google.

ii. …that is not solved by others yet…

Domains like customer support and sales seem pretty crowded already, so it probably makes sense to go beyond the “obvious opportunities”.

The next post of this Series will be a landscape overview and will hopefully help to identify some of the gaps in the voice space.

iii. …for which you have unique access to a high volume of high-quality, domain-specific data

As mentioned in my previous post, for the speech recognition model, it is about getting domain-specific, spoken audio recorded in real-life situations.

As for the NLU part, success comes from getting transcripts of spoken conversations. It is worth noting that speech data is very different from the written text on which most existing NLU models are based (newspapers, manuals, books, etc.). Spoken language is much more spontaneous, and errors, interruptions, hesitations, etc. all show up in the transcript. Transcripts may also contain more spelling errors for similar-sounding words (e.g. two and too) and are much less structured, as there is no punctuation, spacing, capitalisation, etc. And these are only a few examples of the additional challenges posed by spoken data.
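The mismatch between written text and raw transcripts can be sketched in a few lines of Python. This is a minimal illustration, not a real preprocessing pipeline: the function names, the filler list and the sample sentences are all hypothetical. It strips a written sentence down to the form a raw transcript typically takes (no punctuation, no capitalisation) and removes a few common hesitation words from a transcript.

```python
import re

# Hypothetical, deliberately tiny list of hesitation words.
FILLERS = {"uh", "um", "er", "hmm"}


def to_spoken_style(text: str) -> str:
    """Drop capitalisation and punctuation, as in a raw transcript."""
    text = text.lower()
    # Keep letters, digits, apostrophes and spaces; everything else becomes a space.
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


def strip_fillers(transcript: str) -> str:
    """Remove hesitation words from an already lower-cased transcript."""
    return " ".join(word for word in transcript.split() if word not in FILLERS)


written = "Sure, I'd like two tickets. Is Friday OK?"
spoken = "um i'd like uh two tickets"

print(to_spoken_style(written))
print(strip_fillers(spoken))
```

A model trained on text like `written` never sees input like `spoken`, and sentence boundaries that punctuation makes obvious have to be inferred. Homophone errors such as "too" for "two" are harder still, because fixing them requires context rather than simple string cleanup.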

II) Build a B2B2C voice application leveraging the voice-first platforms

Over 60.5 million people (18.5 percent of the US population) will use voice-activated assistants like Amazon’s Alexa, Apple’s Siri, Google Assistant, Microsoft’s Cortana, and Samsung’s Bixby, and by 2019 one-third of US internet users (75.5 million people) will be speaking to voice assistants (Source).

Customer-facing companies could leverage these growing platforms as a cheap distribution channel, as Zynga did with Facebook. The B2B opportunity here is to offer B2C companies a solution that leverages these platforms for part of their product.

A good example is Cardiocube, which provides hospitals with a home treatment support tool for heart disease patients. They collect data and follow patients at home by asking them questions via Amazon Echo or another voice-first device, and present the medical data to doctors in a web-based interface. Instead of having to distribute hardware to every patient and train them to use it, they can just leverage whatever the patient already has at home and is familiar with.

Building an application on top of Amazon Alexa or Google Home may limit your application's capabilities. For example, you cannot get raw audio or text from your Alexa skill, which makes it hard to build the custom speech recognition or NLP mentioned in the previous section.

But, if the output provided by the platforms is good enough to deliver a good experience, having massive reach from day 1 can help you to gain a sustainable advantage by being the first mover. Also, the flexibility provided by platforms will probably change over time.

III) Build a third-party service for the voice ecosystems

As more and more dedicated voice-first hardware devices enter our lives, and platforms are built around them (Amazon Alexa, Google Home, etc.), the third opportunity I see for B2B software startups in the voice space is to build a third-party service to power these new ecosystems. These services are expected to be around analytics, distribution, advertising and marketing.

Examples of startups tackling this opportunity are VoiceLabs, which provides voice experience analytics for Amazon Alexa and Google Home developers; Jovo, which allows developers to build cross-platform apps for voice; and Storyline, which lets you create voice apps without coding.

However, as the space is at a very early stage, it is key to understand which services:

i. have enough demand today

Being too early can be fatal, as one cannot afford to wait years for demand to come. Therefore, it is important to determine whether a service is at the start of explosive growth or not.

One of the key questions to answer is the following: How many potential customers have a clear ROI using the service today and how much are they ready to pay for it?

ii. will still make sense when the space matures

Voice is still pretty new for most organisations, and most of them don’t have the budget, time or technical skills to develop voice applications in-house yet. As a consequence, they may use a third-party service as a first step to test some ideas and move things in-house once the ROI is clear. It is therefore important to understand whether the service allows them to do things faster and cheaper without losing any competitive advantage.

The question here is: Will the customers who have a clear ROI using the service today have the same ROI tomorrow?

iii. will not be owned by the big platforms

Amazon and Google now have tremendous power in the voice space, and it is critical to identify which areas the big platforms want to own themselves and which they are willing to open to third-party providers. For example, VoiceLabs had to shut down their ad network after Amazon updated its advertising policy on May 21.

Another interesting way to look at this, which my colleague Rodrigo pointed out, is to see what happened with previous platforms (Facebook, App Stores, etc.) and who made it big there. Here are two examples:

Analytics: App Annie, which was one of the early players capturing app data on the App Stores. Being early in the game allowed them to build good defensibility, as historical data is super valuable.
Distribution: Aptoide, which is now the largest independent Android App Store and allows users and companies to set up and manage their own App Store.

Leveraging the voice ecosystem probably involves much more uncertainty than building a vertical voice application. I therefore believe that most VCs (including us) will feel more comfortable about investing in the latter.

This is obviously not an exhaustive list of interesting opportunities for B2B software startups in the voice space, and I am sure many smart people have identified other great opportunities.

If you are an early-stage startup in this space looking for funding, get in touch here.

The next article of this Voice Series is an attempt to get a good overview of the current B2B Voice software ecosystem.

Many thanks to Louis Coppey, Clement Vouillon and Rodrigo Martinez for reviewing this post and their super valuable input.


Written by Savina van der Straten