Voice Tech Landscape: 150+ Infrastructure, Horizontal and Vertical Startups Mapped and Analysed

Savina van der Straten
Published in Point Nine Land
Dec 13, 2017

This post is the third part of our Voice Tech Series (Part 1 and Part 2).

In an attempt to draw an accurate picture of the current B2B Voice software ecosystem, I listed and mapped 150+ B2B Voice Tech Software startups below.

This list is a starting point and probably far from exhaustive. If you spot companies that are missing or miscategorised, feel free to contact us. If your Voice Tech company is not on the map, please fill in this form and we’ll add you. The approach and data set behind the landscape diagram are detailed in the last section of this post.

1. Horizontal Voice Tech Applications

The most popular Horizontal Voice Application categories are Productivity (41.1%), Customer Support (17.8%), Business Intelligence (13.7%) and Sales (11%).

Looking at the average and median funding for the most popular categories (excluding public companies), Customer Support and Business Intelligence (BI) have the highest average amount raised.

Voice is and will remain a key channel in customer service

It is no surprise that Customer Support is one of the leading categories, as voice still is and will most likely remain a key component in customer service. Interestingly, the two companies that raised more than $100M in this category (Afiniti and Interactions) were founded in the early 2000s, when speech recognition technology was still suboptimal. This probably shows that early Customer Support voice products already had an obvious ROI (faster response time to customers and lower workforce costs) even with limited capabilities such as basic call qualification or routing. New players like our portfolio company CallDesk are leveraging the latest advances in speech recognition, intent recognition and dialog management to categorise repetitive calls with great accuracy and handle them accordingly.

Tools that unlock intelligence from massive call data create value even if basic

The two biggest companies in Business Intelligence (Nice and Verint, both public) were founded before 2000. These tools allow their clients to unlock business intelligence from thousands of hours of calls that would probably not be analysed otherwise. Even with imperfect accuracy, they provided value from the start. In addition, their use cases don’t require speech recognition to be as fast as in other contexts.

Sales and Productivity are emerging categories with most of the companies founded after 2010.

Sales Voice applications needed higher tech maturity but are now entering a race

Sales voice applications emerged later than Business Intelligence and Customer Support applications. I believe this is because the bar to provide significant ROI is set higher and the technology was just not good enough until 5 years ago. Getting insights from customer calls and meetings and coaching sales reps is critical for any business. Relying on a tool instead of a human to perform these tasks requires much more complex tech than what was needed to perform simple routing of incoming calls or to extract basic information from them. There is a massive opportunity here: email optimisation is a multi-billion dollar market, yet sales is still mostly done over the phone. That is perhaps why the space is already getting quite competitive, with Gong, TalkIQ and Chorus each raising over $10M.

Productivity Voice applications are sprouting up thanks to the democratization of software but $ aren’t following yet

The emergence of many Productivity applications like automatic meeting note tools over the last few years is probably driven by the democratization of the underlying software building blocks. In fact, it is a fairly easy use case to start with today, as you can basically bundle several APIs together, let a couple of freelancers polish the API responses and sell the resulting product to the long tail of the market without having to go through complex Enterprise Sales. Having a tool that takes notes and summarises meetings/calls automatically is highly valuable. However, the quality of the output of most of these tools is not great at this stage (at least for those with limited manual review) and none of them has a great deal of traction yet, as tolerance for error is low. Additionally, no major differentiation is apparent yet and it is hard to pick a winner at this point. There are also some concerns about the long-term defensibility of these services, as the gradual commoditization of software building blocks will increase competition and drive prices down. This is reflected in the average and median amounts raised, which are relatively low compared to other categories.

2. Vertical Applications in Voice Tech

In terms of Vertical applications, Healthcare is by far the most popular category (47.1%), followed by Finance (14.7%), E-commerce (14.7%) and Manufacturing/Supply chain (14.7%). E-commerce and Finance are the categories with the highest average amount raised when looking at average and median funding.

Compared to Horizontal Applications, the amount raised by Vertical Applications is generally lower and there are no public companies yet. Also, most Vertical Applications emerged after 2010.

Telemedicine and voice-first hardware encourage conversational healthcare applications

Conversational Healthcare applications are emerging in sync with telemedicine and the proliferation of voice-first hardware. It is also worth noting that US hospitals now have a strong incentive to provide high quality follow-up care: since 2013 they can be subjected to penalties for excessive readmission rates.

The relatively low average and median amount raised compared to other categories is probably related to the inherent challenges of this category. In fact, the confidential nature of the data involved, low tolerance for error and long sales cycles make it unusually hard to move fast and hit appealing growth numbers. The rising stars in this category include Sensely and Syllable.

Offline is dead; Banks have to find new ways to engage with their customers

The end of traditional banking relationships at brick-and-mortar branches is probably one of the main drivers for conversational apps in Finance. In fact, acquiring new customers in banking is expensive and retention is key. Banks therefore have to find new ways to nurture their relationships with clients. Conversational apps enabling real-time, human-like communication with customers seem to be the right approach. Some of these include Kasisto, Active AI and Personetics, each of which raised over $10M.

Legal requirements mandating comprehensive record keeping of customer interactions have likely contributed to the development of transcription applications in Healthcare and Finance such as Kiroku and Voxo.

20 percent of mobile queries are voice searches and E-commerce businesses should become voice search-friendly

The growth of virtual voice assistants such as Siri, Alexa and Google Assistant is changing the way we search. Twenty percent of mobile queries were made via voice in 2016. E-commerce companies therefore now have to offer a decent voice search experience to capitalize on that trend and boost sales.

3. Voice Tech Infrastructure

On the infrastructure side, Software building blocks clearly outweigh Third Party Services for Voice First Ecosystems, both in number of companies and in funding. This is not really surprising as voice-first ecosystems are still nascent and, as mentioned in the previous post of the Series, most investors will probably feel more comfortable investing in this category once the space matures.

Delving into the Software building blocks sub-categories in more detail, the most popular categories are NLP and Conversational AI (39.5%), Speech to Text (21.1%) and Text to Speech (13.2%).

Looking at the average and median funding for the most popular categories (excluding public companies), Speech to Text ranks the lowest.

Already a couple of clear winners in Speech to Text, not much room left for startups

As category leaders in Speech to Text are already emerging (Nuance and the core platforms like Google, Amazon & Co), it is not surprising to see less money invested and fewer startups created in this category. It is also hard for a new entrant to compete with the big players who have already collected massive amounts of real-life situational data. New competitors focusing on specific domains might compete on accuracy but the big players will expand to more and more use cases over time.

NLP and Conversational AI is the hot voice infrastructure layer

NLP and Conversational AI form a very interesting category in Voice today. As mentioned in the previous post in this Series, technical challenges related to NLP and conversational AI are far from being solved. However, these solutions are critical for most voice applications as they enable human-like conversations. A fair amount of money is invested in this category compared to others. However, there is no clear winner yet, with six companies that raised more than $10M each, and I believe that the landscape will remain similar for a few more years. In addition, the core platforms have not been idle and have acquired a number of companies in this space. Some of these include Kitt.ai (acquired by Baidu last July), Ozlo (acquired by Facebook in July as well), api.ai (acquired by Google in Sept 2016) and Wit.ai (acquired by Facebook in January 2015).

Data Set and Methodology FAQ

Which cloud companies are included in the list?

Included

  • B2B Software companies for which voice is a significant component of their offering

Excluded (category)

  • Skills
  • Bot/NLP startups with no visible voice offering
  • Podcast startups
  • Telecom players and other companies enabling human to human voice or audio interaction only (versus machine interaction) except for CAAS

Excluded (stage)

  • Companies that were founded before 2010 and have fewer than 40 employees or less than 5M raised as of today
  • Companies founded between 2010 and 2015 with fewer than 5 employees
  • Companies that were acquired
  • Companies that are not launched yet
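The stage exclusions above can be read as a simple filter. The snippet below is only a rough sketch of that reading, not the actual process used for the landscape: the `Company` record, its field names and the `excluded_by_stage` helper are all hypothetical, "between 2010 and 2015" is assumed to be inclusive, and an unknown amount raised is assumed not to count as falling below the 5M threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Company:
    """Hypothetical record; field names are illustrative only."""
    name: str
    founded: int                 # founding year
    employees: int               # headcount
    raised_m: Optional[float]    # amount raised in millions, None when unknown
    acquired: bool = False
    launched: bool = True


def excluded_by_stage(c: Company) -> bool:
    """Return True when a company falls under one of the stage exclusions."""
    # Acquired or not-yet-launched companies are always left out.
    if c.acquired or not c.launched:
        return True
    # Founded before 2010: excluded if small headcount or low funding.
    if c.founded < 2010:
        low_funding = c.raised_m is not None and c.raised_m < 5  # assumption: unknown is not "low"
        return c.employees < 40 or low_funding
    # Founded 2010-2015 (assumed inclusive): excluded if fewer than 5 employees.
    if 2010 <= c.founded <= 2015:
        return c.employees < 5
    # Companies founded after 2015 have no size threshold in the list above.
    return False
```

For example, `excluded_by_stage(Company("Example Co", 2005, 30, 50.0))` would flag an older company that is well funded but still has fewer than 40 employees.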

How did you build this list?

I sourced most of the companies by

  • exploring Crunchbase and Angel List for several keywords (voice/speech/etc.)
  • looking at our own deal flow
  • monitoring the news
  • checking existing landscapes like this one published by Jerry Lu in June and this one published by David Beisel in January.

Where can I see the list and its data?

Can you tell me a little more about the data available for each startup?

  • Company name
  • URL
  • Category 1: Vertical = B2B software application tailored for a specific industry / Horizontal = industry-agnostic B2B software application / Infrastructure = main building blocks for voice software applications
  • Category 2 and Category 3: Subcategory
  • Founded: founding date of the company checked on Crunchbase and LinkedIn
  • Employees: checked on LinkedIn
  • Raised: the amount of money raised mentioned on Crunchbase (if the cell contains “?” it means I have no info)

The categorisation might be subjective in some cases and I’m more than open to discussing changes.
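For readers who want to recompute figures like the category shares or the average and median funding from such a list, here is a minimal sketch. It assumes each row is a plain dictionary with hypothetical keys `category`, `raised` (a number, or `None` where the sheet shows “?”) and `public` (a flag that is not one of the fields listed above, added here only to mirror the "excluding public companies" step). It is an illustration, not the spreadsheet logic actually used.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def category_shares(rows):
    """Share of companies per category, as a percentage rounded to one decimal."""
    counts = Counter(r["category"] for r in rows)
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


def funding_stats(rows):
    """(average, median) amount raised per category.

    Public companies and rows with an unknown amount are skipped.
    """
    by_cat = defaultdict(list)
    for r in rows:
        if r.get("public") or r["raised"] is None:
            continue
        by_cat[r["category"]].append(r["raised"])
    return {cat: (mean(vals), median(vals)) for cat, vals in by_cat.items()}
```

Reporting the median alongside the average is useful here because a single very large round can pull a category's average up considerably.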
