Why is Video Conferencing so SH*T — PART II
How can we disrupt this Market?
Hopefully you’ve read Part I [link]. In this final part I’ll talk about how I believe this market will be disrupted, and the golden opportunities that have opened up.
Part I recap in a graphical nutshell:
There’s also a strange barrier to entry. Sure, you can put a conferencing platform together quickly, say with better UX and features. But its audio and video signal processing performance will be rubbish, and your customers will have a disappointing experience, regardless of how well your screen sharing works. Doing the signal processing well is specialised and requires a lot of investment, and there’s no one to buy it in from. You’ll leave your customers with the same old familiar problems…
Sorry I can hear myself, can you try muting for a second? No I can still hear myself…
We find ourselves in the unbreakable cycle of doom. The market is too small to justify investment, so products don’t get better, so the market doesn’t grow. But can we break this cycle?
Disrupting This Market
The fundamental trick to breaking this cycle is to break from the vertical market model. What is needed is technology companies, independent of the conferencing platform providers, doing state-of-the-art audio and video signal processing specifically for this application. Their business model would be set up to sell to the conferencing platform providers, new and old. That reduces the barrier for new communication companies to enter, enables new applications, and makes the products of the existing players better, which in turn grows the market.
The benefit of this approach is that the return on investment starts to make sense. If, for example, you develop a great real-time noise cancelling capability, AI powered and highly qualified, you can sell it to all the major conferencing providers. And not just in corporate, but also in education, healthcare, and even markets outside of conferencing. Now the market size is big enough to justify the investment. But how do we make this transition?
The Cycle is Broken, the Time is Now
The market was evolving, but Covid has been a huge shot in the arm accelerating the transition. At least for now the market size has dramatically increased, opening a lot of opportunities for investment. The cycle is broken.
However, I believe the worst thing you could do now is to try to make another general purpose conferencing platform to compete with the existing ones. It won’t be better, because you just won’t have pockets deep enough, and even if you did, you’d never get a reasonable return on it.
I believe a successful permanent transition of this market will require an ecosystem change. We need 4 types of players in this market for it to be healthy:
1. The familiar companies that exist today: Zoom, Microsoft Teams, Google Hangouts, Cisco WebEx, etc. General purpose conferencing platforms.
2. The biggest growth in the next decade will be here. Tailor-made platforms for remote education, remote healthcare, exercise, etc. This is already happening; one major leader, for example, is the Swedish company Kry, which offers doctor visits over video calls.
3. There are already some great white label video conferencing platforms, mentioned in Part I, but all efforts around standards like WebRTC, codecs, and connectivity are also part of this circle. Good examples here would be Twilio, now a public company, or agora.io, which went public last month.
4. All that great AI-powered audio and video signal processing. Getting rid of the noise and the echo, helping you see the other person better, and many more new capabilities. This circle doesn’t really exist today.
A couple of observations:
- Today the players in 1 are trying to do everything. They also recognise the opportunity in 2 and are trying to reach over, but general purpose platforms can only go so far.
- To be good at 1 and 2 (high in the stack), you need good end customer understanding and good UX. To be good at 3 and 4 (lower in the stack), you need deep technology understanding. These are different core competencies.
- 1 and 3 exist today, 2 is emerging, 4 doesn’t exist.
- Big money will be made in 2, but only if 4 also emerges. Otherwise 2 will just disappoint.
- The growth of 2 will fuel 4. Then 1 will benefit from 4 as well, and become better.
If I were to describe a world where video conferencing isn’t rubbish, there would be major independent companies occupying each of these circles, crucially including circle 4, which is mostly absent today. The success of this whole market relies on strong technology companies providing real-time signal processing capabilities.
As an end user you wouldn’t be directly buying these capabilities. In fact, you would likely never hear the names of these companies at all. The conferencing platform providers you use and know would buy these capabilities from these technology companies. This would free the platform companies to use their sales organisations to listen to the end customers, figure out what they need at the application level, and tailor products to work the way the customers want.
To fix this market we need an ecosystem of players, not one vertical behemoth owning the whole stack. We’ve seen that doesn’t work.
The Change is Coming
We’re starting to see more companies trying to build the underlying technologies with a sell-to-all business model, rather than making their own platform.
On noise suppression, a great example is a startup called Krisp, which is building an AI-powered noise suppressor. Another recent example is a tool in beta called mmhmm, which mixes UX features with better video and image processing to make video calls more productive.
Today these companies are offering their capabilities as plugins that the end user can use with any conferencing platform. I think it feels inevitable that for these startups to succeed they will have to evolve out of offering plugins to end users and become technology suppliers to the platform providers, but that in itself is a journey they will have to go through.
This is more than just Video Conferencing
You might be thinking that some of these real-time audio signal processing capabilities must have existed for years. My phone does it, for example.
If you go back 5 years, the majority of the audio signal processing was happening on the communication devices themselves, like your phone, or the physical conference phone sitting in the conference room. Traditional integrated circuit and signal processing companies like Analog Devices, Texas Instruments, and STMicroelectronics have been providing real-time signal processing capabilities for decades.
But two things have happened recently that change things:
- While the initial signal processing in the devices remains, we’re seeing more advanced signal processing emerge away from the devices, in the cloud. The cloud has more information, or information from multiple devices, which enables much better processing for certain problems.
- Advances in AI for both audio and video processing, combined with the vast increase in centralised processing capability, have increased the possibilities. This in turn has pushed the signal processing further up from the embedded devices in many cases, and away from the offerings of the traditional incumbents in the real-time signal processing space.
I believe, and I’m hopeful, that we’ll witness one of two things over the next decade in this space of real-time signal processing. Either the traditional powerhouses of signal processing, predominantly the IC manufacturers, will grow their AI-driven capability, transition further up the signal processing chain, and offer more application-specific audio and video processing capabilities. Or we’ll see a new breed of major company emerge, filling that gap in the ecosystem and providing application-specific real-time signal processing capabilities. There’s a real need for this.
So What Am I Doing About This All?
I have a new startup. We’re early and still in stealth, so we’re not talking much about what we’re doing at the moment. But we have a couple of cool bits of technology, and our goal is to build state-of-the-art, fully qualified audio and video signal processing capabilities, with a focus on voice and image communication.
We want to make digital communication between humans better, a lot better.
We’re experimenting with a sort of Digital Signal Processing as a Service (DSPaaS) business model, where you pay by the minute for the real-time processing of audio and video feeds. This way, regardless of your size and application, you can get hold of the best signal processing for your needs, and you don’t need to worry about the infrastructure or latency; it’s all taken care of. We want to help transform this market, and if we do our job right, you’ll never hear our name, but your digital communication platform, whichever one you use and whatever you use it for, should get a whole lot better.
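To make the pay-by-the-minute idea concrete, here is a minimal sketch of how a platform’s bill might be worked out. Everything in it is an assumption for illustration: the rates, the `Stream` type, and the rule of rounding each feed up to a whole minute are invented, and none of it is our actual pricing.

```python
import math
from dataclasses import dataclass

# Hypothetical per-minute rates in US cents, invented for illustration only.
RATES_CENTS_PER_MINUTE = {"audio": 1, "video": 4}


@dataclass
class Stream:
    kind: str      # "audio" or "video"
    seconds: int   # how long the feed was processed


def billable_minutes(seconds: int) -> int:
    """Round a processed duration up to whole minutes (an assumed billing rule)."""
    return math.ceil(seconds / 60)


def invoice_cents(streams: list[Stream]) -> int:
    """Total charge in cents: billable minutes times the rate, summed over feeds."""
    return sum(
        billable_minutes(s.seconds) * RATES_CENTS_PER_MINUTE[s.kind]
        for s in streams
    )


# A 30-minute call with one audio and one video feed, plus a 90-second audio-only call.
usage = [Stream("audio", 1800), Stream("video", 1800), Stream("audio", 90)]
print(f"Total: {invoice_cents(usage)} cents")
```

The point of the sketch is that the cost scales with minutes processed rather than with the size of the customer, so a small tailor-made platform and a large general purpose one would pay the same rate for the same processing.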
I mentioned some startups and bigger companies in the text above. There are a lot of companies entering this space, but here are links to the ones I mentioned:
Kry: a Swedish company in the remote healthcare space. Their UK effort is under the name Livi.
Twilio: a pretty large, now public company that offers communication capabilities to integrate into your product.
agora.io: an audio and video connection platform to build your own product on top of, which went public on NASDAQ last month.
Krisp: the noise suppression startup I was talking about above. They have a free plugin you can try.
mmhmm: the beta tool that helps with screen sharing and processes the video on calls. On their website, at the moment at least, you can find a video explaining what they’re doing. I’ve added the link for the video as well.
Pommelhorse: our new startup company working in this space. There’s almost no information on the website at the moment, sorry.