On WFH, zoomalternative.com, and being late to Clubhouse
If you know Point Nine a little bit you might know that we’re extremely excited about video conferencing technology, and more broadly, about how video as a medium is transforming everything from education and entertainment to communication and marketing. In the last few years we’ve invested in several startups that use video in a variety of different ways: Confrere, a video calling solution for physicians; Loom, a beautiful tool for asynchronous video messages; PlayPlay, which lets anyone easily create professional videos; Preply, a learning platform for language tutoring sessions over video; ScreenCloud, a web-based digital signage software; Zype, which helps companies distribute, manage, and monetize digital videos; and one more recent addition to the #p9family that hasn’t been announced yet.
My personal interest in video conferencing is related to the fact that I’ve been working remotely for more than twenty years. If you spend hours and hours in video conferences every day you can’t help but develop a keen interest in any piece of hardware or software that might improve the audio or video quality or overall meeting experience. :-) (Feel free to reach out if you want to geek out on that topic.)
There’s no one-size-fits-all solution for human interactions
We’ve always had a remote-friendly culture and a geo-agnostic investment strategy at Point Nine. As a result, the last few months weren’t that different or difficult for us. Even so, four months of social distancing have made me realize one thing: While Zoom has done a fantastic job in bringing reliable, high-quality video meetings to companies around the world, the types of meetings that Zoom excels at represent only a subset of a much larger number of social interaction paradigms.
Think about some of the many other ways in which people meet, communicate, and interact in the physical world: A consultation with a physician. A university lecture. A sales or customer support call. A family dinner or a team lunch. Playing a board game with friends. Watching a movie together. A networking dinner. A brainstorming session. A conference.
If you wanted to design a software product aimed at replicating the experience of, say, a team lunch in the online world, that product would likely look very different from a tool for online lectures or software for brainstorming sessions. It would undoubtedly look and feel different from what Zoom looks and feels like today.
There is no doubt that Zoom will continue to develop its products to address further use cases. Moreover, by letting third-party developers build products on top of its platform Zoom might become the backbone of our new virtual world.
I’m convinced, however, that there are plenty of opportunities for startups to build large, successful video conferencing businesses that focus on specific niches and offer the best possible user experience for their chosen meeting types or use cases. People are already used to using multiple different communication tools depending on the context (e.g. Slack for work, WhatsApp for friends, Skype for your parents). Users may be loyal to a tool or service, but their loyalty is usually limited to certain contexts. FWIW, I’m so bullish about Zoom alternatives that I recently registered the domain www.zoomalternative.com. Let’s see what it might be useful for. 😁
What would a Zoom for dinners look like?
Continuing this line of thought I asked myself: What would a video meeting product for networking dinners with, say, 20–50 guests look like? It certainly wouldn’t be a Zoom meeting with 20–50 participants. How would you design an application that enables social interactions similar to those at a business dinner event? With the caveats that a) I’m not an expert at dinner networking by any stretch of the imagination and b) we’ll have to forget about the food for now, here are my thoughts.
Let’s imagine a dinner with 40 people, four tables with 10 participants each:
The initial seating order is either provided by the host or emerges as attendees pick seats, in which case it’s a result of the attendees’ relationships, personalities, interests, status, plus an element of randomness.
Let’s think about what’s happening here:
- With around 10 people per table, it’s still possible to have one conversation that everyone at the table participates in, depicted by the large blue oval in the top left. (If there are significantly more than 10 people seated around one table you will start to lose the ability to have a single conversation, unless there’s a moderator, maybe, but that’s yet another type of meeting.)
- In most cases, there will be multiple conversations per table, as illustrated by the various other blue ovals above. To be precise, the number of simultaneous conversations per table is in the 1–5 range (not counting monologues, prayers, phone calls, and assuming there are neither cross-table conversations nor people engaging in multiple conversations at the same time).
- People tend to have conversations in “groups” consisting of 2–5 (or maybe a few more) participants, mostly with people that are spatially close to them (for obvious, acoustic reasons).
- Groups emerge spontaneously and can be expanded or reduced in size, merged with other groups, split into subgroups, and dissolved, without any central planning, based on the behaviors of the various actors.
- People frequently switch between groups in their proximity (for any number of reasons).
- Occasionally, people swap seats in order to join a farther away group, indicated by the violet arrows above (because the grass is always greener on the other side).
- Depending on the room layout, groups may span across neighboring tables, although this may require a certain amount of physical suppleness (green oval).
- People occasionally switch to another table, e.g. for the dessert or afterwards (red arrows).
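For readers who think in code, the group dynamics in the list above can be sketched as a tiny data model. This is only a rough illustration: the `Dinner` class and its method names are made up for this example, and it ignores seating geometry and tables entirely.

```python
class Dinner:
    """Tracks which conversation group each guest currently belongs to."""

    def __init__(self):
        self.groups = {}    # group id -> set of guest names
        self._next_id = 0

    def leave(self, guest):
        """Remove a guest from their group; an emptied group dissolves."""
        for group_id, members in list(self.groups.items()):
            members.discard(guest)
            if not members:
                del self.groups[group_id]

    def start_group(self, *guests):
        """Guests spontaneously start a new conversation."""
        for guest in guests:
            self.leave(guest)
        group_id = self._next_id
        self._next_id += 1
        self.groups[group_id] = set(guests)
        return group_id

    def join(self, guest, group_id):
        """A guest switches to another existing group."""
        target = self.groups[group_id]
        if guest not in target:
            self.leave(guest)
            target.add(guest)

    def merge(self, keep_id, other_id):
        """Two groups become one conversation."""
        if keep_id != other_id:
            self.groups[keep_id] |= self.groups.pop(other_id)
        return keep_id

    def split(self, group_id, guests):
        """Some members of a group break off into a subgroup."""
        leaving = set(guests) & self.groups[group_id]
        if not leaving:
            raise ValueError("nobody to split off")
        return self.start_group(*leaving)
```

Note that nothing in this model is centrally planned: every change is triggered by an individual guest joining, leaving, or breaking off.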
How does the switching work?
For starters, it’s important to remember why people are able to have a conversation in a noisy restaurant in the first place. How can you understand what another person is saying when there are multiple other people talking (or yelling) at the same time? The explanation is the well-known cocktail party effect, which describes the brain’s ability to focus on one auditory stimulus while filtering out a range of other stimuli. What’s important in the context of video conferencing is that the cocktail party effect requires stereo hearing. At a dinner event, your left ear and your right ear receive somewhat different audio signals, which the AI between your ears uses to localize sound sources. Unless you use ultra high-end telepresence systems that record and play audio Dolby-style, spatial information is completely missing from the audio that you receive in a video conference, which largely disables the cocktail party effect (and is one of the reasons why video meetings and IRL meetings don’t quite feel the same yet).
Coming back to our networking dinner:
- Because of the cocktail party effect, people are able to participate in the conversation of one group while hearing and capturing fragments of other groups’ conversations.
- People may occasionally shift their attention to a neighboring group’s conversation for a brief moment, or they can use short breaks in their own group’s conversation to “tune in” to neighboring groups’ conversations.
- People can use what they’ve learned about neighboring groups’ conversations to decide if they want to leave their group and join another one.
Bringing the cocktail party effect to video meetings
The above has shown that in order to replicate a networking dinner, the software needs to meet a few important requirements:
- The software needs to offer an equivalent to the “groups” I spoke about above. The closest thing to that in existing video conferencing software would be a “room” or “breakout room”, but I think “table” might be a better term, so I’ll use that here.
- People need to be able to receive cues about what’s going on at other tables. The trick is to find a mechanism to provide these cues in an unobtrusive, non-distracting way. Audio cues are likely to be too distracting, so listening in on another table’s conversation, even at a reduced volume, while following the conversation of one’s own table is probably not the right solution. Visual cues could work very well, however. Using speech recognition and a simple (tag cloud style) algorithm, I think the software would be able to provide useful cues.
- People must be able to easily switch back and forth between tables.
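To make the “tag cloud style” idea from the list above a bit more concrete, here is a minimal sketch in Python. It assumes a speech recognition step has already turned a table’s recent conversation into plain text (that step is not shown), and it simply counts the most frequent non-trivial words. The function name `table_cues` and the tiny stop-word list are made up for this illustration; a real product would need something far more robust.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a complete one).
STOP_WORDS = {
    "the", "and", "are", "for", "that", "this", "with", "but", "not",
    "you", "was", "have", "has", "very", "always", "just", "what",
}

def table_cues(transcript, top_n=5):
    """Return the top_n most frequent keywords as (word, count) pairs."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(
        w for w in words if len(w) > 2 and w not in STOP_WORDS
    )
    return counts.most_common(top_n)

# Example: a few seconds of (already transcribed) conversation at one table.
snippet = (
    "We are raising a seed round. The seed round closes soon, "
    "and pricing is hard. Pricing for SaaS is always hard."
)
print(table_cues(snippet, top_n=4))
```

The resulting keywords (here “seed”, “round”, “pricing”, and “hard”) are the kind of cue that could be shown next to each table, sized by frequency like a tag cloud.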
A simple mockup might make it easier to see what I have in mind:
- Non-verbal visual cues (emojis!)
- Option to mute/unmute all visual cues or selectively mute/unmute visual cues from specific tables (a big benefit over real dinner meetings).
- Ability to get cues not only from neighboring tables but also from tables that are further away, filtered based on the user’s interests and preferences (this could make it possible to have meetings with 10x more participants than in the real world).
- Being able to “tune in” to another conversation for a few moments.
- Base initial seating order on users’ interests and preferences.
- Optionally, there could be rules for joining tables (for example, someone joining a table could require a “thumbs up” vote of the majority of the people at that table).
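Two of the ideas above, matching tables to a user’s interests and the “thumbs up” rule for joining a table, are simple enough to sketch in a few lines of Python. All names here (`overlap`, `rank_tables`, `may_join`) are hypothetical, and the similarity measure (Jaccard overlap of interest tags) is just one plausible choice, not a recommendation.

```python
def overlap(interests, topics):
    """Jaccard similarity between two sets of tags (0.0 to 1.0)."""
    interests, topics = set(interests), set(topics)
    union = interests | topics
    return len(interests & topics) / len(union) if union else 0.0

def rank_tables(interests, tables):
    """Sort table names so the best match for the user comes first.

    `tables` maps a table name to the set of topics discussed there.
    """
    return sorted(
        tables,
        key=lambda name: overlap(interests, tables[name]),
        reverse=True,
    )

def may_join(thumbs_up, seated):
    """Majority rule: more than half of the seated guests must approve."""
    return thumbs_up > seated / 2

# Example usage with made-up tables and interests.
tables = {
    "T1": {"saas", "pricing"},
    "T2": {"crypto"},
    "T3": {"saas", "video", "pricing"},
}
print(rank_tables({"saas", "video", "pricing"}, tables))
print(may_join(thumbs_up=6, seated=10))
```

The same ranking could drive both the initial seating order and the decision about which distant tables’ cues to show to a given user.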
The more I think about it, the more I believe that software has the potential to not only replicate but improve on physical networking meetings and to enable better, more meaningful conversations. What do you think? If someone has looked into this topic more thoroughly or if there are tools that I’m not aware of, please let me know!
PS: Here’s a comment from Nathan Benaich, who was kind enough to review a draft of this post, fix my broken English, and provide lots of great feedback:
Ouch. I’ve obviously heard about Clubhouse but I haven’t tried it yet, so I don’t know how much of what I described above already exists. Looks like I have to spend more time on VC Twitter. ;-)