Data Networks and Liquidity: How to Avoid Dead Ends
The rate of data decay, geographic constraints, and the method of data acquisition combine to define liquidity barriers for data networks — traits that strain liquidity should ideally be balanced by those that ease it
Data network effects are a tricky beast and come with a difficult set of trade-offs. But these trade-offs only become meaningful after the data network has gained critical mass. The considerations for gaining critical mass on a data network differ from those of other network models because users don’t interact with each other; they only interact with a product that is augmented by crowdsourced data. This adds further trade-offs to the already complicated task of building a data network.
For data networks, critical mass (or liquidity) is best defined as the minimum quantity and quality of crowdsourced data required to create a valuable product. This is influenced by three factors — the rate of data decay, how “local” the data is, and the method of data acquisition. The first two of these factors were also the primary determinants of defensibility and scalability. Let’s first revisit them to understand how they shape liquidity:
- Rate of data decay: This is a measure of how “real-time” the use case of the data is. Real-time data “decays” or expires almost as quickly as it is collected. This means that real-time data networks like Waze need a larger base of active users at any given point in time to reach critical mass. In contrast, “static” data networks like Tripadvisor accumulate data over time, which lowers the required threshold.
- Local vs. global data: This is a measure of geographic constraints on acquired data, i.e. how relevant it is to users across regions. Hyperlocal data networks like Waze crowdsource largely local data to improve their product. As we saw in the case of marketplaces, they need a sufficient density of users in every active region to accomplish this. For cross-border networks like Mapbox, acquired data is valuable for users in all regions — this makes it easier to reach critical mass.
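To make the interaction between these two factors concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple steady-state model that is not from the original analysis: each user contributes data at a constant rate, every data point stays useful for a fixed lifetime, and local data must reach the threshold separately in every region. The function name `users_needed` and all of its parameters are hypothetical.

```python
import math


def users_needed(min_live_points, points_per_user_per_day,
                 useful_life_days, regions=1):
    """Estimate the users required to keep enough "live" data available.

    Illustrative assumption: at steady state, each user sustains
    (contribution rate x useful lifetime) live data points, and
    hyperlocal data must hit the threshold in every region separately.
    """
    live_points_per_user = points_per_user_per_day * useful_life_days
    per_region = math.ceil(min_live_points / live_points_per_user)
    return per_region * regions


# Same threshold and contribution rate, different decay and locality.
real_time_local = users_needed(1000, 2, 0.5, regions=4)  # data expires in half a day
static_global = users_needed(1000, 2, 50)                # data stays useful for 50 days
print(real_time_local, static_global)
```

Under these assumptions, shortening the useful life of the data or splitting it across regions raises the number of users needed, while long-lived, globally relevant data lowers it. The specific numbers are arbitrary; only the direction of the effect matters.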
Based on this, data networks that combine static use cases with cross-border data (Type A) are likely to be “liquidity inclined”. On the other hand, data networks that combine real-time use cases with hyperlocal data (Type B) are likely to be “liquidity resistant”. As I have previously explained, these are the only possible combinations of these two factors. However, there is a third factor that can change the considerations for achieving liquidity on data networks — the method of data acquisition.
Method of Data Acquisition
There are two primary approaches to crowdsourcing data: passively, in an automated way, or actively, via user input. Let’s take a deeper look at how these two approaches affect liquidity barriers on data networks.
Some data networks acquire data automatically once users opt in, irrespective of their activity on the product, i.e. they acquire data passively. For example, XANT (previously InsideSales.com) requires customers to grant access to their contact databases and emails when they sign up. XANT then automatically collects data from sales reps’ communications as they go about their daily activities. Finally, XANT uses this data to train its recommendation algorithm, which helps improve sales productivity. Here, data acquisition does not depend on direct user engagement, so gaining critical mass is purely a function of adoption. Robotic process automation (RPA) companies like UiPath and Automation Anywhere fall into this bucket as well.
Startups built on passive crowdsourcing still face the “cold start problem” because the product has little to no value on day zero (without any crowdsourced data). Free trials are an easy way to overcome this — customer sign-ups (almost) immediately improve product value because engagement is not a requirement for data acquisition. This makes it easy to reach critical mass, not just for the startup in question, but also for competitors.
A subset of passively crowdsourced data networks may have optional data-sharing programs. For example, ZoomInfo only acquires data from users who sign up for its community edition, not from all users. In this case, data acquisition is still automatic and does not require any further intervention from users who have opted in. The only difference is that critical mass (and the data network effect) depends on the adoption of its opt-in program, rather than the product. This somewhat raises the barrier to liquidity, but not meaningfully, because ongoing engagement is still immaterial to data acquisition. The “cold start problem” can still be solved fairly easily — with a (limited) free or discounted subscription. For example, ZoomInfo gives community edition users 10 free contact views per month in exchange for sharing their contact directories. On the whole, passive crowdsourcing has low barriers to liquidity — this makes it relatively easy to kickstart data network effects but at the cost of lower entry barriers for competitors.
Other data networks only acquire data when users take a specific action on the product. For example, Tripadvisor only acquires data when users add new reviews. Its repository of reviews then attracts new users, some of whom may also leave additional reviews. For products like Tripadvisor, user adoption is not sufficient to gain critical mass. Instead, they require active contributions from a large portion of their user base.
This makes the “cold start problem” more challenging to solve. Often, startups built on active data acquisition will need to “do things that don’t scale” to get the product off the ground and attract an initial user base — for example, getting friends and family to contribute or scraping content. This also makes it difficult for competitors to gain critical mass — they need adoption and a high “data generator-to-consumer” ratio to create a “good enough” alternative. Beyond Tripadvisor reviews, Waze’s crowdsourced incident reports are also a good example of active data acquisition.
Keep in mind that active and passive crowdsourcing can be combined: some datasets can be acquired from active user submissions while others are collected passively. For example, Waze passively collects anonymized location data to inform traffic estimates, but actual incident reports rely on active user input. Similarly, Moovit has a community of users who report changes in the public transport network, such as disruptions at specific stations or on specific lines. However, its real-time transit updates are based on anonymized location data that is automatically collected from users. In such cases, it is more meaningful to assess the barriers to critical mass individually, for each dataset. That said, the dataset that is more central to the core value proposition underpins the data network effect. On that basis, it is more accurate to categorize Waze under active crowdsourcing and Moovit under passive crowdsourcing.
The Data Matrix: Liquidity Trade-Offs
The method of data acquisition can combine with the rate of data decay and degree of localization to ease or exacerbate liquidity barriers faced by Type A (liquidity inclined) and Type B (liquidity resistant) data networks. As a result, the method of data acquisition is best visualized as a third axis (Z-axis) on the data matrix. This introduces even more trade-offs to the already complicated dynamics of data network effects.
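One way to picture the three axes is as three yes/no traits per data network, each of which either strains or eases liquidity. The sketch below is only an illustration: the class `DataNetwork`, its field names, and the simple count of straining traits are assumptions made here, not a scoring method from the original framework. The example classifications follow the ones given in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNetwork:
    name: str
    real_time: bool  # True if data decays quickly, False if static
    local: bool      # True if hyperlocal, False if cross-border
    active: bool     # True if data requires active user input

    def base_type(self) -> str:
        """Type A / Type B label from the first two axes only."""
        if self.real_time and self.local:
            return "Type B (liquidity resistant)"
        if not self.real_time and not self.local:
            return "Type A (liquidity inclined)"
        return "mixed"  # combinations the article does not treat as arising

    def strain_count(self) -> int:
        """Number of traits (0 to 3) that strain liquidity."""
        return sum([self.real_time, self.local, self.active])


networks = [
    DataNetwork("Waze", real_time=True, local=True, active=True),
    DataNetwork("Moovit", real_time=True, local=True, active=False),
    DataNetwork("Tripadvisor", real_time=False, local=False, active=True),
]

for n in networks:
    print(n.name, n.base_type(), n.strain_count())
```

A higher count simply means more traits pulling against liquidity. Equal weighting of the three traits is a simplification, since in practice one axis may matter far more than another for a given product.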
Within this matrix, real-time, local data networks (Type B) are much less likely to rely on active data acquisition because they are already liquidity-challenged. This is reminiscent of the “spontaneous togetherness” problem and the resulting liquidity challenges that plague some synchronous networks. It is simply more difficult to build a critical mass of users if they need to be spontaneously active at the same time.
Waze is one of the few examples of a data network that successfully combined active crowdsourcing with a real-time model. Waze overcame its liquidity challenges with the same strategy used by successful synchronous networks: it targeted users who were already likely to be active at specific times, i.e. commuters. Also, the identity of the data source was immaterial; it didn’t matter if a Waze incident report was submitted by Elon Musk or your grandmother. This lowered liquidity barriers even further, because Waze just needed a critical mass of anonymous users to be active at a given time, not a specific set of users.
Moovit is a more representative example of a local, real-time data network (Type B) because it primarily relies on passive crowdsourcing. This made it easier for Moovit to gain critical mass in its early days. But low liquidity barriers also spawned competitors like Transit and Citymapper in regions that Moovit had not yet targeted. On the other hand, cross-border, static data networks (Type A) like Tripadvisor have lower structural barriers to liquidity. This gives them more leeway to use active crowdsourcing, a data acquisition method that strains liquidity.
To summarize, achieving liquidity on data networks depends on a combination of the rate of data decay, geographic constraints, and the method of data acquisition. To maximize your odds, you need to balance characteristics that strain liquidity with those that ease it. In particular, local and real-time data networks (Type B) with active crowdsourcing are the most liquidity-challenged variant. Entrepreneurs grappling with this model should follow the recommendations I outlined for synchronous interaction networks: either pivot away from a real-time use case or target users who are already active at specific times.