Strategically multiply the impact of other data

Reduce the risk of not having a viable data business

Aaron Berdanier
The Startup
10 min read · Jun 14, 2020


Photo by NASA on Unsplash

This article is part two of a series. Check out part one here.

In August of 2012, Microsoft’s Bing Maps completed 100% coverage of high-resolution imagery over the continental US. Two months earlier, Apple Maps had announced improved 3D imagery over major US cities. Aerial and satellite imagery was a hot topic in mapping. It was perhaps not surprising, then, that just two years later Google acquired a satellite imaging company to support their Maps products.

On the surface, it could have seemed like a simple data strategy: obtain high-resolution imagery and expose it to users. But the potential to use this data for more was already widely identified. In the announcement of their acquisition of Skybox Imaging, Google noted that they would use the data to “keep Google Maps accurate with up-to-date imagery,” hinting at an opportunity to maximize impact and support growth.

How did Google Maps end up leveraging their newly-acquired satellite imagery? And what can it teach us about launching and growing viable data products? Let’s talk about building footprints⁠ — a nice example of a low-profile product with a multiplicative business impact.

Durham, North Carolina, USA. Image by author. Data source: Esri, Maxar.

Proprietary data drive early growth

While Bing was creating a high-resolution composite of the continental US and Apple was creating 3D flyovers, by 2012 Google had already outlined 25 million building footprints on their maps in major US cities. At that time, if I were using Bing Maps to see buildings I would need to view aerial images. But if I were using Google Maps, in some places I could see them on the base map. These were different product strategies: Bing compiled extensive high-resolution data, while Google created a new data product. The Google Maps team hoped this innovation would help people “orient [themselves], locate landmarks and navigate from place to place.”

Even though showing buildings on a map was possible with satellite images, drawing buildings on a map was not as easy in 2012. Google used computer vision techniques to create their building footprints as a derived data product, in a process that relied on human-generated outlines for ground truth data and up-to-date high-resolution input images. By the end of 2014, Google Maps had drawn many buildings around the US (based largely on human work by teams in India) and had secured a proprietary source of raw data to feed and improve their algorithms, helping them spread building footprints around the world. As Jen Fitzpatrick, Senior VP at Google Maps, explained in 2020:

[W]e worked with our data operations team to manually trace common building outlines, then trained our machine learning models to recognize building edges and shapes. Thanks to this technique, we’ve mapped as many buildings in the last year as we did in the previous 10.
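The process described above can be illustrated with a toy sketch. This is not Google's pipeline (their actual computer vision models are proprietary); it just shows the shape of the derivation: a raster input (here, a tiny binary mask standing in for classified satellite pixels) is turned into a vector data product (building footprints). A real pipeline would use a trained segmentation model to produce the mask; this sketch assumes one already exists and simply extracts connected components.

```python
# Toy illustration of a derived data product: raster pixels in, vector
# "footprints" out. A real system would run a segmentation model first;
# here the classified mask is given and we extract connected blobs.

def extract_footprints(mask):
    """Return bounding boxes (row0, col0, row1, col1) of building blobs."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    footprints = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and (r, c) not in seen:
                # Flood-fill one connected component (4-neighborhood).
                stack, cells = [(r, c)], []
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    cells.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                ys = [y for y, _ in cells]
                xs = [x for _, x in cells]
                footprints.append((min(ys), min(xs), max(ys), max(xs)))
    return footprints

# Two "buildings" in a 5x6 classified image.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(extract_footprints(mask))  # [(0, 0, 1, 1), (1, 4, 2, 5)]
```

The point of the toy is the dependency structure: the footprint product is only as good as the classified imagery feeding it, which is why owning the raw imagery mattered so much.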

Around the same time that Google was investing heavily in acquiring proprietary data and generating new outputs, Apple Maps decided to pivot their mapping strategy away from acquiring other people’s data and towards generating their own proprietary data. In 2016 they also opened an office in India to focus on developing their Maps products, but they were already behind. Four years after Google Maps started making their derived data products, Apple and other mapping products (Bing, Here, TomTom) still did not have building footprints widely available.

Finally, in July 2019, Apple Maps added building footprint data (and a lot of other interesting, potentially novel data features). But building footprints were already a commoditized product. Computer vision models were widely published and open-sourced. By June 2018, Microsoft had also released an open-source data set of over 125 million building footprints in the US. As building footprints were appearing in Apple Maps in 2019, Microsoft added an additional 12 million building footprints in Canada to their data set, ostensibly to support Open Street Maps and humanitarian efforts, but also likely because they were no longer unique in the market.

Viability challenge 1: Data products can be duplicated.

Outcomes can be replicated⁠ — anyone could redraw building footprints with enough effort⁠ — and algorithmic methods can be copied or reverse engineered: genuinely new modeling approaches are rare, and even advanced methods are usually available to technical people in academic publications, where they are not protected.

Viability strategy 1: Obtain proprietary data to capture growth early.

Proprietary, ground-truthed inputs gave Google Maps a runway: they could attract growth in their maps products with the building footprints before others could offer a similar capability. By owning the source, even if someone else copied the method, they could ensure that no one else would initially get the same quality or quantity.

Durham, North Carolina, USA. Image by author. Data source: Microsoft.

Data based on other data can multiply growth

The impact of the building footprint data product might seem small without considering how Google used that data to increase their advantage even further. Despite being behind Bing Maps in high-resolution imagery in 2012, Google Maps was clearly focused on helping people find their way around cities. That focus surfaced clear user problems to solve.

For example, where should I go when I’m visiting a city? Locating hubs of activity is a simple question that countless travelers have asked friends, hotel concierges, or random people on the sidewalk. In July 2016, while Apple Maps was pivoting to create their first derived data products, Google added “area of interest” map shading “with an algorithmic process that allows us to highlight the areas with the highest concentration of restaurants, bars and shops.”
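Google has not published how the shading algorithm works, but the idea of deriving one data product from another can be sketched in a few lines. This toy version assumes we already have two inputs (building footprint centroids and business point locations, both hypothetical here) and flags footprints that sit in dense commercial clusters:

```python
# Hedged sketch of one way an "area of interest" layer could be derived
# from existing data products: building footprints plus business points.
# The radius and count thresholds are illustrative assumptions, not
# anything Google has disclosed.

def areas_of_interest(footprints, businesses, radius=1.0, min_count=3):
    """Return footprint centroids with >= min_count businesses nearby.

    footprints: list of (x, y) centroids; businesses: list of (x, y) points.
    """
    aoi = []
    for fx, fy in footprints:
        nearby = sum(
            1 for bx, by in businesses
            if (bx - fx) ** 2 + (by - fy) ** 2 <= radius ** 2
        )
        if nearby >= min_count:
            aoi.append((fx, fy))
    return aoi

footprints = [(0.0, 0.0), (5.0, 5.0)]
businesses = [(0.1, 0.2), (0.3, -0.1), (-0.2, 0.4), (5.5, 5.0)]
print(areas_of_interest(footprints, businesses))  # [(0.0, 0.0)]
```

Notice that the derived layer is impossible without both upstream products: drop the footprints and you can only shade crude blobs around points; drop the business locations and you have nothing to concentrate.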

What is interesting about the “area of interest” product is that it is actually derived from the building footprint product, which was already derived from the satellite data product. Justin O’Beirne details it on his blog with beautiful examples. So, how valuable are areas of interest? It depends on who you are. As noted by Laura Bliss in City Lab:

The feature is surely useful for visitors looking to orient themselves to the busiest, most tourist-friendly thoroughfares and neighborhoods. But the introduction of “areas of interest” — which you can’t turn off, by the way — raises an important question: of interest to whom?

They go on to note potential inequities in this product, which I’ll write more about in the future (and for a deeper dive on who decides what gets mapped, check this out). But the speed and scale of the release is impressive, given the complexity of the concept. O’Beirne highlights the work of Rachelle Annechino and Yo-Shang Cheng, who studied mental models of San Francisco and showed how people conceptualize the city with commercial corridors. As O’Beirne explains:

Annechino and Cheng spent months researching one city. But not only did Google [algorithmically] capture all of their commercial corridors (and several more), it somehow came up with them for thousands of [other] cities across the world.

That last point emphasizes how Google leveraged their algorithmic effort from drawing the building footprints to grow and expand their new data product⁠⁠ — areas of interest. They achieved this outcome faster than if they had started from zero and, as a result, developed a “data moat” all before Apple Maps had any building footprints drawn. Additionally, areas of interest created a much clearer monetization opportunity for Google than building footprints: areas of interest could incentivize businesses to engage with Google Maps to drive traffic while building footprints might not.

Google finally sold Skybox Imaging (rebranded to Terra Bella) to Planet in 2017, including an agreement to continue to license the data. The sale illustrates the growth plateau in data products: by the end of 2015, Planet had open-sourced high-resolution imagery of all of California from their fleet of nanosatellites. As satellite tech matured and the data became more broadly available, the impact on basic mapping decreased, which required companies like Google to climb the data product pyramid to offer new value and create new products.

Viability challenge 2: Data products plateau in growth.

Even if a product operates with a virtuous cycle, where an algorithm improves with usage, additional inputs create diminishing returns. Accuracy levels off with increases in training data, competitors dilute the value of the results, and novel outcomes slow down. In some cases, quality even degrades with time.

Viability strategy 2: Build new data products on top of other data products.

Google Maps multiplied the impact of their original raw data source⁠ — the satellite imagery⁠ — first by adding initial value with unique outputs and then (as competitors caught up and the growth of the building footprints plateaued) by leveraging their early lead to jump into “areas of interest.” This transformation created an entirely new derived data product⁠ that might not exist without the footprint data.

Ideas and opportunities seem obvious in hindsight, but I also want to emphasize the importance of vision here. Executing this strategy well requires some idea of where you want to go. So maybe viability strategy 1.5 is to have a clear vision for the future, as a guide for what you want to build now to set your product up for the next leap.

Could Google Maps have skipped the building outlines to create areas of interest? Probably (they could have just drawn buffers around their point location data). But areas of interest based on known building outlines create a highly realistic map that displays information and guides the user in a way that a satellite image or a buffer blob could not. The result makes it easy for a map viewer to find places in a city with commercial activity, which reinforces Google Maps’ core business.

How to ensure a data product is viable

Product viability depends on driving usage, increasing revenue, and maintaining growth through competitive advantages. As explained by Kaego Rust from KHOR consulting: “This means you can defend your market share from competitors, or you can attract and maintain customers in the new market.” There are two major challenges for doing this with data products: easy duplication and value plateaus.

A graph of the viability of a data product through time might look like this:

Growth trajectory for a hypothetical data product, which includes a plateau. Image by author.

Modifying that trajectory could involve changing the slope (faster growth) or the plateau (greater growth). Data products that increase the slope of the line early on and continue to move beyond the plateau will have an increased chance of being viable through time.
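A plateauing trajectory like the one in the figure is often modeled with a logistic curve. The parameter values below are purely illustrative: `k` controls the slope (how fast growth compounds early on) and `L` controls the plateau (the ceiling a single data product eventually hits).

```python
import math

def growth(t, L=100.0, k=0.5, t0=10.0):
    """Logistic growth: value at time t, with plateau L and slope k.

    t0 is the inflection point, where growth is fastest.
    """
    return L / (1 + math.exp(-k * (t - t0)))

baseline = [round(growth(t), 1) for t in range(0, 31, 5)]
print(baseline)  # [0.7, 7.6, 50.0, 92.4, 99.3, 99.9, 100.0]

# "Changing the slope": a larger k grows faster through the midpoint.
# "Changing the plateau": a larger L raises the eventual ceiling.
steeper = growth(12, k=1.0)   # ahead of baseline at the same time
higher = growth(60, L=200.0)  # approaches 200 instead of 100
```

In these terms, viability strategy 1 (proprietary data) raises `k`, while viability strategy 2 (stacking new products on old ones) effectively starts a fresh curve before the old one flattens, raising the combined `L`.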

Minimize duplication with proprietary data, not proprietary models.

A common focus for data products is to create a virtuous cycle that adds value with new engagement. A classic example of a data flywheel is Spotify’s Discover Weekly, a product that learns from each user’s tastes to create novel playlists that a person might enjoy. While Spotify definitely created a unique predictive model, it was the proprietary source data (billions of playlists created by humans) that set this product apart, not the algorithm. Since algorithms can be copied, proprietary data⁠ — specifically a head start on offering new value with more data — can help create more growth sooner than alternatives.

Growth is greater than alternatives for a data product based on proprietary data, which creates a data moat. Image by author.

So what do you do if you are just starting out and do not have a pile of proprietary data? If you have the resources, you could go out and acquire some (like Google Maps). Or, you could leverage small or public data sets to start the cycle of generating your own. An example of this is Remix, which started by aggregating public data to inform city planning. Now they are adding new derived products to predict the impact of road changes on parking and transit (although it is unclear how much those new products depend on their own data).

Even with a small set of raw data, you could create a new derived product that could help you collect new proprietary data to jump-start your data flywheel. The important piece is to eventually start collecting proprietary data to create a strong competitive advantage.

Overcome plateaus with data built on other data, not built from scratch.

While unique inputs can improve a data product initially, at some point it may not matter if you have a million observations or a billion observations if the additional data or algorithm enhancements do not support growth. While the timing of that plateau will change depending on a number of factors, it is important to prepare for it. And it is advantageous to not start from scratch. For example, just like Google Maps did above, Uber uses their travel time predictions as an input for their Express Pool product to improve ridesharing and the rider experience. Google Maps has also taken the same growth approach to predict delays in public transit buses based on derived estimates of car traffic.

Growth is sustained by leveraging early data products to create new data products, which maintains a data moat versus alternatives. Image by author.

How do you replicate this strategy with your own data product? Think beyond the immediate value of your data towards other types of data products and your long-term vision. If you give away or sell your proprietary data, you reduce your ability to innovate on it in the future. What future do you want to enable? How can you leverage your existing data products to get to that future faster?

This growth strategy is not unique to data products, but the potential for data products to support the next innovation is powerful. Even with a relatively low-value data product (like building footprints), a business can lay the foundation for an entirely new and even more impactful future data product. The outcome is that you can multiply the impact of your original data and maintain growth while moving even further ahead of the competition.

Summary

Data products face viability challenges because their outcomes can be duplicated and their growth slows down with time. In the digital mapping space, these challenges appeared with the push to offer improved imagery and then better maps based on that imagery. Strategies to maximize viability include using or collecting proprietary and protected data sources, focusing on a long-term vision, and leveraging existing data products to make even better new data products.

Reach out on LinkedIn or Twitter to talk more, I’d love to hear from you.
