Data Collection in Google Maps and the Crowdsourcing Revolution

Marcus Chu
Published in CISS AL Big Data
Dec 3, 2023
Fig. 1: When we think of “GPS” or “car navigation,” the image of a Google Maps or Apple Maps UI comes to mind for most. Via CNN.

“Data is the new oil” -Clive Humby

Maps as Big Data

As human civilization expanded and explored the world, the quest for knowledge led to the accumulation of vast amounts of data and the creation of maps. In the early days, cartographers relied on ancient texts to depict the world. Explorers like Columbus personally witnessed and recorded the landmasses and locations they encountered. However, in the 21st century, the process of collecting and mapping data has accelerated exponentially.

What once took several centuries of expeditions to gather can now be achieved within a year. The advent of technologies like Google Maps has revolutionized our understanding of the world, as seen in Fig. 1. Maps are no longer obscure diagrams; they provide us with detailed information and allow us to explore distant places and even navigate their streets from the comfort of our homes. This capability, which would have seemed akin to teleportation just a few centuries ago, is now made possible by the vast amounts of data collected by Google and similar platforms, as seen in Fig. 2.

Fig. 2: Frederik De Wit’s 1654 Dutch Sea Atlas appears primitive compared to an average screenshot of Google Earth nowadays. Image courtesy of the Harvard Map Collection. Google Earth screenshot via Google.

To create comprehensive, high-definition maps covering the entire planet, whose surface spans roughly 197 million square miles, an immense volume of data must be continuously collected, processed, and analyzed. This is the domain of Big Data. As the name suggests, Big Data deals with datasets so large that traditional data-processing tools cannot handle them. Analyzing these datasets reveals hidden correlations that are visible only in the big picture, at large scale, which is why Big Data can seem all-knowing and all-seeing and why it is reshaping today's landscape.

However, everything Big Data sees and tells us is built on data, making data collection an extremely important part of Big Data analytics, if not the most important part. In this article, we'll dive into the data collection behind Google Maps; the second "revolution" of Big Data, crowdsourced data collection; the challenges and complexities surrounding this method; and what it means for the continued development of Google Maps.

Current Means of Data Collection in Google Maps

Google Maps has evolved beyond a simple map service. Rather than relying only on traditional satellite imagery to build macro-scale maps of the world, Google also captures the micro scale using imaging techniques such as Street View cars and Trekkers.

Fig. 3: NASA’s Landsat satellite. The Landsat constellation contributes significant amounts of data to Google Earth. Photo via NASA.

For traditional satellite imagery, Google draws on various satellite constellations whose orbits let the satellites crisscross the globe and image different parts of it (Fig. 3). For more detailed maps, satellite imagery is combined with aerial imagery collected by aircraft or drones flying at lower altitudes over regions such as cities. This raw data takes various forms, from color images similar to what we see in Google Maps to panchromatic images (Fig. 4): grayscale images captured by sensors that record a single broad band of wavelengths, spanning visible light into the near-infrared, which lets them achieve higher spatial resolution than color images. Then, to create the final map that users interact with, Google uses complex image analysis and stitching techniques to determine when and where each photo was taken and to blend a series of photos into a contiguous, human-usable map.
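Google does not publicly document its stitching pipeline, but one common way to hide the seam between two overlapping photos, once they have been aligned, is feathering: blending them with weights that shift linearly across the overlap. The minimal sketch below assumes two already-registered grayscale tiles stored as lists of pixel rows; the function name and data layout are hypothetical.

```python
def feather_blend(left, right, overlap):
    """Blend two aligned grayscale tiles (hypothetical helper, for illustration).

    `left` and `right` are lists of rows (lists of numbers) with the same height.
    The last `overlap` columns of `left` cover the same ground as the first
    `overlap` columns of `right`. Inside the overlap, the weight moves linearly
    from the left tile to the right tile so no hard seam appears.
    """
    if len(left) != len(right):
        raise ValueError("tiles must have the same height")
    if overlap < 1:
        raise ValueError("overlap must be at least one column")
    blended = []
    for row_l, row_r in zip(left, right):
        if overlap > len(row_l) or overlap > len(row_r):
            raise ValueError("overlap is wider than a tile")
        out = row_l[:-overlap]  # columns seen only by the left tile
        for i in range(overlap):
            # Weight of the right tile ramps from 1/(overlap+1) to overlap/(overlap+1).
            w = (i + 1) / (overlap + 1)
            out.append((1 - w) * row_l[len(row_l) - overlap + i] + w * row_r[i])
        out.extend(row_r[overlap:])  # columns seen only by the right tile
        blended.append(out)
    return blended


# Example: a bright tile meeting a darker tile over a two-column overlap.
print(feather_blend([[100, 100, 100, 100]], [[40, 40, 40, 40]], 2))
```

Without the ramp, a small brightness difference between two captures would show up as a hard vertical line; the ramp spreads that difference across the overlap instead.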

Fig. 4: A panchromatic image (Image b) next to a visible light image (Image a) and a region classification image (Image d). Image via ResearchGate.

But the data doesn't stop there. Google continually seeks more detailed data, striving for the micro scale rather than settling for the macro scale. Take Street View, a capability previously unheard of in a map on a global scale, which lets people virtually teleport and see places on the map as if they were there. Data collection for Street View also takes several forms, but it centers on capturing a location with multiple high-resolution cameras to create a 360-degree image. These images are captured not only by Street View cars (Fig. 5) but, more recently, by human-portable Trekkers, which have greatly expanded where Street View is available: beyond roads and other frequently traveled areas to places such as Antarctica and El Capitan. The imaging systems on Street View cars and Trekkers combine traditional cameras with other sensors such as lidar, capturing both images and associated metadata, such as distances to objects and orientation, that is later used to process the collected data. Much like satellite and aerial imagery, these images are then stitched together, using the data from the auxiliary sensors to produce a full 360-degree panorama with true-to-life dimensions.
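360-degree panoramas are commonly stored in an equirectangular layout, where a pixel's column encodes heading and its row encodes elevation angle. The sketch below assumes that layout and a camera-centred frame chosen for illustration (x forward, y right, z up), and shows how a lidar point could be tied to a panorama pixel along with its measured distance. The function is hypothetical, not Street View's actual processing.

```python
import math


def point_to_panorama(x, y, z, width, height):
    """Project a camera-centred 3D point (metres) into an equirectangular panorama.

    Hypothetical helper. Axis convention is an assumption of this sketch:
    x points forward, y to the right, z up. Returns (column, row, distance),
    where distance is what a lidar return for that point would measure.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0:
        raise ValueError("point coincides with the camera")
    yaw = math.degrees(math.atan2(y, x))                   # -180..180, 0 = straight ahead
    pitch = math.degrees(math.atan2(z, math.hypot(x, y)))  # -90..90, +90 = straight up
    col = int((yaw + 180.0) / 360.0 * width) % width       # wrap around the 360-degree seam
    row = min(int((90.0 - pitch) / 180.0 * height), height - 1)
    return col, row, distance


# A point 10 m straight ahead lands in the centre of a 4096 x 2048 panorama.
print(point_to_panorama(10, 0, 0, 4096, 2048))
```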

Fig. 5: A Google Maps Street View car. Image via PCMag.

Drawbacks of Current Means of Data Collection

However, despite the innovation and progress these methods have delivered through continual iteration, they are not without their drawbacks.

One such drawback is an inherent lack of agility. Satellites cannot change their orbits on demand, so their imaging is at the mercy of atmospheric conditions such as clouds: because usable images need to be cloud-free, a cloudy pass over an area means waiting for the next one, which slows data collection. Additionally, collecting satellite and aerial imagery is inherently time-consuming and expensive, which can limit a "data refresh" to just once a year, even for frequently changing locations.
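The cost of clouds can be illustrated with a simple per-pixel compositing rule: keep the most recent clear observation of each pixel, and wait for another pass wherever clouds hid the ground. This is a minimal sketch assuming the passes are already aligned and cloud-masked upstream; the function is hypothetical and much simpler than production mosaicking.

```python
def cloud_free_composite(passes):
    """Combine repeated satellite passes over one area (hypothetical helper).

    `passes` is a chronological list of equally sized pixel lists in which a
    cloud-covered pixel is marked as None (the cloud mask is assumed to come
    from an upstream classifier). Each output pixel takes the most recent
    clear observation; pixels never seen clear stay None.
    Returns (composite, fraction_of_pixels_still_missing).
    """
    if not passes:
        raise ValueError("need at least one pass")
    size = len(passes[0])
    composite = [None] * size
    for image in passes:  # oldest first, so later clear pixels overwrite earlier ones
        if len(image) != size:
            raise ValueError("all passes must cover the same pixels")
        for i, value in enumerate(image):
            if value is not None:
                composite[i] = value
    missing = sum(v is None for v in composite) / size if size else 0.0
    return composite, missing


# Two passes; the third pixel was cloudy both times.
print(cloud_free_composite([[10, None, None, 7], [None, 12, None, 8]]))
```

The returned fraction of still-missing pixels is the part of the map that must wait for another pass, which is why persistently cloudy regions refresh more slowly.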

Additionally, while Street View cars and Trekkers have greatly improved the ease of data collection by allowing regular drivers and even private groups or organizations to collect and contribute data, these methods still have limited reach, as seen in Fig. 6. Consider updating speed limits using data collected from Street View cars. While this is far more efficient than sending personnel to identify speed limit signs and enter the data by hand, updates are still relatively slow. A Street View car may pass through an area only once every six months, and in that time the speed limit can change for many reasons, such as construction. These drawbacks call for a second revolution in data collection, discussed next.

Fig. 6: Groups such as tourism boards can contribute to Google Maps Street View through Street View Trekkers. However, Trekkers are still out of reach for most everyday users of Google Maps. Photo via Mashable.

Crowdsourcing: The Data Collection Revolution?

So what does Google hope will revolutionize Google Maps once again? Crowdsourced data. Crowdsourcing is the practice of obtaining information by enlisting the services of a large number of people, typically via the Internet; in this case, the very people who use Google Maps every day. We'll look at the example of crowdsourcing location information, examine the benefits and complexities of this type of data collection, and see how Google is mitigating those complexities.

As the number of places in the world grows, the complexity and scale of Google's mapping operations must grow too. Yet Google cannot spot every single change in our world, such as where a business has relocated or whether its opening hours have changed. It is a master of all places but an expert in none. Therefore, Google has called on the "experts" of each place, the users of Google Maps, to be the data collectors, as seen in Fig. 6. This has several advantages over traditional means, including increased accuracy and frequency. People in a specific place are very likely to know the area well, unlike an employee whose data collection route might take them on a 100 km trip through unfamiliar areas, so the accuracy of data collection is greatly improved. The frequency, and thus the agility, of data collection also improves. As mentioned earlier, traditional means such as satellites and Street View cars may only capture and update data about a specific location every six months or even every year. These methods are therefore relatively low-frequency and have limited agility to stay accurate when a place changes, such as when a structure on the map moves. By drawing on the users of Google Maps, however, the number of people collecting data increases dramatically. Combine that with their local knowledge and the fact that multiple people report on the same place, and the result is an extremely agile and accurate means of data collection.
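Why several contributors beat one can be made concrete with an idealized model: assume each of n contributors independently reports the correct value with probability p, and the majority answer is accepted. Real contributions are neither fully independent nor equally reliable, so the numbers below are illustrative only.

```latex
P(\text{majority correct}) = \sum_{k=\lceil (n+1)/2 \rceil}^{n} \binom{n}{k}\, p^{k} (1-p)^{n-k}

\text{For } n = 5,\; p = 0.8:\quad
\binom{5}{3}(0.8)^3(0.2)^2 + \binom{5}{4}(0.8)^4(0.2) + \binom{5}{5}(0.8)^5
= 0.2048 + 0.4096 + 0.32768 = 0.94208
```

Under these assumptions, five contributors who are each right 80% of the time produce a correct majority about 94% of the time.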

Collecting Crowdsourced Data

So how exactly does Google acquire this crowdsourced data? Through the app, of course. Take the example of data about businesses. A visitor might arrive at a restaurant and find that its business hours have changed: unfortunately, it won't be open for another hour, even though Google Maps said it would be open. The user can open the Maps app and submit the corrected opening hours, along with a photo of the sign showing them. As other customers arrive and find that the listed hours are wrong, they submit similar corrections. Once enough users have submitted feedback, Google has sufficient information to update the business's opening hours, within a few days or even a few hours, compared to a month or longer with traditional data collection.
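The "enough users" step can be sketched as a simple consensus rule. The function below is hypothetical, and its thresholds (at least three distinct users, 60% agreement) are illustrative assumptions rather than anything Google has published.

```python
from collections import Counter


def accept_hours_edit(reports, min_reports=3, min_agreement=0.6):
    """Decide whether crowdsourced reports justify changing a listing's hours.

    Hypothetical rule for illustration. `reports` is a chronological list of
    (user_id, proposed_hours) pairs. Only each user's latest report counts,
    so one person cannot outvote everyone by resubmitting. Returns the
    accepted hours, or None if the evidence is not yet sufficient.
    """
    latest = {}
    for user_id, hours in reports:
        latest[user_id] = hours  # a later report from the same user replaces the earlier one
    if len(latest) < min_reports:
        return None  # not enough independent contributors yet
    value, votes = Counter(latest.values()).most_common(1)[0]
    if votes / len(latest) >= min_agreement:
        return value
    return None


reports = [("a", "11:00-22:00"), ("b", "11:00-22:00"),
           ("c", "11:00-22:00"), ("d", "09:00-22:00")]
print(accept_hours_edit(reports))
```

Counting distinct users rather than submissions is the simplest defence against one person spamming the same edit; the photo of the posted hours described above would be an additional signal that this sketch ignores.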

What about when users are too busy to write and submit feedback, such as when driving? Google can still leverage crowdsourcing in this case. One such situation is a temporary speed limit change, for example during construction. Accurate speed limit data is essential for features such as estimated time of arrival and traffic jam data, yet users are extremely unlikely to pull over every time they encounter a temporary speed limit, or to pull out their phones to report a traffic jam. Instead of requiring direct action from users, Google leverages GPS data to collect crowdsourced traffic data, as seen in Fig. 7. Once the app identifies that the user is driving, it compares values derived from GPS location data, such as speed, with what is currently on Google Maps to flag data that may be incorrect. For example, if groups of vehicles consistently slow down along the same section of highway, Google would update the speed limit to reflect the average speed of the cars that slowed down on that section. Similarly, if large numbers of users are stationary or moving slowly for extended periods along a stretch of highway during rush hour, Google updates that stretch on the map to show whether traffic is flowing freely or is moderately or severely congested.
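The two GPS-based signals above can be sketched in a few lines: estimating speed from consecutive location fixes with the haversine (great-circle) formula, and labelling a road segment by comparing the median speed of many drivers with the speed expected on that road. The function names, the free-flow speed input, and the 0.75 and 0.4 thresholds are hypothetical, and detecting that a user is driving is assumed to have happened already.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Speed between two fixes, each given as (timestamp_s, lat, lon)."""
    (t1, lat1, lon1), (t2, lat2, lon2) = fix_a, fix_b
    if t2 <= t1:
        raise ValueError("fixes must be in chronological order")
    return haversine_m(lat1, lon1, lat2, lon2) / (t2 - t1) * 3.6  # m/s to km/h


def classify_segment(speeds_kmh, free_flow_kmh, min_samples=5):
    """Label a road segment from many drivers' speed samples (hypothetical thresholds)."""
    if len(speeds_kmh) < min_samples:
        return "unknown"  # too few drivers to judge
    ratio = median(speeds_kmh) / free_flow_kmh  # median resists a few odd samples
    if ratio >= 0.75:
        return "free-flowing"
    if ratio >= 0.4:
        return "moderate"
    return "severe"


# About 111 m of northward travel in 10 s is roughly 40 km/h.
print(round(speed_kmh((0, 0.0, 0.0), (10, 0.001, 0.0)), 2))
print(classify_segment([50, 55, 60, 45, 50], free_flow_kmh=100))
```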

Fig. 7: An example of real-time traffic jam and road construction data in Google Maps acquired through crowdsourcing. Image via Business Insider.

Challenges and Roadblocks to Crowdsourcing

But as more data is generated by users, ordinary citizens going about their day-to-day tasks, the question of authenticity arises. Given the sheer number of people able to contribute to Google Maps, how does Google distinguish genuine crowdsourced data from data submitted with malicious intent? One might argue that the sheer number of contributors makes it statistically impossible for bad actors to "lie low," since their data would stand out against everyone else's. However, in an age of ever-changing threats and evolving technology, one can no longer assume that statistics alone will maintain the integrity of data. For example, what if a bad actor used a botnet, a network of computers taken over by malicious code, to post fake negative reviews of a particular restaurant? If reviews were judged authentic only by statistics, that is, by how much a review deviates from the rest of the data points, these malicious reviews would pass as legitimate, and the genuine reviews would be eclipsed by the sheer volume of fake ones. As such, Google implements various checks and balances so that data collection can remain crowdsourced while keeping its integrity.
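A toy example shows why a purely statistical check is fragile. The sketch below flags ratings more than two standard deviations from the mean, a common outlier rule used here only for illustration, on made-up ratings. One fake one-star review stands out, but a flood of 100 such reviews drags the mean down so far that the fakes look normal and genuine five-star reviews are flagged instead.

```python
from statistics import mean, pstdev


def flag_outliers(ratings, threshold=2.0):
    """Flag ratings more than `threshold` standard deviations from the mean."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        return [False] * len(ratings)  # all ratings identical: nothing stands out
    return [abs(r - mu) / sigma > threshold for r in ratings]


genuine = [5, 4, 5, 5, 4] * 4  # 20 made-up genuine ratings, mean 4.6

# One fake one-star review: only the fake is flagged.
print(flag_outliers(genuine + [1])[-1])

# 100 fake one-star reviews: no fake is flagged, but genuine five-stars are.
flood = flag_outliers(genuine + [1] * 100)
print(any(flood[20:]), any(flood[:20]))
```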

One method Google uses to maintain data integrity is machine learning driven by Big Data, which uncovers patterns of malicious activity that are visible only at the scale at which Big Data operates. For example, it can compare the attributes of accounts submitting reviews, such as days since account creation, IP address, and review history, against those of known malicious accounts. Machine learning is also used to detect fake or otherwise altered images. In the opening hours example above, the photo submitted by users serves as one of the checks on data integrity, but photos can easily be faked today, so Google uses machine learning to spot subtle variations in images that signal alteration. Finally, data integrity is not maintained solely by automated systems. Google also invests significant effort in understanding malicious behavior and preemptively mitigating risks, such as disabling contributions for places where correct data is essential, like voting stations during elections, to prevent the spread of politically driven misinformation (Fig. 8).
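Google has not published the models it uses, but the idea of comparing a new account with known malicious accounts can be sketched with a 1-nearest-neighbour classifier, a deliberately simple stand-in. The labelled accounts below are made up, and representing the IP address as the number of accounts seen on the same address is an assumption of this sketch.

```python
import math

# Made-up labelled accounts, for illustration only:
# (days_since_creation, accounts_sharing_ip, prior_reviews)
KNOWN_ACCOUNTS = [
    ((2, 40, 0), "malicious"),
    ((1, 25, 1), "malicious"),
    ((5, 60, 0), "malicious"),
    ((900, 1, 120), "genuine"),
    ((400, 2, 35), "genuine"),
    ((1500, 1, 15), "genuine"),
]


def _scaled(features):
    # log1p compresses features whose values span several orders of magnitude
    return [math.log1p(f) for f in features]


def nearest_label(features, known=KNOWN_ACCOUNTS):
    """Label an account by its most similar known account (1-nearest neighbour)."""
    target = _scaled(features)
    best_label, best_dist = None, math.inf
    for example, label in known:
        dist = math.dist(target, _scaled(example))  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


print(nearest_label((3, 30, 0)))    # brand-new account on a crowded IP
print(nearest_label((700, 1, 50)))  # long-standing account with a review history
```

A production system would learn from far larger labelled datasets and many more signals; log-scaling here simply keeps account age, measured in days, from dominating the distance.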

Fig. 8: For areas such as voting stations, it is paramount to minimize disinformation to ensure the integrity of elections. Photo via The New York Times.

The Path Forward

So is crowdsourced data the next revolution in Big Data, or is it bound to become as difficult a landscape to navigate as society is today? With companies like Google leading the charge and bringing years of experience handling Big Data at a scale few others can match, the future holds significant promise. However, the onus lies on the companies at the forefront of the crowdsourcing revolution to bring crowdsourcing to the masses. Not only must issues such as data integrity be addressed, as seen in Fig. 8, but so must the problem of "data as the new oil": as companies strive to improve their products using customer data, they intrude further and further into customers' personal lives. Crowdsourced data may one day even let us eliminate our reliance on companies such as Google, allowing society to become truly self-sufficient and self-governing, with everyone and everything contributing to a shared data pool for the betterment of all. Crowdsourcing is the way forward. But how we get there, how we build the methods and mechanisms for data collection, is of paramount importance.

Citations

1. Android Police. (2021, August 24). Google Maps is trying to be painfully clear about how crowdsourced data powers navigation. Android Police. https://www.androidpolice.com/2021/08/24/google-maps-is-trying-to-be-painfully-clear-about-how-crowdsourced-data-powers-navigation/

2. City Monitor. (n.d.). How Google’s geo-crowdsourcing is transforming the map. City Monitor. https://citymonitor.ai/community/how-googles-geo-crowdsourcing-transforming-map-626

3. 9to5Google. (2021, August 23). Google Maps navigation requires you to crowdsource data. 9to5Google. https://9to5google.com/2021/08/23/google-maps-navigation-data/

4. Lifewire. (n.d.). Why Crowdsourcing in Google Maps Helps Everyone. Lifewire. https://www.lifewire.com/why-crowdsourcing-in-google-maps-helps-everyone-5116233

5. Google. (n.d.). How Google Maps protects against fake content. Google. https://blog.google/products/maps/how-google-maps-protects-against-fake-content/

6. Google. (n.d.). Google Maps 101: How we tackle fake and fraudulent contributed content. Google. https://blog.google/products/maps/google-maps-101-how-we-tackle-fake-and-fraudulent-contributed-content/

7. Google. (n.d.). How Google Maps uses machine learning to fight fake contributions. Google. https://blog.google/products/maps/google-maps-fake-contributions-ai-machine-learning/

8. CNN. (2019, October 21). Google Maps will now help you avoid that speeding ticket. CNN. https://www.cnn.com/2019/10/21/tech/google-maps-reporting-trnd/index.html

9. Google. (2012, January 12). Google Earth 6.2: It’s a Beautiful World. Google Maps Blog. https://maps.googleblog.com/2012/01/google-earth-62-its-beautiful-world.html

10. Atlas Obscura. (n.d.). 7 Gorgeous Sea Maps From The Age Of Exploration. Atlas Obscura. https://www.atlasobscura.com/articles/7-gorgeous-sea-maps-from-the-age-of-exploration

11. NASA. (n.d.). NASA Spinoff EE-1. NASA. https://spinoff.nasa.gov/Spinoff2015/ee_1.html

12. ResearchGate. (n.d.). RGB Multispectral image (a), Panchromatic band (b), SAR intensity (c), land-use map (d), and ground truth (e). ResearchGate. https://www.researchgate.net/figure/RGB-Multispectral-image-a-Panchromatic-band-b-SAR-intensity-c-land_fig2_325695778

14. PCMag. (n.d.). Google Debuts First All-Electric Street View Car in Dublin. PCMag. https://www.pcmag.com/news/google-debuts-first-all-electric-street-view-car-in-dublin

15. Business Insider. (2015, November 20). How Google Maps knows about traffic. Business Insider. https://www.businessinsider.com/how-google-maps-knows-about-traffic-2015-11

16. The New York Times. (n.d.). Voters Around the Nation on Election Day. The New York Times. https://www.nytimes.com/video/us/100000003215661/voters-around-the-nation-on-election-day.html

17. Mashable. (n.d.). Google Street Trekker. Mashable. https://mashable.com/article/google-street-trekker-sg
