Views on Digital Maps and Open Data
When I introduce myself as a mapping developer, one common response I get is “you must have liked maps since you were a kid.” As much as I wish that were true, the only maps I looked at were the ones in textbooks. My map-related memories are limited to watching my parents from the backseat while they held folded route maps in their hands, rotating and flipping them around in order to navigate the highways.
The first map I held in my own hands was on a smartphone. My friend and I were going to have dinner in Hongdae. Excited about this new technology, I convinced my doubting friend that I could find the restaurant, of which both of us knew only the name. Being familiar with Hongdae, I was sure I could locate the place in no time by looking at its position on the map. However, the streets on the map looked so different from what I remembered. My usual way of navigating relied on familiar landmarks: walk up exit no. 6, then go straight, turn left at the bakery, etc. But the mobile map didn’t care about the landmarks in my head, and just showed me the shortest path. Deprived of mental points of reference, I could not figure out where I was on the map, nor where I was supposed to turn and in what direction. Needless to say, our dinner was very delayed and I had to endure my friend’s reproachful speech.
Between Maps and Data
Sometimes I think about whether and how the younger generation, who have grown up looking at maps on their smartphones, have a different perception of space. Maps on digital devices are now so prevalent that my anecdote of not being able to navigate using a mobile device seems almost archaic. We use apps to look up how to get to a place, to share our location with friends, and to search for new apartments.
While the purpose and form of maps on digital devices may vary, the essential element is data. As the uses of maps diversify, the amount of mapping data grows. Of course, not all of the data is delivered to the user at once; the user needs, and is able to process, only so much information. Unless the map is centered around a very specific theme or region, the service provider predicts what data the users require based on the position and zoom level they are looking at. The transferred data is then displayed using symbols that represent the properties and importance of the data.
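To make "position and level" concrete, here is a minimal sketch of how a viewed position and zoom level can be turned into the address of one map tile. It assumes the Web Mercator "slippy map" tiling scheme used by OSM's standard tile layer; the essay does not say which scheme any particular provider uses, and the function name latlon_to_tile is made up for illustration.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) tile index containing a point at a given zoom.

    Assumes the Web Mercator slippy-map scheme: at zoom z the world is
    cut into 2**z by 2**z square tiles, with (0, 0) at the top left.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)

    # Longitude maps linearly onto the x axis.
    x = int((lon_deg + 180.0) / 360.0 * n)
    # Latitude goes through the Mercator projection before scaling.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)

    # Clamp so that points on the far edge still fall inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y
```

Under this scheme, a client looking at a neighborhood at a high zoom level only needs to request the handful of tiles that cover its screen, which is one simple way a service can avoid sending everything at once.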
Deciding what to show on a map and how to show it is a complicated problem, especially when interests are at odds. For example, let’s imagine a chicken joint and a noodle restaurant, close to each other in the same building. Unless you are working on a floor plan of that specific building, it can be difficult to display the two adjacent businesses in an equal manner. The manager of the chicken place will want their restaurant to be more prominent; the noodle place manager will want the same, only for their own restaurant.
In reality, this decision is the prerogative of the map service provider. The service provider rarely reveals much information to the user. Users of the map are unaware of what data the service provider possesses, what criteria they use to select the data to display to users, and so on. The best users can do is make a guess about the process, based on results on the map.
Conflicts of interest happen at different layers and scales. In 2016, a debate arose about Palestine disappearing from Google Maps. Some users who were not able to search for and find Palestine on Google Maps presumed that Google had deliberately removed Palestine from its service, and started protesting against the company on social media using the #PalestineIsHere hashtag. Google’s response was that they did not remove it, because “[t]here has never been a ‘Palestine’ label on Google Maps, however [they] discovered a bug that removed the labels for ‘West Bank’ and ‘Gaza Strip’” (The Guardian, Aug 10 2016). Google Maps still does not display Palestine using its state name. Users cannot know whether Google’s map database includes Palestine as a state at all, or if a deliberate decision was made not to show it on the map despite Palestine being listed in the database. This is just one more example showing that maps are the results of human interpretation. East Sea vs. Sea of Japan is a frequently reemerging issue in South Korea; restaurateurs hire bloggers for reviews, in the hope of a higher ranking in search results. The space we live in is filled with disputes and void of clear answers, yet a map will show you the result of one particular decision as if it were the right answer.
Every day, we make multiple decisions looking at maps. We find out how to get to a place we have never been before, and which restaurants nearby are good. While we make use of the map, the map affects us. By showing us places we didn’t know about and recommending specific routes, it changes our consumption patterns and our perception of space. When you think about it, relying on a map interpreted from the view of just one company is quite questionable.
To a for-profit entity, map data is an important and sometimes undisclosable asset. Companies deploy cars that gather street view data region after region, as well as a large human workforce. This data, collected through heavy investment, becomes seed capital that can be traded for large amounts of money. This means that individuals working with maps (urban designers, journalists working with data visualization, small entrepreneurs that use maps in their services, etc.) often run into big obstacles when working with such map data.
As maps became more and more important, open source projects that collect and distribute geospatial data appeared; these projects define map data as a public good that people have the right to access at little or no cost. Some examples include Open Addresses, which collects address data and processes it for easier use, Who’s On First, which collects points of interest (such as restaurants, libraries, and many more), and OpenStreetMap (OSM), which is an integrated map data collection. For insight into how such projects were able to develop, let’s take a closer look at OSM, which has been around the longest and has served as a hub for the most diverse types of data.
OSM and Its Community
OSM was founded in 2004 by Steve Coast, who was inspired by the success of Wikipedia. Users can not only add map data to OSM and edit it but also look at the revision history. Data from OSM can be downloaded by anyone, even for commercial purposes. Diverse companies, ranging from small independent printing studios to service providers of full-stack solutions including search and navigation, use OSM’s data. These companies are consumers of OSM’s data but they also contribute to its quality through data processing or the development of additional functionalities. Some of the companies open source the services they developed, so that the technology can be maintained and improved even after they move on.
If you can’t find your grocery store or favorite hiking trail in OSM, you can always add the data yourself using GPS records or by tracing over satellite images. In regions where the government has released public data on buildings and roads, local users can also contribute by importing the data to OSM.
The collective intelligence that sustains OSM also has its drawbacks. Long-term efforts by many people can be set back by a single person’s mistake or malevolence. Suppose you, having the day off and needing some map data, took the time to map a whole neighborhood alone. There is no guarantee that the data you have mapped will be intact the day after. I once mistakenly deleted the country name for Japan. Only after saving my edit did I realize that Japan had disappeared; with a chill running down my spine, I restored the previous data.
In order to minimize the damage caused by individual mistakes or active misconduct, OSM emphasizes the role of the community. Sometimes the communication is online, like the email I received from a user who let me know of my mistake. In other cases, offline meet-ups (often called mapathons or maptimes) are held among local inhabitants familiar with the area who learn how to edit data and make contributions. Such meet-ups allow beginners to meet users with more experience, share tips about editing OSM and observe the collective mapping process.
My first mapping meet-up was a small one that my colleagues hosted after the Nepal earthquake of 2015. One colleague with some experience demonstrated how to use the default OSM editor, iD, as well as OSM Tasking Manager, in which field rescue workers had listed the areas in need of mapping; then we all dived in. After each edit, I wasn’t sure whether I was mapping things the right way, and asked the facilitating colleague many questions, as if asking for permission. My colleague told me that first-timers, especially female users, tended to be burdened by perfectionism, and that my contribution of data would definitely be helpful. This experience greatly helped me communicate, based on trust, with other members of the OSM community.
Even if one does not attend offline meet-ups, one can share issues and information about OSM via wiki pages, mailing lists, and Slack channels. The community makes active use of these online tools to discuss or warn about seemingly inappropriate changes in data, and also to lead the development of tools and documentation that can assist users when editing data.
Open Data in South Korea
Working at an American company that makes use of data from open mapping projects such as OSM, I often felt bad about the lack of data relating to South Korea on OSM. Up until 2016, South Korean OSM seemed to be trapped in a vicious cycle; there were only a handful of editors, who were able to edit only a limited amount of data, which in turn kept new users from being drawn to OSM.
In 2017, however, the cycle suddenly broke. Niantic Inc., the software company known for its location-based games, launched Pokémon Go in South Korea using OSM data. South Korean Pokémon Go players soon realized that Pokestops tend not to appear when there is not enough OSM data in their area, which led to a notable increase in South Korean OSM users. This came as a big surprise to me, as I had doubted whether OSM could ever be adopted in South Korea.
The drastic influx did not come without trouble. Some edits didn’t add any geographically meaningful features, while others were outright vandalism. Still other edits did include objects and their shapes but came with no other relevant data. An interesting problem arose when a whole series of edits that traced an area had to be discarded. Some Korean users were apparently frustrated by the low resolution of the satellite images provided by OSM, so they went on to grab images from the Korean Ministry of Land, Infrastructure and Transport (MoLIT).
The problem was that because OSM allows anyone to use its data freely and at no charge, any data source must allow the same before its data can be imported into OSM. For example, tracings from commercial satellite images or data with specific licensing conditions are not importable. The Korean data source in question was Vworld, an open platform provided by the MoLIT so that people can freely use spatial data. If it wants people to freely use its data, then why couldn’t it be used to trace terrain in OSM?
The answer: Korean laws. The Act on the Establishment, Management, etc. of Spatial Data, Article 16, Clause 1 prohibits state-produced data from crossing the physical national border: “No person shall take abroad maps, etc. or photos produced for the purpose of a survey, among the results of a fundamental survey, without permission of the Minister of Land, Infrastructure and Transport: Provided, That the same shall not apply to cases prescribed by Presidential Decree, such as where the results of a fundamental survey are exchanged with foreign governments, etc.”
As an OSM contributor, there are moments when I think hard about this law. One such moment was when I discovered Korean building data freely available in the Spatial Information Portal. The data included not only the outlines of buildings but also many other properties such as height, address, and building use classification. Picturing an OSM filled with urban buildings, I thought that by importing this data I might be able to improve the quality of data related to Korea and even contribute to fostering a Korean OSM community, which is currently non-existent. I inquired about licensing using the Portal forum and was asked to give them a phone call. In retrospect, this reply got me more excited about the prospect of being able to use the data; I thought, “if I couldn’t use it they would have just said so. This might be promising!” After weeks of trying to find a time to call during their office hours (the time difference being 13 hours) I was finally connected to the public official in charge, only to be given the answer: “you cannot use it if you are outside of Korea.” OSM servers are in London.
I did get a clear answer, but “outside of Korea” left me with a lot of new questions. For example, what if I went to Korea, downloaded the data onto my computer, then left the country? What if I, sitting in New York, use a Korean map service to look up my friend’s house? In this last case, a small amount of Korean map data is effectively transferred to my computer. Is it okay because it is such a small amount? Then would it also be okay to extract just a single neighborhood and use it overseas? Or is it okay because the data has been post-processed by the map service, and if so, how much processing makes it okay to bring the data over the border? What if I changed the data format, or added meaningless words to all building names? What if I memorized the heights of buildings, then entered them into OSM from memory? The Korean law confines digital data within a physical boundary, and this raises so many questions.
Open data provided by public institutions is an effort to define data as an essential infrastructure and to allow many people to access the data at low cost. Had Seoul’s transit data not been public, there would have been no story of an individual developing the most successful transit app in the country.
Much like we need plumbing and sewers in order to utilize water, we need pipes for open data: ones that make technical skill less of an obstacle to its use. In the case of OSM, the many open source projects and businesses based on OSM serve as these pipes. Data conversion services and server maintenance services that are common in development environments also count among the tools that make it easier to use data. But if one were to follow the Korean Act on Spatial Data, one would need to verify where the servers are located for each such service. That is a lot of energy wasted on non-essentials.
Some dataset entries in the Spatial Information Portal are provided along with the name and number of the public official in charge. As I read the names of people I have never met, I imagine the many people who must be struggling with all kinds of problems in the newborn open data platform. I know they encounter problems because I have witnessed, while working at my company, the amount of collective effort put into maintaining open data that is not immediately profitable. I just hope that the Korean map data, which must have involved so much investment and effort, will be used to make maps that help analyze and understand the complex world, without being limited by national borders.
Hanbyul Jo is a New York-based software engineer. She works at the open source mapping company Mapzen, where she develops tools to make web mapping more accessible. Hanbyul’s personal work reveals narratives in datasets by combining them on maps, such as in her recent projects Mapping the Candles and Seoul Building Explorer.
Translated from Korean by Achim Koh
The Korean version of this text is available here.