Mapping the world

Shantanu Bhattacharyya
Published in Blog | Locus
Jul 28, 2016 (4 min read)

Enter current address. Enter destination. Done!

Whether you are checking out a new restaurant across town or driving cross-country to meet your loved ones, finding the exact location of a place and the best way to get there has never been easier. Google Maps, one of the most widely used map apps, is remarkable not just for finding the best route between two locations in a moment, but also for its uncanny ability to ‘guess’ the correct address even if you mistype the name or address (by a lot)! MapQuest, Nokia Maps, Navmii and Waze are some of the popular alternatives to Google Maps, with additional features like completely offline maps or real-time road condition updates.

The simplicity of these apps, however, belies the incredible algorithmic and infrastructural challenges that must be addressed to return usable results. Two very different problems have to be solved for any map-related service to work. The first is understanding the addresses entered by the user with high precision; the second is finding the optimal route between those addresses, via connecting points if any. Let us discuss them in some detail.

The location precision problem derives from the fact that we describe locations in everyday language, and our brains are capable of tolerating a lot of “fuzziness” in address information. Consider these three addresses:

Apartment 37, Kasturba Housing, New Delhi
Kasturba Housing, Apartment 37, New Delhi
Apt 37, Kastur ba Housing, Delhi

We immediately understand that all of them refer to the same location and use the information appropriately. For a machine, though, these are three completely different inputs. It does not have a “context” within which it can interpret these locations as the same. Providing that context is the first task of any location-based service. An entire field called “Natural Language Processing” (NLP), much of it built on machine learning, attempts to train machines to understand our language within a given context and convert it into a language they are extremely good at understanding and interpreting: the language of highly precise numbers. For location data, these numbers are latitude and longitude on the Earth’s surface (called coordinates from here on). Coordinates are the necessary and sufficient information any automated location service needs to carry out its task. The process of converting human-readable addresses to coordinates is called geocoding.
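To make the idea of “context” a little more concrete, here is a minimal sketch in Python of how the three addresses above might be brought closer together before any geocoding happens. It is only an illustration, not how Locus or any particular map service works: the `ABBREVIATIONS` table and the function names `normalize` and `similarity` are assumptions made up for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"apt": "apartment", "bldg": "building"}


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, expand abbreviations and sort the tokens."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    # Sorting makes the result independent of the order of the address parts.
    return " ".join(sorted(tokens))


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1 for how alike two addresses look."""
    # Spaces are removed so that "Kastur ba" and "Kasturba" still line up.
    a_key = normalize(a).replace(" ", "")
    b_key = normalize(b).replace(" ", "")
    return SequenceMatcher(None, a_key, b_key).ratio()


if __name__ == "__main__":
    first = "Apartment 37, Kasturba Housing, New Delhi"
    second = "Kasturba Housing, Apartment 37, New Delhi"
    third = "Apt 37, Kastur ba Housing, Delhi"
    print(normalize(first) == normalize(second))
    print(similarity(first, third))
```

With this sketch the first two addresses normalize to exactly the same string, while the third only scores as a close match. Simple rules like these cover the easy cases; the fuzzier ones are where learned models come in.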

Geocoding can never be 100% accurate for all addresses because it is impossible to account for every variation in which someone can provide the same address. One solution is to create a geocoded database of a region and then offer the user real-time suggestions for the accurate address as soon as they start typing. This is what Google Maps does when we start typing a location in its search bar. However, there are multiple caveats to this approach. The first is the infrastructure cost of setting up a ‘complete’ database and keeping it updated on at least a daily basis. The second is the assumption that the address is being entered over an internet connection, so that the suggestions can be updated in real time. This solution therefore breaks completely for offline address inputs, as in the case of postal service address data. The third and probably the biggest hurdle is the regional difference in address formats. The United States, for example, has a very well-defined system of address specification, and the ambiguity of a location is very low. This makes geocoding much more efficient. Contrast that with India, where address data almost always depends heavily on local context and there are no simple rules for splitting an address into its component parts (a step called tokenization).
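The database-plus-suggestions approach described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of how Google Maps works: the `GEOCODED` table is a hypothetical three-entry stand-in for a real regional database, its coordinates are approximate, and `suggest` is a name invented for this example. The matching uses `difflib.get_close_matches` from the Python standard library.

```python
from difflib import get_close_matches

# Hypothetical, tiny "geocoded database": place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GEOCODED = {
    "india gate": (28.6129, 77.2295),
    "red fort": (28.6562, 77.2410),
    "qutub minar": (28.5245, 77.1855),
}


def suggest(typed: str, limit: int = 3, cutoff: float = 0.6):
    """Return up to `limit` (name, coordinates) guesses for what was typed."""
    query = typed.strip().lower()
    # get_close_matches ranks candidates by similarity, best match first,
    # and drops anything scoring below `cutoff`.
    names = get_close_matches(query, list(GEOCODED), n=limit, cutoff=cutoff)
    return [(name, GEOCODED[name]) for name in names]


if __name__ == "__main__":
    print(suggest("Qutb Minar"))
```

Even this toy shows the caveats: it is only as good as the table behind it, and it assumes the lookup can run while the user is typing.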

We at Locus.sh are more excited than intimidated by this important problem. We are building geocoding solutions that combine machine learning, natural language processing and rule-based improvements, as well as improvements in hardware for gathering more complete address information. Our solutions are intended to capture not only the diversity in address formats but also the diversity in the languages and scripts in which addresses are provided. Many countries in South Asia use their own scripts and languages for addresses. As our algorithms mature, we hope to expand the regional coverage of our solutions.

The second problem I mentioned, route optimisation, is essentially a set of variations on the notoriously hard ‘Traveling Salesman’ problem. This problem has an extensive body of publicly available research devoted to it, and at Locus.sh we have developed custom solutions to find the optimal route between two coordinates while satisfying specific constraints. Of course, even the smallest error in the initial geocoding will render the whole solution incorrect.
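For a feel of what a route optimiser has to do once it has coordinates, here is a small Python sketch of the nearest-neighbour heuristic, one of the simplest textbook approaches to the Traveling Salesman problem. It is not the Locus solution and it does not guarantee the best route; it only shows the shape of the problem. The function names are invented for this example, and distances are straight-line (great-circle) distances from the haversine formula rather than real road distances.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_neighbour_route(start, stops):
    """Greedy heuristic: always travel to the closest stop not yet visited."""
    route = [start]
    remaining = list(stops)
    while remaining:
        closest = min(remaining, key=lambda stop: haversine_km(route[-1], stop))
        remaining.remove(closest)
        route.append(closest)
    return route


def route_length_km(route):
    """Total length of a route, summed over consecutive pairs of points."""
    return sum(haversine_km(a, b) for a, b in zip(route, route[1:]))


if __name__ == "__main__":
    depot = (0.0, 0.0)
    deliveries = [(0.0, 3.0), (0.0, 1.0), (0.0, 2.0)]
    route = nearest_neighbour_route(depot, deliveries)
    print(route, route_length_km(route))
```

The sketch also makes the dependence on geocoding visible: every distance is computed from coordinates, so a wrong coordinate changes the route that comes out.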

So, all this is to say that soon enough when you are attempting to deliver a package to “Room 17, Sud arshan Building, Ghantaghar ke peeche, Uttar P”, Locus will get you there via the fastest available route.
