What is GIS and why does any digital product company need one?
Mercado Pago, 2024. Every day, millions of users interact with our suite of personal banking solutions and payment tools. Every click, every transaction, every choice they make is a unique combination of variables, pieces of a personal puzzle of needs that Mercado Pago, the fintech business of Mercado Libre, addresses.
Imagine the variety of stories intertwined in this network. From the individual who gets up early to load their transport card in Buenos Aires or Mexico City before starting another workday, to the twenty-year-old venturing into the world of finance by purchasing cryptocurrency for the first time, to the shoe store owner in São Paulo conducting sales through one of our Point payment terminals. Even the group of high school students grabbing lunch at a fast food store after a morning of classes, paying with their account for minors.
Each and every user, with their own contexts and needs, contributes daily to an intricate web of micro-decisions that leave a digital footprint behind. This network, documenting customer actions within our product ecosystem, holds valuable information. It guides us in the next steps to better understand our users, anticipate their needs, and offer an even more compelling value proposition.
Simultaneously, this digital footprint prompts us to explore the universe of potential customers who are not yet part of our ecosystem and to actively seek them out, with the ultimate purpose of democratizing access to financial services in LATAM for more people every day.
This digital footprint has multiple dimensions to decode. One key dimension is transactional behavior: understanding how users interact with our suite, when, how often, and what value this brings to the company. Equally important is examining the actions users do not take or the services they do not use, as these also provide crucial insights.
However, there is an additional dimension that enables us to contextualize the user and their decisions, whether it be an individual seeking banking solutions or a merchant client utilizing payment tools: the geographical dimension, or GEO.
The GEO footprint left by both individual users (who have given explicit consent to share this data) and merchants selling through Mercado Pago is increasingly valuable. This data can be enriched to determine new market penetration strategies and enhance transactionality for our clients.
While our products are digital, they are influenced by the tangible decisions people make throughout their day, decisions that are inherently tied to their physical environment. Understanding this environment and locating people and businesses within it sheds light on issues such as the presence of major commercial hubs, isolated areas, regional socioeconomic traits, and their impacts. For instance, a user or business with identical attributes to another may display radically different behaviors depending on the environment they are in. Without considering location, any assumptions or fintech value propositions we make are incomplete and could even lead us to strategic errors.
GIS
What exactly does the GEO dimension consist of, how is geographical data collected, and most importantly, how is it valued? How do we transform georeferenced data into strategic insights? Answering these questions takes us to the core of this article. Applying the geographical dimension to the daily operations of a company as large and complex as Mercado Pago goes far beyond merely collecting user or transaction geolocation data, which is always carried out with the user’s consent. It requires building a Geographic Information System (GIS) that lets the teams developing products and those directing the go-to-market strategy see clearly where to focus their efforts, thereby maximizing the dual objectives of acquiring new customers and enhancing the experience of existing ones.
So, what is a “GIS”? The acronym stands for “Geographic Information System,” a system in which geographical data can be collected, modeled, contextualized, correlated, made available, and finally visualized in a way that yields insights beyond the reach of human perception.
As a discipline, GIS is relatively new. While its roots can be traced back to the beginnings of cartography — which itself is an attempt by humans to represent the physical world to guide decisions — GIS did not achieve its status as an information system until the 1960s. It was then, with the advent of computers and the first concepts of quantitative and computational geography, that GIS began to emerge as a branch of data science.
Visionary pioneers in the GIS world, such as Jack Dangermond, did not merely scale a hardware or software product. Instead, they championed a vision that continues to drive innovation and development in data science today. This vision is the concept of an interoperable database that represents the world and can be queried and operated to reveal patterns, tell stories, and relate events in the physical world from a data perspective.
In Dangermond’s words:
“GIS is about uncovering meaning and insights from within data. It is rapidly evolving and providing a whole new framework and process for understanding.” — Jack Dangermond, CEO, Esri
Source: https://www.esri.com/es-es/what-is-gis/history-of-gis
The past 60 years since the first generation of GIS have seen an unparalleled acceleration in technological advancement. In particular, computer science and data science have experienced rapid development, innovation, and disruption. Numerous factors have converged to enhance not only the technological capacity to quantify, analyze, and represent geographical data but also to democratize the knowledge and accessibility of this data for non-specialized audiences.
The graph below synchronously illustrates significant transformations that have occurred in computational technology, data collection, and data analysis techniques, showcasing the evolution of GIS over time. As can be seen, the true revolution that marked the emergence of GIS took place between the 1960s and 1990s. Since then, technology and data have been the main drivers of innovation, including the revolution brought about by Google Maps and Google Earth, which combined technological innovation with the widespread availability of GIS services.
State of the Art in GIS
As mentioned earlier, the popularization of GPS technology and the launch of Google Maps in 2005 decisively influenced a wide range of behaviors and social changes, fostering significant development and innovation in GIS. A prime example is the meteoric rise of giants like Uber, which, starting modestly in 2008, achieved a valuation of 60 million USD within just three years, attracting high-profile investors like Jeff Bezos. This success was probably driven by a combination of intensive use of technology, a focus on solving global challenges such as passenger transportation, and a market penetration strategy capable of influencing behaviors and spawning new businesses. These businesses extended the use of GIS to proximity services related to transportation, such as food delivery and logistics.
From the perspective of Mercado Libre and any company that manages its value proposition on a territorial level, two key challenges must be addressed to leverage these benefits and develop a robust strategy:
- The construction of solid GIS foundations: the ability to translate colloquial statements, such as addresses, into points in the physical world and vice versa (geocoding and reverse geocoding). This involves transforming physical points or shapes into data that can be manipulated through computational operations and calculations.
- The design of powerful tools for the visual exploration and exploitation of geographical data, allowing the human eye to answer questions and make efficient decisions in territorial terms.
Companies dedicated to providing GIS as a service, such as Carto and Google Maps itself, along with those companies that have driven technological innovation as a positive spillover of their search for territorial efficiency, like Uber, are at the forefront of this field. These companies have been the main drivers behind the most significant developments of the last decade. A notable example is H3, a system for spatial indexing and representation. Utilizing a simple yet effective approach, H3 employs a hexagonal representation of a given terrain and its elements, offering vast calculation possibilities, optimization functions, and varying degrees of resolution based on the depth or number of elements to be visualized. Ultimately, it provides an optimal translation of the real world into the data world, enabling the extraction of valuable insights that drive informed decision-making.
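To make the idea of hexagonal indexing more concrete, here is a minimal Python sketch. It is not H3 itself: H3 works on the sphere with a hierarchy of resolutions, while this example assumes points already projected to planar coordinates in meters and uses a single flat grid of pointy-top hexagons. The function names (`hex_cell`, `count_by_cell`) and the 100-meter cell size are our own illustrative choices.

```python
import math
from collections import Counter


def _cube_round(q: float, r: float) -> tuple[int, int]:
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so that
    # q + r + s == 0 still holds after rounding.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return int(rq), int(rr)


def hex_cell(x: float, y: float, size: float) -> tuple[int, int]:
    """Axial (q, r) index of the pointy-top hexagon containing point (x, y).

    x, y are planar coordinates in meters (assumption: already projected);
    size is the distance from a hexagon's center to its corners.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def count_by_cell(points, size: float) -> Counter:
    """Count how many points fall into each hexagonal cell."""
    return Counter(hex_cell(x, y, size) for x, y in points)


# Hypothetical transaction locations, in meters from an arbitrary origin.
points = [(0.0, 0.0), (10.0, 5.0), (1000.0, 0.0)]
density = count_by_cell(points, size=100.0)
```

Once every point carries a cell index, density maps and neighborhood lookups reduce to grouping and joining on that index, which is the property that makes hexagonal grids attractive for analysis.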
GIS in Mercado Pago
After this brief overview of the state of the art in GIS within the market, we now turn our attention to Mercado Pago and its own GIS. Here, we will explain what it entails and how, from the data team, we articulate the set of Technologies, Data, and Methods that enable geo analysis.
Technology for Obtaining GEO Data
As previously mentioned, the explosion in both hardware and software in recent decades has made it possible to capture and collect vast amounts of geographical data through mobile phone signals, GPS, and WiFi. The integration of these technologies into our hardware, specifically the Point physical payment terminals, allows these devices to emit geolocation signals when used in transactions.
From the user’s side, our IT teams have also integrated the necessary software into our Mercado Pago wallet app to collect geolocation data, provided the user grants permission to share their location. This setup gives us two sources of geo data, terminals and the app, both in the form of latitude and longitude coordinates that link transactions to their locations. This data is transmitted through APIs, interfaces that let one program request data or functionality from another in a predefined format. In this way, valuable geo information becomes available for collection, storage, and modeling.
Yet, as mentioned earlier, the construction of a GIS is not limited to capturing geolocation information. It also requires contextualizing this data within the real world where it was generated and, more importantly, discovering uncaptured commercial opportunities geographically. How do we approach obtaining this additional data that will later be used in modeling?
These two objectives differ from the previous one in that they refer to information outside the transactional nature of our operations, which may therefore be available at varying technological acquisition costs. Specifically, we can broadly categorize this data as follows:
- Administrative Divisions
- Points of Interest (POIs) such as buildings, transportation hubs, and commercial establishments
The information we obtain from these categories should later allow us to make sense of the dynamics within our ecosystem and direct our efforts towards areas where our future customers are located. However, since this data is beyond the control of our own software developments, we rely on open sources like OpenStreetMap (OSM) on the one hand and on our lead scraping capabilities on the other.
OSM, which started in 2004 as an open, collaborative mapping project, has centralized the efforts of millions of contributors. Initially, these contributors used GPS and images to manually populate a large database aimed at mapping administrative divisions, routes, and points of interest. Over the years, and with the rise of mobile phone photography and GPS services, the scale and speed of data input into OSM have increased dramatically, and it now features contributions from an estimated 10 million users and approximately 6 billion unique map points.
Querying this data provides spatial information in both vector and raster formats (such as PNG) intended for the project’s cartographic viewer. This raster cartography is generated using the PostGIS database and the Mapnik renderer, which converts raw vector data stored in PostGIS into raster format. To extract this data, one can query OSM directly or use the APIs and download sites offered by companies dedicated to simplifying and optimizing these queries.
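As an illustration of the API route, the sketch below builds a query for the Overpass API, a read-only service for querying OSM data. The tag (`amenity=bank`), the bounding box roughly covering Buenos Aires, and the helper names are assumptions made for this example and do not describe our actual pipeline. The code only builds the request; it does not send it.

```python
from urllib.parse import urlencode

# Public Overpass endpoint; other instances exist and usage limits apply.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_poi_query(key: str, value: str,
                    south: float, west: float, north: float, east: float,
                    timeout: int = 25) -> str:
    """Build an Overpass QL query for nodes tagged key=value inside a bounding box.

    Overpass expects the bounding box as (south, west, north, east).
    """
    bbox = f"{south},{west},{north},{east}"
    return f'[out:json][timeout:{timeout}];node["{key}"="{value}"]({bbox});out;'


def build_request_body(query: str) -> str:
    """URL-encode the query as the `data` form field expected by the endpoint."""
    return urlencode({"data": query})


# Hypothetical example: bank branches in an approximate Buenos Aires bounding box.
query = build_poi_query("amenity", "bank", -34.7, -58.53, -34.53, -58.33)
body = build_request_body(query)
```

The resulting body can be sent as a POST request to `OVERPASS_URL`, for instance with `urllib.request`, and the JSON response lists each matching node with its latitude, longitude, and tags.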
Additionally, extracting data from platforms licensed under “Creative Commons” and “Open Data” allows us to enrich the contextualization of our information and uncover potential commercial hubs for expanding our operations. These tools provide precise location data, highlighted points of interest, and commercial establishments.
With these results in hand, we have the essential inputs required by a GIS:
- Geolocated behavior data of our users
- Information about the world, including its administrative divisions and points of interest, which enables us to contextualize our users and discover commercial hubs.
Transformation and Storage in GIS
After defining data sources and configuring geolocation tracking, it becomes crucial to choose a storage technology that not only manages the vast amounts of data but also supports building a robust and stable model for geospatial analysis. For this purpose, we use BigQuery. With its serverless architecture and distributed, scalable analytics engine, BigQuery enables us to query terabytes of data in seconds and petabytes in minutes.
BigQuery offers exceptional flexibility by separating the processing engine that analyzes data from the storage options, thereby optimizing the efficiency of both. When it comes to scaling, storing, modeling, and querying geospatial data, BigQuery has empowered our team at Mercado Pago to handle Big Data efficiently. This includes identifying bottlenecks in data handling and typologies, providing a broad range of geographic functions, and creating an environment that facilitates the modeling, transformation, and analysis of data. These capabilities are essential for constructing a GIS suitable for a company as large as Mercado Libre.
We will delve deeper into this analysis in the second section, where we will explore the core process of transforming Geo Big Data into a fully functional GIS.
GIS Modeling: Transforming Data into Valuable Information
As mentioned earlier, collecting georeferenced data is not enough to qualify as a GIS. The data has to be given meaning so that, by the end of the process, we have a comprehensive model with at least the following characteristics:
- It interrelates different geo elements in an entity diagram that provides context and meaning, enabling complex business strategy questions to be answered collectively.
- It manages the inherent imprecision of geographical data effectively.
- It offers various alternatives for data consumption, both analytically and visually.
To start, it is crucial to think in terms of an entity model, not only when analyzing georeferenced data but in any representation of a company’s reality. There are different techniques and approaches for this process. In our case, we use a relational modeling design with a minimal set of entities. Based on the data origins and extraction technology described above, it comprises four main entities: most frequent user geolocation, establishment geolocation (client businesses), geolocated points of interest, and administrative levels, as shown in the diagram below:
The localization of users (with their expressed consent and the anonymization of their identities for analytics purposes) and businesses requires the integration of multiple data sources, reverse geocoding, and frequency calculations. Let’s start with our users. We must address an issue related to people’s localization; unlike other elements, people cannot be represented statically. We need to estimate their most frequent positions and be able to locate them at least on a daily basis.
Calculating this geofrequency involves taking into account a fundamental fact: people move to different locations every day to carry out their activities. This dynamic behavior, in turn, leads them to make decisions where the geographical variable is relevant, based not only on where they reside but also, more frequently, on where they spend most of their day. For Mercado Pago, this is the critical location to drive the value proposition for the user. Therefore, in addition to working with the declared location, we integrate the geolocation of in-app activity from users who have consented to share it and compare its distance from the declared location.
To achieve this, data from various origins is integrated and homogenized. Some data is declared colloquially, such as addresses, while other data is collected as geographic coordinates (latitude/longitude). These sources are compared and, using BigQuery’s specialized functions, consolidated into the most probable location for the time window used in the calculation. This window can be monthly, daily, or even approach real time.
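The consolidation step can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the production logic: coordinates are snapped to a grid by rounding to three decimals (roughly 110 meters in latitude), events are grouped by user and month, and the most visited cell wins. The function name, the tuple layout of the events, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def most_frequent_cell(events, precision: int = 3, period: str = "%Y-%m"):
    """Return the most visited grid cell per (user, period).

    events: iterable of (user_id, timestamp, lat, lon) tuples.
    precision: decimals kept when snapping coordinates to a grid cell.
    period: strftime pattern defining the time window (monthly by default).
    """
    counts = defaultdict(Counter)
    for user_id, ts, lat, lon in events:
        cell = (round(lat, precision), round(lon, precision))
        counts[(user_id, ts.strftime(period))][cell] += 1
    # Ties are resolved in favor of the cell seen first.
    return {key: cells.most_common(1)[0][0] for key, cells in counts.items()}


# Hypothetical, anonymized events for one user in March 2024.
events = [
    ("u1", datetime(2024, 3, 4, 9, 0), -34.6037, -58.3816),
    ("u1", datetime(2024, 3, 5, 9, 5), -34.6041, -58.3818),
    ("u1", datetime(2024, 3, 9, 20, 0), -34.5889, -58.4300),
]
frequent = most_frequent_cell(events)
# {("u1", "2024-03"): (-34.604, -58.382)}
```

Changing `period` to `"%Y-%m-%d"` gives a daily estimate, which is the kind of temporal granularity trade-off described above.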
On the other hand, although Mercado Pago’s client businesses generally have more stable locations, they also require a certain level of calculation beyond simple data collection. It is not uncommon to find erroneous, inaccurate, or outdated addresses. Addressing this complexity in a database that comprises millions of businesses requires a continuous process of data verification and validation that contrasts the declared information with the transactions recorded by the enabled devices. As mentioned in the technology section, our Point terminals are equipped with GPS functionality, which allows us to perform a reverse geocoding process when we receive this data. This technique infers the location of a business from the lat/long points received and their link to the owners of the terminals, or of our physical QR codes, which have the same capability.
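A minimal sketch of this kind of validation is shown below. It assumes the declared address has already been geocoded to a latitude/longitude pair, takes the median of the coordinates observed in transactions as the business’s inferred position, and flags the record when the two are further apart than a threshold. The 500-meter threshold, the function names, and the sample coordinates are illustrative assumptions. In BigQuery, the distance itself would typically come from a geography function such as `ST_DISTANCE`.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def check_declared_location(declared, observed, max_distance_m: float = 500.0):
    """Compare a declared (lat, lon) with the median of observed (lat, lon) points.

    Returns (distance_in_meters, is_consistent).
    """
    obs_lat = median([lat for lat, _ in observed])
    obs_lon = median([lon for _, lon in observed])
    distance = haversine_m(declared[0], declared[1], obs_lat, obs_lon)
    return distance, distance <= max_distance_m


# Hypothetical business: declared downtown, but transacting elsewhere.
declared = (-23.5505, -46.6333)
observed = [(-23.5610, -46.6560), (-23.5612, -46.6558), (-23.5609, -46.6561)]
distance, is_consistent = check_declared_location(declared, observed)
```

The median is used instead of the mean so that a few stray GPS readings do not drag the inferred position away from where the business actually operates.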
The modeling of minimal GIS entities also includes entities related to points of interest and administrative divisions. As we have mentioned, these are necessary to contextualize our users’ and businesses’ data and are part of the opportunities for expanding and improving the value proposition we offer to each of them. Again, BigQuery lets us harmonize the different data sources, converting them to common units, indexing them, and defining clusters according to the desired dimensionality.
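Contextualizing a user or business ultimately means answering the question "which administrative division does this point fall in?". In a warehouse such as BigQuery this is typically expressed with geography predicates like `ST_CONTAINS`; the Python sketch below only illustrates the underlying idea with a classic ray-casting test. It treats coordinates as planar, ignores polygons with holes, and uses made-up division names and shapes.

```python
def point_in_polygon(lon: float, lat: float, polygon) -> bool:
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon: list of (lon, lat) vertices, without repeating the first one.
    Coordinates are treated as planar, which is a reasonable approximation
    only for small areas.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_division(point, divisions):
    """Return the name of the first division containing point, or None."""
    lon, lat = point
    for name, polygon in divisions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical divisions drawn as simple squares.
divisions = {
    "district_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "district_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
```

With every point tagged by division, metrics can be aggregated at any administrative level by a plain join on the division name.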
The GEO Insight: From Spatial Location to Territorial Strategy
The development so far underscores the significance of geographic data for a company like Mercado Pago, highlighting the diversity of data sources and the practice of transforming and modeling this data to deliver the expected business insights.
However, across the diverse teams that drive and sustain the company daily, how this data is consumed, exploited, and valued marks the difference between an elegant, precise data model and a GIS capable of reaching the core of business strategy to solve real-world problems. To achieve the latter, it is essential to build a robust capability for consuming and intelligently exploiting the data. The “recipe” for constructing this capability includes several key premises:
- Data can be consumed directly from the entities where they are stored, with the relational model allowing for interrelated querying.
- A powerful visualization tool is available, capable of being operated by any user, based on the GIS model, enabling them to create custom maps based on various dimensions of the data to be analyzed.
- The tool has the additional capability of providing insights that go beyond mere data exploitation, offering integration with machine learning models and mathematical algorithms for routing, optimizations, etc.
The representation of the constructed GIS can be visualized in an image like the one below:
With well-modeled data in a simple yet comprehensive architecture, the tools selected to enhance the data should prioritize guaranteed integration with BigQuery’s data and functions. As a best practice, it is advisable to evaluate the top-tier partners of the chosen data warehouse provider. Typically, partners ensure seamless data integration and make good use of each other’s capabilities and advantages.
Finally, attention should be given to developing a frontend designed with business and product users in mind:
- Clear identifiers and nomenclatures,
- Business-relevant dimensions: time and its granularity, product and customer segments,
- Relevant geographic dimensions: administrative levels, population density, sociodemographic variables, economic variables,
- Reference metrics: transaction volumes, frequency and recency measurements of transactions,
- Business intelligence functionalities: clusters, routing, optimizations.
With all these elements in place, a company like Mercado Pago, focused on increasing user numbers and enhancing transaction activity within its ecosystem, is equipped to take on challenges such as:
- Defining a territorial strategy by detecting areas with potential for commercial expansion opportunities.
- Promoting the use of various virtual wallet products based on user mobility areas, thereby expanding services like transport recharge, cash withdrawal, or pickup of items purchased on the Meli app.
- Maximizing the coverage of client businesses by commercial agents, considering variables such as transportation type, distance to be covered, and area traffic.
- Advertising in strategic public locations based on the product or service to be promoted and the most suitable geographic area.
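To give a flavor of the agent coverage challenge in the list above, here is a deliberately simple Python sketch of a greedy nearest-neighbor visit order. It is a heuristic rather than an optimal router, it uses straight-line distance only, and it ignores transportation type and traffic. The function names, the business identifiers, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in kilometers


def approx_km(a, b) -> float:
    """Approximate distance in km between two (lat, lon) points.

    Uses an equirectangular approximation, adequate for short urban distances.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_KM * math.hypot(x, y)


def nearest_neighbor_route(start, stops):
    """Greedy visit order: always go to the closest unvisited stop.

    start: (lat, lon) of the agent; stops: dict of name -> (lat, lon).
    Returns (ordered_names, total_km).
    """
    remaining = dict(stops)
    route, current, total = [], start, 0.0
    while remaining:
        name = min(remaining, key=lambda n: approx_km(current, remaining[n]))
        total += approx_km(current, remaining[name])
        current = remaining.pop(name)
        route.append(name)
    return route, total


# Hypothetical client businesses along one street.
stops = {"shop_a": (0.0, 0.03), "shop_b": (0.0, 0.01), "shop_c": (0.0, 0.02)}
route, total_km = nearest_neighbor_route((0.0, 0.0), stops)
# route == ["shop_b", "shop_c", "shop_a"]
```

A real solution would swap the distance function for travel times and the greedy loop for a proper optimization algorithm, but the inputs remain the same geolocated entities described throughout this article.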
If you’ve read this far, you might be curious about how GIS managed to transform concrete strategies of Mercado Pago into effective business solutions… So stay tuned for our second part of this article, where we will present a business case showcasing the principles of GIS in action and how they positively impacted Mercado Pago’s metrics.
See you soon and please clap below if you enjoyed this reading!