Where in the World: Location-Based Search
Although the Internet has in many ways shrunk the world into a global village, culture is still closely linked to geography. Real-world places are central to human experience and identity, and continue to hold profound associations and meaning for us. Discourse varies widely by country and by region. It is no surprise that comparing and analyzing trends and conversations by location reveals important cultural and demographic insights.
When gathering data from online platforms, we often want to “filter by location”, that is, restrict our search to a city, region or country. In this article, we discuss the problem of location filtering for online data. We focus on a scenario where a content creation platform (e.g. Twitter or YouTube) holds user-generated content tagged with location metadata and is queried for content that conforms to some location criterion. There are three interrelated considerations: how the platform captures the location of a content item, how it represents that location, and how a search query specifies a location criterion.
When a content item is created on a platform, the platform captures its origin location either automatically or via manual input. Automatic capture uses either the “home location” of the user or the location from which the content item was posted (if the user’s device shares this information); manual input allows the user to specify a “geo-tag”. Most platforms, including Twitter and Instagram, capture location automatically but also allow the user to specify it manually. YouTube and TikTok support manual geo-tagging.
In order to represent this origin location, platforms use two broad methods which we will call the “place” method and the “point” method. The “place” method specifies a general location associated with the content item, either as a text string or as a polygon that frames the location using latitude/longitude coordinates. Of course, there are always several place descriptors associated with a given location — Paris, France, and the Champ de Mars are all valid place descriptors for the Eiffel Tower. The platform typically stores a mapping of such overlapping place descriptors (e.g. Paris is in France, France is in Europe) and either stores all possible descriptor strings in the location tag, or uses the mapping at query time to ensure that content tagged with ‘Paris’ is also returned for a search for ‘France’.
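As a minimal sketch of the place mapping described above, the snippet below models the hierarchy as a simple parent lookup. The `PARENT` table and the function names are hypothetical and purely illustrative; we do not know how any particular platform implements this internally.

```python
# Hypothetical mapping: each place descriptor points to the place that contains it.
PARENT = {
    "Champ de Mars": "Paris",
    "Paris": "France",
    "France": "Europe",
}


def ancestors(place):
    """Return the place itself followed by every place that contains it."""
    chain = [place]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches(tagged_place, query_place):
    """True if content tagged with tagged_place should be returned for query_place."""
    return query_place in ancestors(tagged_place)


# Option 1: store every descriptor in the location tag when the content is created.
stored_tags = ancestors("Champ de Mars")

# Option 2: keep only the most specific tag and resolve the hierarchy at query time.
print(matches("Paris", "France"))  # True: Paris is in France
print(matches("France", "Paris"))  # False: a France tag does not imply Paris
```

Storing all descriptors makes queries a plain lookup at the cost of larger tags, while resolving at query time keeps tags small but requires the mapping on every search.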
The “point” method, on the other hand, specifies the exact location associated with the content item as a single latitude/longitude pair. While this is a more precise and compact representation, it usually requires the creator to use a GPS-enabled device that returns an exact GPS location.
Search queries submitted to content platforms can, in general, specify a location criterion using a country code, a place string, a point-radius scheme, or a bounding-box scheme. Country codes and place strings can be used to filter via straightforward lookup when the representation uses the place method. The point-radius scheme is useful for specifying arbitrary circular regions, but we note that insightful searches often use country- or region-wise filters — and a moment’s consideration shows that approximating countries by large circular regions is limiting at best and entirely misleading at worst (consider the shape of countries such as Chile or Vietnam). Bounding boxes can be a better fit for country shapes; in fact, compilations of country bounding boxes are widely available in the public domain. There is even a Python library to facilitate working with country coordinates and bounding boxes.
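To make the difference between the point-radius and bounding-box schemes concrete, here is a small sketch that tests points against both. The bounding box for Chile and the city coordinates are rough values chosen for illustration, not authoritative boundaries, and the function names are our own.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_radius(point, center, radius_km):
    """Point-radius scheme: is the point within radius_km of the center?"""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


def in_bbox(point, bbox):
    """Bounding-box scheme: bbox is (min_lat, min_lon, max_lat, max_lon)."""
    lat, lon = point
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


# Approximate, illustrative values only.
CHILE_BBOX = (-56.0, -75.7, -17.5, -66.4)
SANTIAGO = (-33.45, -70.67)
ARICA = (-18.48, -70.31)     # northern Chile
CORDOBA = (-31.42, -64.18)   # Argentina

for name, point in [("Arica", ARICA), ("Cordoba", CORDOBA)]:
    print(name, in_radius(point, SANTIAGO, 1000), in_bbox(point, CHILE_BBOX))
```

With these values, a 1000 km circle around Santiago misses Arica but includes Córdoba, while the bounding box does the opposite. A bounding box is still an approximation, though, and can include territory from neighbouring countries.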
The Twitter API offers all of the above methods for API queries: country code, place string, point-radius, and bounding-box. Other APIs such as Pytrends and the YouTube API are more limited. YouTube offers only the point-radius method, limited to a maximum radius of 1000 km. A circle of this size cannot cover many large countries and, as stated above, introduces inaccuracies when querying country-wise. City-level filtering is, however, reasonably accurate when using a point-radius scheme. Pytrends (the unofficial Google Trends API) allows an API query to specify a country code but allows neither place strings nor arbitrary point-radius/bounding-box regions.
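As an illustration of a point-radius query, the sketch below builds the parameters for a YouTube search request without sending it. The helper name is hypothetical, and the parameter names (`location`, `locationRadius`, `type`) reflect our understanding of the YouTube Data API search endpoint; they should be checked against the current documentation before use.

```python
MAX_RADIUS_KM = 1000  # maximum radius mentioned above


def youtube_location_params(query, lat, lon, radius_km):
    """Build search parameters for a point-radius query (hypothetical helper).

    Nothing is sent over the network; the returned dict would be passed to
    whichever HTTP or API client is in use.
    """
    if not 0 < radius_km <= MAX_RADIUS_KM:
        raise ValueError(f"radius must be between 0 and {MAX_RADIUS_KM} km")
    return {
        "part": "snippet",
        "type": "video",  # assumption: location filters apply to video searches
        "q": query,
        "location": f"{lat},{lon}",
        "locationRadius": f"{radius_km}km",
    }


# City-level query: a 25 km circle around the Eiffel Tower.
params = youtube_location_params("bakery", 48.8584, 2.2945, 25)
print(params["location"], params["locationRadius"])
```

Rejecting an oversized radius up front makes the limit explicit in the calling code, rather than leaving it to surface as an API error.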
As a common workaround to the complexities of location support, many content creators use manual hashtag annotations to specify location. This has the advantage of allowing arbitrarily precise location descriptions (e.g. #johnsbakery). Most social platforms use hashtags to funnel search queries, allowing targeted searches to easily return tagged content. As a proxy for an actual location, a query can request all content that uses a location hashtag.
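The hashtag approach can be sketched on data that has already been collected. The example below assumes posts are available as plain text strings; the sample posts and function names are made up for illustration.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of hashtags in a post, lowercased and without the '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def filter_by_hashtag(posts, location_tag):
    """Keep only the posts that carry the given location hashtag."""
    wanted = location_tag.lstrip("#").lower()
    return [post for post in posts if wanted in hashtags(post)]


# Made-up sample posts.
posts = [
    "Fresh croissants at #JohnsBakery this morning",
    "Sunset over the river #paris",
    "No tags here",
]

print(filter_by_hashtag(posts, "#johnsbakery"))
```

Matching is case-insensitive here because creators rarely capitalise hashtags consistently.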
While support for location and location-based queries varies significantly across content creation platforms, it remains vital to consider the location factor when analyzing online data. Interests and concerns are shaped by physical environments and proximity; an analysis that does not take location into account may miss valuable insights.
At Quilt.AI, we use location-based search and machine learning to uncover cultural meaning in Internet data. Reach out to us at email@example.com for more information!