Choosing Your Best Holiday Stay with Python

Krishnan Raghavan
Published in The Startup
Jan 22, 2021
Image: Bend, Oregon

If you’re like me and traveling is one of your hobbies, one important question that takes up a lot of your planning time might be: Where do I stay?
You fancy a long getaway at that alluring beach town, resort town or National Park, where you will probably spend much of your time at the beach, exploring nature or doing something similar.
However, you probably also have some spare time that you would like to spend eating your favorite cuisine, visiting a local bar or a quiet cafe, or perhaps playing a sport or working out.
This article describes the development of an application that gives you the relevant information to choose a great place to stay with minimal effort. The application is written in Python and displays interactive maps to help you make your decision.

Inputs:

  • List of places (e.g. ‘Bend, Oregon’)
  • Category list (e.g. ‘food’, ‘restaurant’, ‘gym’, ‘trails’, ‘school’, ‘train station’)

Outputs:

  • Interactive map of the venues matching the user’s categories, color-coded by venue type
  • Interactive map of nearby hotels, color-coded by popularity

Only a high-level overview is given here; the implementation details can be found at the link at the end of the article.

Applications Background

This section briefly describes the external applications used in the analysis: Nominatim to obtain location information, Foursquare to obtain venue information and Folium to plot world maps. Each of these can be used from Python: Foursquare through its web API, and Nominatim and Folium through Python packages.

Foursquare

If you are unfamiliar with Foursquare, it describes itself as a “location data platform”. Foursquare lets users share and view information about venues around the world, which makes it essentially a location-based social networking application.
The available information includes the venue address, tips, likes, photos and other information such as visitor statistics. More details and product offerings can be found at the Foursquare website.
One popular product offering is “Places”, a database of venue information from around the world that can be queried through API calls, which return their results as JSON. This link provides examples of API calls in different programming languages.
The type of information available depends on the account tier. The “Personal Developer” account is free and allows for up to two photos and tips per venue, as well as 99,500 “Regular” calls and 500 “Premium” calls per day.
“Regular” and “Premium” refer to categories of API endpoints, that is, the kinds of information that can be requested. Every API call also requires authentication credentials and a version parameter. Details about the endpoints, authentication and other information can be found at this link.
We use the Personal Developer account for our Python application. Higher tier accounts have additional fees, but offer more features and fewer restrictions on calls and endpoints.
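To show how the authentication and version information fit into a request, here is a minimal sketch assuming the legacy Foursquare v2 `venues/search` endpoint that Personal Developer accounts used at the time (newer versions of the Places API use different URLs and header-based keys). The helpers `build_search_params` and `build_search_url`, and the default radius, limit and version date, are illustrative assumptions rather than the author's code.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "venues/search" endpoint. Newer
# versions of the Places API use other URLs and header-based API keys.
FOURSQUARE_SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_params(client_id, client_secret, lat, lng, category_ids,
                        radius_m=5000, limit=50, version="20210122"):
    """Query parameters for one venue search around (lat, lng).

    The defaults (radius, limit, version date) are illustrative only.
    """
    return {
        "client_id": client_id,          # authentication
        "client_secret": client_secret,  # authentication
        "v": version,                    # API version, given as a YYYYMMDD date
        "ll": f"{lat},{lng}",            # centre of the search
        "radius": radius_m,              # search radius in metres
        "categoryId": ",".join(category_ids),  # comma-separated category IDs
        "limit": limit,                  # maximum number of venues returned
    }


def build_search_url(params):
    """Full GET URL for the search, useful for logging or debugging."""
    return f"{FOURSQUARE_SEARCH_URL}?{urlencode(params)}"


# Hypothetical credentials and category IDs; coordinates roughly Bend, Oregon.
params = build_search_params("MY_CLIENT_ID", "MY_CLIENT_SECRET",
                             44.05, -121.31, ["CATEGORY_ID_1", "CATEGORY_ID_2"])
print(build_search_url(params))
```

With real credentials, the request itself would be `requests.get(FOURSQUARE_SEARCH_URL, params=params).json()`; `requests` encodes the same parameters that `build_search_url` prints.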

Nominatim

In addition to Foursquare, we use Nominatim, the geocoding service built on OpenStreetMap data, which we access through the “geopy” Python geocoding library. Similar to Foursquare, Nominatim has its own API. Nominatim is used in our Python application to match an address to coordinates (latitude and longitude), which can then be used with Foursquare to look up nearby venues.
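A minimal sketch of the address-to-coordinates lookup through geopy's `Nominatim` geocoder. The helper name `geocode_place` and the user-agent string are hypothetical; the optional `geolocator` argument is there only so the helper can be exercised without network access.

```python
def geocode_place(place, geolocator=None):
    """Return (latitude, longitude) for a place name, or None if not found."""
    if geolocator is None:
        # Imported here so the helper can also be exercised with a stub.
        from geopy.geocoders import Nominatim
        # Nominatim's usage policy asks for an identifying user agent;
        # "holiday-stay-explorer" is a hypothetical application name.
        geolocator = Nominatim(user_agent="holiday-stay-explorer")
    location = geolocator.geocode(place)
    if location is None:
        return None
    return location.latitude, location.longitude
```

Calling `geocode_place("Bend, Oregon")` queries the public Nominatim service, so it needs network access and should respect Nominatim's limit of about one request per second.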

Folium

To view the location and venue information on an interactive map in Python, we use a library called “Folium”. This library can generate various types of maps, centered on given coordinates and zoomed in to show the nearby venues.

Software and Libraries

This application was made with Python 3 in a JupyterLab environment. The following Python libraries were used:

  • numpy: the standard numerical array library for Python
  • pandas: DataFrame library, built on top of numpy
  • math: standard library module for mathematical operations
  • json: standard library module for reading and writing JSON
  • geopy: geocoding library, used with Nominatim to obtain the coordinates of addresses
  • requests: sends HTTP requests to external services such as the Foursquare API
  • matplotlib: plots analysis graphs such as box, bar and scatter plots
  • folium: plots interactive world maps

Analysis Method

This section describes the analysis used for the Python application.

Method Overview

  • The application first obtains the coordinates of the address using the geolocator Nominatim.
  • These coordinates are then input into the Foursquare API call to obtain venue information based on user interest inputs.
  • The centroid is then calculated as the mean of the venue locations (latitude and longitude).
  • A new set of venues is then retrieved around the Centroid and classified into the categories of interest.
  • Finally, hotels near the centroid are found and ranked by popularity and by distance from the centroid.

Method Description

The major steps of the Python application are discussed below:

  1. Get user inputs. The user inputs consist of the following:

a. places: A list of places the user is interested in exploring. For example, “Bend, Oregon”.
b. categories: A list of venue types the user is interested in. The complete list can be found at this link.
In this example, the categories correspond to the following:

  • Museum
  • Thai Restaurant
  • Gym
  • Night Market
  • National Park
  • State / Provincial Park
  • Trail
  • Airport
Figure: Input

2. Obtain and print the address coordinates (latitude and longitude). This is done using Nominatim.

Figure: Location Information

3. Retrieve nearby venue information from Foursquare through an API call and store it in a dataframe. For each venue we extract its name, a unique venue ID, latitude, longitude, category and a unique category ID. The “Distance” column is the geographic distance between the venue’s coordinates and the coordinates of the address found in step 2.

Table: Dataframe of venue information
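A sketch of flattening the search response into one row per venue with the columns described in step 3, assuming the layout of the v2 `venues/search` response (`response.venues`, each venue with `location` and `categories`). The helper `venues_to_rows` and the exact column names are illustrative; `pd.DataFrame(rows)` then turns the rows into the dataframe.

```python
def venues_to_rows(response_json):
    """Flatten a v2 venues/search JSON response into one dict per venue."""
    rows = []
    for venue in response_json.get("response", {}).get("venues", []):
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]
        # Use the category flagged as primary, else the first one listed.
        primary = next((c for c in categories if c.get("primary")), categories[0])
        rows.append({
            "Name": venue.get("name"),
            "Venue ID": venue.get("id"),
            "Latitude": location.get("lat"),
            "Longitude": location.get("lng"),
            "Category": primary.get("name"),
            "Category ID": primary.get("id"),
            # Metres from the "ll" search point, when the API provides it.
            "Distance": location.get("distance"),
        })
    return rows
```

Missing fields become `None` rather than raising an error, which leaves the decision about incomplete venues to the cleanup step.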

4. Clean up the venue data to remove NaNs and other erroneous entries.
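The article does not list which erroneous entries are removed, so the checks below (missing or NaN coordinates, impossible latitude/longitude values, duplicate venue IDs) are assumptions. `clean_rows` is a hypothetical helper working on plain dicts; with a dataframe, `df.dropna(subset=["Latitude", "Longitude"]).drop_duplicates(subset="Venue ID")` covers the first and last checks.

```python
import math


def _is_number(value):
    """True for real numbers that are not NaN (booleans excluded)."""
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and not math.isnan(value))


def clean_rows(rows):
    """Keep venues with valid coordinates, dropping repeated venue IDs."""
    cleaned, seen_ids = [], set()
    for row in rows:
        lat, lng = row.get("Latitude"), row.get("Longitude")
        if not (_is_number(lat) and _is_number(lng)):
            continue  # missing or NaN coordinates
        if not (-90 <= lat <= 90 and -180 <= lng <= 180):
            continue  # coordinates that cannot exist
        if row.get("Venue ID") in seen_ids:
            continue  # the same venue returned twice
        seen_ids.add(row.get("Venue ID"))
        cleaned.append(row)
    return cleaned
```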

5. Calculate the Centroid of the venues. Its coordinates are simply the mean latitude and the mean longitude of the venues.
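In code, the centroid of step 5 is just two averages. This sketch assumes the venues are dicts with the hypothetical `Latitude`/`Longitude` keys; with a dataframe the same values come from `df["Latitude"].mean()` and `df["Longitude"].mean()`.

```python
from statistics import mean


def centroid(rows):
    """Centroid as (mean latitude, mean longitude) of the venues."""
    return (mean(r["Latitude"] for r in rows),
            mean(r["Longitude"] for r in rows))


venues = [{"Latitude": 0.0, "Longitude": 10.0},
          {"Latitude": 2.0, "Longitude": 14.0}]
print(centroid(venues))  # (1.0, 12.0)
```

Averaging raw coordinates works well for venues within one town; it would give misleading results for points on both sides of the 180° meridian.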

6. Retrieve a new list of venues around the Centroid with another Foursquare API call.

7. Plot a map of the venues using Folium. The map is interactive and will look similar to the figure below; clicking on a marker shows more information about the venue.

Figure: Map of venues around the Centroid (shown in red-bordered yellow)

8. Plot box plots of the coordinates (latitude and longitude) of each venue type to examine how the locations are distributed and to spot outliers. An example category is shown in the figure below.

Figure: Box Plot of Gym venues around the Centroid (outliers in black circles)
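By default, matplotlib's `boxplot` draws as outliers ("fliers") the points lying more than 1.5 times the interquartile range beyond the quartiles (`whis=1.5`). The sketch below applies that rule with the standard library, so the outlying venues can be listed as well as seen. `iqr_outliers` is a hypothetical helper, the latitudes are toy values, and the `"inclusive"` quantile method is chosen because it matches NumPy's default percentile interpolation, which matplotlib uses for the box edges.

```python
from statistics import quantiles


def iqr_outliers(values, whis=1.5):
    """Values beyond `whis` x IQR from the quartiles (matplotlib's flier rule).

    Needs at least two values.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in values if v < low or v > high]


# Toy latitudes for one venue type: four nearby venues and one far away.
gym_latitudes = [44.04, 44.05, 44.06, 44.07, 44.50]
print(iqr_outliers(gym_latitudes))  # [44.5]
```

Drawing the plot itself is then a call such as `plt.boxplot(gym_latitudes)`.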

9. Encode the venues into the categories of interest. This is done using the ‘Category ID’ label from the venues dataframe obtained from the API request: a simple dictionary maps each ID label to one of the categories of interest, and each category gets a unique label suitable for color coding.

Figure: Clustering of venues by Category (Cluster Labels)
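A sketch of the dictionary-based encoding in step 9. The IDs in `CATEGORY_BY_ID` are hypothetical placeholders to be replaced with the real Foursquare category IDs of the searched categories (several IDs can map to the same category name), and `encode_categories` is an illustrative helper that assigns an integer cluster label per category.

```python
# Hypothetical placeholder IDs: substitute the real Foursquare category IDs.
CATEGORY_BY_ID = {
    "museum-id": "Museum",
    "thai-restaurant-id": "Thai Restaurant",
    "gym-id": "Gym",
    "trail-id": "Trail",
}


def encode_categories(rows, category_by_id):
    """Attach a category of interest and an integer cluster label to each venue."""
    names = sorted(set(category_by_id.values()))
    label_of = {name: i for i, name in enumerate(names)}
    encoded = []
    for row in rows:
        name = category_by_id.get(row["Category ID"])
        if name is None:
            continue  # venue outside the categories of interest
        encoded.append({**row, "Category of Interest": name,
                        "Cluster Label": label_of[name]})
    return encoded


rows = [{"Name": "G", "Category ID": "gym-id"},
        {"Name": "T", "Category ID": "trail-id"},
        {"Name": "X", "Category ID": "other-id"}]
print([r["Cluster Label"] for r in encode_categories(rows, CATEGORY_BY_ID)])  # [0, 3]
```

Sorting the category names makes the labels, and therefore the marker colors, stable from one run to the next.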

10. Plot the clustered venues on a Folium map: create a map object, then loop through the venue dataframe and add a marker at the latitude and longitude of each entry, with the marker properties (such as color) of its category.

Figure: Venues clustered by color (Centroid in red-yellow)
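A minimal sketch of the marker loop in step 10 using `folium.Map` and `folium.CircleMarker`. The palette, marker sizes and the row keys (`Cluster Label`, `Category of Interest`) are assumptions; the centroid is drawn as a red-bordered yellow circle to match the figures. Folium is imported inside the function so the color helper can be used without it.

```python
# Hypothetical palette: one CSS color per cluster label, reused cyclically.
PALETTE = ["blue", "green", "purple", "orange",
           "darkred", "cadetblue", "pink", "gray"]


def marker_color(label, palette=PALETTE):
    """Color for an integer cluster label."""
    return palette[label % len(palette)]


def plot_clusters(rows, centre, zoom_start=13):
    """Folium map with one colored circle per venue plus the centroid."""
    import folium  # requires `pip install folium`

    m = folium.Map(location=list(centre), zoom_start=zoom_start)
    for row in rows:
        colour = marker_color(row["Cluster Label"])
        folium.CircleMarker(
            location=[row["Latitude"], row["Longitude"]],
            radius=6,
            popup=f"{row['Name']} ({row['Category of Interest']})",
            color=colour,
            fill=True,
            fill_color=colour,
            fill_opacity=0.8,
        ).add_to(m)
    # Centroid: red border, yellow fill.
    folium.CircleMarker(location=list(centre), radius=10, popup="Centroid",
                        color="red", fill=True, fill_color="yellow",
                        fill_opacity=1.0).add_to(m)
    return m
```

In JupyterLab the returned map renders inline; elsewhere `m.save("venues.html")` writes it to an HTML file.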

11. Search for nearby hotels within the search radius given by the user. The list is sorted first by “Likes” (most first), then by distance from the Centroid (closest first). The closer a hotel is to the Centroid, the closer it is, on average, to the venues of interest.

Table: Hotels sorted by likes and distance from Centroid
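A sketch of the ranking in step 11, assuming each hotel is a dict that already carries a `Likes` count (the article does not say which API call supplies it). Distance from the Centroid is computed here with the haversine great-circle formula, which is one reasonable choice rather than necessarily the author's; `haversine_m` and `rank_hotels` are hypothetical helpers.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def rank_hotels(hotels, centre):
    """Sort hotels by Likes (most first), breaking ties by distance to centre."""
    ranked = []
    for hotel in hotels:
        d = haversine_m(centre[0], centre[1], hotel["Latitude"], hotel["Longitude"])
        ranked.append({**hotel, "Distance from Centroid": d})
    ranked.sort(key=lambda h: (-h["Likes"], h["Distance from Centroid"]))
    return ranked


hotels = [{"Name": "A", "Likes": 10, "Latitude": 0.0, "Longitude": 0.01},
          {"Name": "B", "Likes": 10, "Latitude": 0.0, "Longitude": 0.001},
          {"Name": "C", "Likes": 20, "Latitude": 0.0, "Longitude": 0.1}]
print([h["Name"] for h in rank_hotels(hotels, (0.0, 0.0))])  # ['C', 'B', 'A']
```

With a dataframe, `df.sort_values(["Likes", "Distance"], ascending=[False, True])` gives the same ordering.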

12. Plot the Hotels on a Folium Map to visualize them.

Figure: Hotels scored out of four by ‘Likes’

The ‘Score’ is obtained by dividing the range of ‘Likes’ values into four equal parts and determining which part each hotel’s ‘Likes’ count falls into. The more likes a hotel has, the higher its score.
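The scoring rule can be sketched as equal-width binning of the ‘Likes’ range. `like_scores` is a hypothetical helper; how boundary values and the all-equal case are handled is an assumption, since the article does not specify it.

```python
def like_scores(likes, brackets=4):
    """Score each 'Likes' count from 1 to `brackets` by equal-width bins."""
    lo, hi = min(likes), max(likes)
    if hi == lo:
        # Assumption: if every hotel has the same count, all get the top score.
        return [brackets] * len(likes)
    width = (hi - lo) / brackets
    # The maximum lands exactly on the upper edge, so cap it at `brackets`.
    return [min(brackets, int((x - lo) / width) + 1) for x in likes]


print(like_scores([0, 5, 10, 25, 40]))  # [1, 1, 2, 3, 4]
```

`pd.cut(likes, bins=4, labels=[1, 2, 3, 4])` performs similar equal-width binning in pandas, but it assigns values that fall exactly on a bin edge to the lower bin.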

Concluding Remarks

Starting from the user inputs, we obtained an interactive map of the venues, a suggested location to stay (the Centroid) and a list of nearby hotels. Note that this is only an overview of the steps; the link to the implementation is at the bottom of the article.

Summary

We did the following steps in the analysis:

  • Obtain user preferences.
  • Import the required libraries.
  • Find the location of the destination address using Nominatim.
  • Retrieve the list of venues around the destination location with a GET request to the Foursquare API, and store it as a dataframe.
  • Pre-process the dataframe to remove NaNs and other invalid values.
  • Find the centroid.
  • Retrieve the list of venues around the Centroid with another GET request to the Foursquare API.
  • Encode and cluster the venues by category.
  • Plot them on a Folium map.
  • Find hotels nearby and sort them by “Likes” and “Distance” from centroid.
  • Show the hotels on a Folium map.

Challenges

One interesting challenge in creating this application was streamlining it to run with minimal user input while still retaining enough information to produce a satisfactory output.
Also, how outliers should be handled is a case of “it depends”. For example, a venue may be of interest even if it is an outlier; in that case we want the Centroid to take it into account, so we use the mean. On the other hand, most outliers may not be of interest, or similar venues may be found nearby; in that case the Centroid can be computed from the median instead, which is much less affected by outliers.
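A small illustration of the mean-versus-median choice, using toy coordinates for three venues close together and one far-away outlier. `centroid_of` is a hypothetical helper computing the coordinate-wise mean or median.

```python
from statistics import mean, median


def centroid_of(points, method="mean"):
    """Centroid of (lat, lng) pairs using the coordinate-wise mean or median."""
    aggregate = {"mean": mean, "median": median}[method]
    lats = [lat for lat, _ in points]
    lngs = [lng for _, lng in points]
    return aggregate(lats), aggregate(lngs)


points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (80.0, 80.0)]
print(centroid_of(points, "mean"))    # (20.75, 20.75): pulled toward the outlier
print(centroid_of(points, "median"))  # (1.5, 1.5): stays with the cluster
```

The single outlier drags the mean far from the three nearby venues, while the median stays among them.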

Improvements

Some improvements to this process could be:

  • The ability to analyze multiple holiday destinations.
  • Taking the venue “Likes” or other attributes into account when calculating the Centroid.
  • Expanding the application to other use cases, such as deciding where to move for a new home or where to set up a business.
  • Turning the process into a servable application.

With minimal changes, such an application could also help decide where to live in a new city (for example, close to schools, parks, movie theaters and fitness centers), or where to open a new business near certain venues (such as a coffee shop near residential and commercial areas).

Further Information

For more information and the code, please refer to this link.
