A Contextual Multi-armed Bandit Approach to Personalized Trip Itinerary Planning

Haowei Li
AI4SM
Published in
17 min readDec 17, 2023

By Haowei Li, Mufeng Wang, and Jiarui Zhang as part of the course project of ECE1724H: Bio-inspired Algorithms for Smart Mobility. Dr. Alaa Khamis, University of Toronto, 2023.

Abstract

The trip itinerary planning problem has been extensively explored in the past few years. The main objective of this problem is to aid travelers in creating routes through points of interests (POIs). In this article, we particularly focus on route personalization, taking into account aspects such as budget constraints, hotel selection, users’ preferred POI categories, the duration of the trip, and the overall route length. To address these considerations, we implement Contextual Multi-armed Bandits (CMAB), a robust methodology that adjusts to the constraints and requirements of each user. We validate the effectiveness of our approach by comparing against a baseline model in terms of user satisfaction and the time required to generate results. This article demonstrates the potential of CMAB in personalized itinerary planning.

1 Introduction

In the fast-evolving landscape of smart mobility, trip itinerary planning (TIP) emerges as a pivotal component, significantly impacting the tourism industry and, by extension, the global economy. Tourism contributed to approximately 7.6 percent of the total global GDP in 2022 [1], incorporating industries such as transportation, accommodation, and food services. The essence of TIP is its capacity to optimize travel experiences, leading to a more efficient allocation of resources and a boost in economic activities associated with tourism. However, TIP requires travellers to conduct substantial research on potential destinations. They will need to select points of interests (POIs) that are expected to maximize their satisfaction, choose an affordable hotel near these POIs, and plan an efficient route.

In the realm of trip planning tools, various platforms exhibit distinct capabilities and limitations. Tools such as TripIt [2] require users to input their preferred places before providing detailed information on these tourist spots and travel options. Travaa [3], although providing a list of popular places for a city, lacks the functionality for users to filter these options based on their personal preferences. As a result, users are compelled to manually sift through the lists and organize their travel itineraries. Expedia [4], while adept at recommending flights and hotels, cannot assist users with planning visits to specific destinations.

Solving TIP is complicated because it involves selecting POIs based on user preferences and constraints, such as travel duration and budget, as well as other factors affecting satisfaction, like weather [5]. Furthermore, selecting an optimal route through the POIs can be modelled as a vehicle routing problem [6, 7] or a traveling salesman problem [8], both of which are proven to be NP-hard [9, 10]. Therefore, an efficient method is required to search through the possible solutions.

In this paper, we explore the resolution of TIP through the Contextual Multi-armed Bandit (CMAB) model. Section 2 delves into the related work, and section 3 discusses how the problem objective and constraints are modelled. Our proposed solution is presented in section 4, and section 5 evaluates the performance of our approach. Finally, section 6 summarizes our findings and suggests avenues for future improvements.

2 Literature Review

Substantial research has been conducted to efficiently solve TIP and plan for routes. Ghobber et al. [10] modelled the problem as Team Orienteering Problem with Time Window (TOPTW) and used the partition crossover evolutionary algorithm. Jia et al. [8], modelling this problem similarly to TOPTW, implemented three bio-inspired algorithms. Rani et al. [11] explored the clustered property of POIs and incorporated K-means clustering with approaches for the Traveling Salesman Problem.

Additional work has been developed to consider user preference and external conditions such as weather when designing itineraries. Yochum et al. [12] included mandatory POIs composed of popular places or sites particular to the city. Bolzoni et al. [13] enabled clients to choose categories of POIs to visit. Neisani et al. [14] proposed a solution adaptive to climate situations at POIs. Erbil et al. [15] add personalized options such as the pace of the traveller and the diversity level of the route. Herzog et al. [16] further explored user preferences and proposed a user-oriented approach, taking into consideration the weather, time of day, and previously visited POIs.

Contextual multi-armed bandits have been commonly used in recommendation systems to maximize the expected reward. This algorithm originated from bandit machines in casinos. At each iteration the player pulls a set of K arms, and receives reward for the arms pulled; the rewards of the other arms remain unknown [17]. Context refers to the side information about the state of the environment observed by the player before making decisions. The reward for each arm changs with varying context. In each iteration, the player has to balance between trying different bandits to learn more about the expected payoff of each machine and exploiting the best bandit that they learnt about. Lu et al. [18] presented an application of this algorithm aimed at maximizing the click-through rate of advertisements, considering a query at each time step and the history of past queries and advertisement clicks. Yoon et al. [19] presented an application of contextual bandits in transit route design, focusing on minimizing the travel time.

3 Problem Formulation and Modeling

In this paper, we aim to design a personalized trip itinerary for Toronto in a multi-criteria approach. The process is modelled using three stages, respectively POI filtering, hotel selection, and route generation.

3.1 POI Filtering

Initially, POIs are filtered based on user interests. Each POI is categorized to reflect its main activity. We select POIs that align with the user’s preferences, ensuring that our recommendations are both diverse and tailored to the user’s preferences. The selected POIs are represented by pᵢ in a set 𝒫, where1 ≤ i ≤ n.

3.2 Hotel Selection

In this stage, we select a set of affordable hotels that are close to the selected POIs. To formulate this problem, we have a set 𝓅 of selected POIs, each element is represented by pᵢ with 1 ≤ i ≤ n, and a set ℋ of all hotels, each element is represented by hⱼ with 1 ≤ j ≤ m. d(pᵢ, hⱼ) represents the Haversine distance between hotel hⱼ and POI pᵢ. Our objective is to select k points from ℋsuch that the sum of distances from hⱼ to 𝓅 is minimized. Here, the decision variable xⱼ determines whether hⱼ is among the top k. xⱼ= 1 if hⱼ is one of the top k points, and xⱼ = 0 otherwise.

3.3 Route Generation

Finally, we generate routes for the travellers to visit. The user specifies the total number of days for the trip, the season, and the daily hours allocated for sightseeing. Our objective is to devise a number of distinct sub-routes that adhere to these constraints, enhancing the overall user experience. We model the destination city as an undirected graph G= (𝒫, 𝒜), where 𝒫= {p₀, p₁, …, p_n} represents POIs. The target hotel hⱼ is set up as p₀, serving as the start and end point for each sub-route. 𝒜= { d(pᵢ, pⱼ) | i,j ∈ P, i ≠ j } represents distances between POIs. We model it as a Contextual Bandits problem with a reward function. The context is defined as C = {c₁, c₂ …, c_m}, with cᵢ being a feature vector denoting the hotel location and visiting season. Actions A = {a₁, a₂ …, a_n} represent combinations of sub-routes, with aᵢ comprising a sequence of POIs. The reward function, also known as the objective function, is designed to maximize the overall user experience considering POI ratings, distances, review counts, and seasonality. The mathematical formulation is shown as follows:

where

  • xᵢ: Binary variable, 1 if POI pᵢ is included in the route, 0 otherwise.
  • yᵢⱼ: Binary variable, 1 if the route includes travel from pᵢ to pⱼ, 0 otherwise.
  • r_pᵢ, n_pᵢ: Ratings and number of reviews, respectively, for pᵢ.
  • d(pᵢ, pⱼ): Haversine distance between POIs pᵢ and pⱼ.
  • w_r, w_d, w_n, w_s: Weights for ratings, distances, review counts, and seasonality factors, respectively.
  • SA_pi: Seasonal adjustment for pᵢ.

The seasonal adjustment function is based on the compatibility of the visiting season with the type of each POI.

Where

  • Season_visit: The current visit season (e.g., spring, summer, autumn, winter).
  • Type(pᵢ): Whether the pᵢ is an indoor or outdoor attraction.

In developing our model, we considered user needs and insights from existing research. Herzog et al. [16] indicated that travellers don’t want to visit art places during the evening, since the preference score is as low as 0.42. Similarly, they found that people prefer not to attend musical events in the morning, with a preference score of 0.1.

Additionally, we observed that museums do not operate during nighttime, whereas performing arts places typically do not open in the morning. Thus, we set a constraint that museums are not to be visited during nighttime, and performing arts venues are not to be visited in the morning.

Furthermore, Herzog’s work indicates that consecutive visits to outdoor places receive a preference score of 0.76, which is rather low compared to other pairs (usually around 1.45). Accordingly, we incorporated a constraint to prevent the occurrence of consecutive outdoor places. The finalized constraints, derived from these considerations, are detailed below.

Each sub-route must start and end at the hotel p₀:

Each POI, except the hotel p₀, is visited at most once:

Operating hours constraints:

Avoid consecutive outdoor POIs:

Outdoor(pᵢ) is a binary variable indicating whether pᵢ is an outdoor location.

4 Proposed Solution

Our solution is composed of three stages. As introduced in the previous section, they are respectively filtering POIs based on user preferences, selecting a set of potential hotels, and generating routes for all hotels selected. We recommend the route with the highest reward value to the users. In this section, we detail the process of choosing hotels, outline a methodology for approximating user satisfaction with a given route, and discuss the implementation of exploration algorithms in contextual multi-armed bandits. This approach integrates both spatial elements of the itinerary and dynamic aspects of user preferences and satisfaction. A detailed view of our methodologies is available in the Jupyter Notebook.

4.1 Hotel Selection

The hotel selection process is based on the following procedure. First, we filter the hotel class based on the user’s budget. Next, we cluster the POIs that the users are potentially interested in using k-means. Lastly, we find the cluster with the highest density and choose the top three hotels that are close to the centers of the highest-density cluster.

4.2 Reward Estimation

To estimate a numerical reward value for a given route, we incorporate the factors of haversine distance of the route, season of travel, review ratings, and review counts. Review counts are included to evaluate the credibility of review ratings. In addition, we use the average of review ratings in all POIs visited to mitigate the drawbacks of visiting POIs for a longer duration. The values of each factor are normalized to a value between 0 and 1, and the weighted sum of the normalized values is used as the reward. The Users have the flexibility to set these weights according to their preferences, allowing for a customizable approach.

4.3 Contextual Bandit algorithms

Our project implemented contextual bandits using Vowpal Wabbit [20]. During each round of training, the algorithm must decide between evaluating the reward of methods that have historically yielded good results, and exploring the reward of new methods that it has little information about [21]. This section discusses the three algorithms we used to balance the exploitation-exploration trade-off.

4.3.1 Explore First

This strategy takes as input a parameter tau that specifies the duration of the exploration phase. In the first tau trials, it tries out different actions with equal probability. This phase allows the algorithm to gather information about the environment and the effectiveness of various actions without any bias. Subsequently, the algorithm transits to the exploiting strategy, only selecting the action that appeared to be the best result during the exploration phase.

4.3.2 Epsilon Greedy

This strategy chooses an action at random with probability ϵ, and chooses the action with the highest estimated value with probability 1-ϵ. A higher value of ϵ indicates that the algorithm will engage in more exploration, which is useful in scenarios with significant uncertainty in the rewards of actions. Conversely, a lower ϵ value leads to more exploitation, which is used when the understanding of the reward values associated with different actions is more precise.

4.3.3 Softmax Explorer

The softmax explorer chooses the action by predicting a score that estimates the potential award. The prediction is based on a machine learning algorithm and previous information. Afterwards, it generates a distribution corresponding to the softmax function exp(lambda * score(x,a)) over all possible actions [22]. When the parameter lambda is set to 0, a uniform exploration is generated, and when lambda approaches infinity, the algorithm chooses the action with the highest predicted reward. This algorithm is suitable when the number of possible actions is large, or in dynamic environments where the conditions change over time.

5 Performance Evaluation

This section presents the datasets we used, the experiment setup, and quantitative and qualitative analysis of the testing results.

5.1 Dataset

We obtained the POI dataset from the City of Toronto’s website [23], which includes the names and addresses of 173 POIs in 13 categories, such as museums, sports, and landmarks. Afterwards, we developed a web scraping script to acquire user review data from TripAdvisor [24]. The script extracted data on POI ratings, review counts, and recommended visit durations. Next we run the script again to collect data on Toronto hotels, encompassing their names, addresses, and class. In the data preprocessing phase, the addresses of POIs and hotels were converted into coordinates using the geopy library. The figures below illustrate the distribution of hotels and POIs in our dataset.

Figure 1: Hotel density distribution
Figure 2: Geographic distribution of POIs in four selected categories

5.2 Experiment Setup

We compare the strategies in contextual multi-armed bandits with the baseline model using parameters described below.

Explore First will be tested with parameter tau corresponding to proportions of 1/10 and 1/3 of the solution set size, to ascertain how different levels of initial exploration impact the performance.

Epsilon Greedy is selected for its tunability. By setting epsilon values to 0.1, 0.3 and 0.7, we aim to observe the effects of varying degrees of exploration on the model’s ability to maximize rewards over iterations.

Softmax Explorer is used with a lambda value of 10. We explore how this approach fares using a probabilistic exploration based on estimated values.

We varied the numbers of tours by setting it to 1 and 2 to observe how the system behaves with different sizes of training data. By increasing the number of tours, we effectively expanded the dataset, providing insights into how the model performs with larger volumes of information. For consistency across trials and to reflect a realistic preference range for travellers, we set the hotel class parameter to 3.5. This choice aligns with the preferences of travelers to Toronto, as indicated by the average accommodation spending reported in a study by budgetyourtrip [25].

To evaluate the effectiveness of our CMAB model, we also conducted a benchmarking exercise against the Adaptive Genetic Algorithm (AGA) [26], a well-established method in route optimization and pathfinding. This algorithm is configured with a consistent set of constraints for routing, such as the number of days allocated for the trip. The start and end location of this algorithm is set as the coordinates of the selected hotel. For a direct and fair comparison, all algorithms are run on the same dataset and the same machine, using 2.20 GHz Intel Xeon CPU [27] and 51 GB RAM on Google Colab. The key aspects of the comparison include assessing the reward value and the run time for each iteration of route generation. The testing result is shown in Table 1. The tour number means the number of days for travel, and the best models for the two setups are bolded.

Table 1: Test results

5.3 Quantitative Analysis

The explore-first approach, with a parameter corresponding to one-tenth of the dataset size, delivered the highest rewards for single-day tours. This indicates that an initial exploration phase can be highly effective in simple scenarios. Increasing the exploration size resulted in a decrease in reward, suggesting that the algorithm spent too long in exploring suboptimal solutions. Another reason could be that the valuable information obtained during the exploitation phase got diluted afterwards.

For more complex situations like two-day tours, the softmax-explorer with a lambda value of 10 emerged as the most effective. It suggests that compared to the simpler algorithms, the probabilistic method in the softmax-explorer is advantageous in tasks requiring nuanced decision making over a large set of actions. In the epsilon-greedy approach, the epsilon value of 0.1 yielded the best outcome in both experimental setups. This epsilon configuration produced a reward pattern that closely paralleled the softmax-explorer in the two-tour setup, indicating that an epsilon value of 0.1 effectively balances between exploration and exploitation within this specific context.

AGA has the advantage of a more consistent runtime compared to CMAB models. The number of available actions grows exponentially as the tour number increases, whereas the run time of AGA only doubled.

5.4 Qualitative Analysis

CMAB has demonstrated considerable advantage over AGA in route quality for all tested scenarios. The superior performance is due to AGA’s focus on selecting POIs based on their ratings, neglecting other factors such as the total distance of the route. This resulted in less efficient tour paths characterized by spikes in the route. As depicted in Figure 3, the path generated by AGA, although containing POIs with the highest ratings, requires the users to travel a long distance to the east to visit Rosetta McClain Gardens, and return back to downtown. In contrast, the route produced by CMAB shown in Figure 4 prioritizes visiting POIs that are in closer proximity to each other and in a streamlined order, thereby decreasing the travel time and allowing for an extended visit duration at the POIs.

Figure 3: Tours generated by AGA
Figure 4: Tours generated by CMAB

6 Conclusions and Recommendations

Our exploration into TIP using the Contextual Multi-armed Bandit framework has yielded a promising approach in generating personalizing travel itineraries. By integrating a multi-criteria perspective that encompasses not just attractions but also accommodations and travel routes, we’ve developed a model that aligns closely with a traveller’s individual preferences.

The implementation of our model posed a complex challenge, especially in designing a reward function that skillfully blends factors such as POI ratings, travel distances, review counts, and seasonal considerations. Despite these complexities, our model achieved a commendable performance, surpassing the adaptive genetic algorithm in both optimality and computational efficiency, as evidenced by our experimental results.

Looking ahead, several potential enhancements to our model could further improve its utility. Incorporating user feedback into the personalization algorithm is likely to substantially elevate the quality of our recommendations. Additionally, the integration of real-time data streams, such as traffic updates and weather conditions, could enable dynamic itinerary adjustments, thereby increasing the model’s responsiveness to unforeseen changes.

To summarize, our project demonstrates the effective application of the Contextual Multi-Armed Bandit model in TIP. The results we have achieved not only proved the model’s proficiency in enhancing the travel planning process but also showed its potential to reshape personalized tourism experiences. The success of our approach in Toronto’s context sets a precedent for broader applications, suggesting that the CMAB model holds substantial promise for smart travel solutions. As we conclude, our project underscores the effectiveness of Contextual Multi-armed Bandits in TIP, laying a foundation for its future use in travel and tourism optimization.

7 Colab Notebook

References

[1] “Topic: Tourism worldwide,” Statista, Dec. 12, 2023. https://www.statista.com/topics/962/global-tourism/#topicOverview

[2] TripIt — Highest-rated trip planner and flight tracker. https://www.tripit.com/web

[3] Travaa International Limited, “Travel Itinerary Planner | TraVaA.” https://travaa.com/

[4] Travel with Expedia.ca: Vacation Homes, Hotels, Car Rentals, Flights & More. Expedia.ca. https://www.expedia.ca/

[5] J. M. Hamilton and M. A. Lau, “The Role of Climate Information in Tourist Destination Choice Decision Making,” Routledge eBooks, pp. 243–264, doi: 10.4324/9780203011911–26.

[6] G. Laporte, “The vehicle routing problem: An overview of exact and approximate algorithms,” European Journal of Operational Research, vol. 59, no. 3, pp. 345–358, Jun. 1992, doi: 10.1016/0377–2217(92)90192-c.

[7] J. Du, L. Li, and X. Li, “Travel planning problem considering site selection and itinerary making,” in Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, Oct. 2018, doi: 10.1145/3264746.3264781.

[8] J. Jia, Y. Chen, Y. Liu, and A. Khamis, “Trip Itinerary Planning: A Bio-inspired Metaheuristic Approach,” in Proceedings of the 2022 IEEE International Conference on Smart Mobility (SM), Mar. 2022, doi: 10.1109/sm55505.2022.9758204.

[9] N. Christofides, A. Mingozzi, and P. Toth, “Exact algorithms for the vehicle routing problem, based on spanning tree and shortest path relaxations,” Mathematical Programming, vol. 20, no. 1, pp. 255–282, Dec. 1981, doi: 10.1007/bf01589353.

[10] I. Ghobber, T. Tlili, and S. Krichen, “Partition Crossover Evolutionary Algorithm for the Team Orienteering Problem with Time Windows,” in Lecture Notes in Computer Science, 2018, pp. 200–211. doi: 10.1007/978–3–319–92058–0_19.

[11] S. Rani, K. N. Kholidah, and S. N. Huda, “A Development of Travel Itinerary Planning Application using Traveling Salesman Problem and K-Means Clustering Approach,” Proceedings of the 2018 7th International Conference on Software and Computer Applications, Feb. 2018, doi: 10.1145/3185089.3185142.

[12] P. Yochum, L. Chang, T. Gu, M. Zhu, and H. Chen, “A Genetic Algorithm for Travel Itinerary Recommendation with Mandatory Points-of-Interest,” in IFIP advances in information and communication technology, 2020, pp. 133–145. doi: 10.1007/978–3–030–46931–3_13.

[13] P. Bolzoni, S. Helmer, K. Wellenzohn, J. Gamper, and P. Andritsos, “Efficient itinerary planning with category constraints,” Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Nov. 2014, doi: 10.1145/2666310.2666411.

[14] Z. Neysani, M. T. Azad, and N. N. Samany, “Design and Implementation of a User-Centered Tourism Guide System based on Weather Conditions Preferences,” 2nd International Conference on Education, Management and Information Technology (ICEMIT 2015), Oct. 2015, [Online]. Available: https://www.researchgate.net/publication/337032512_Design_and_Implementation_of_a_User-Centered_Tourism_Guide_System_based_on_Weather_Conditions_Preferences

[15] E. Erbil and W. Wörndl, “Personalization of multi-day round trip itineraries according to travelers’ preferences,” in Springer eBooks, 2022, pp. 187–199. doi: 10.1007/978–3–030–94751–4_17.

[16] D. A. Herzog, “A User-Centered approach to solving the tourist trip design problem for individuals and groups,” Ph.D. dissertation, Faculty of Computer Science, Technical University of Munich, 2020. [online]. Available: https://mediatum.ub.tum.de/doc/1540474/ip4qrko0np5s5cldahujhuntt.herzog-daniel-andreas.pdf

[17] L. Zhou, “A survey on Contextual multi-armed bandits,” arXiv (Cornell University), Aug. 2015, [Online]. Available: https://arxiv.org/pdf/1508.03326

[18] T. Lu, D. Pál, and M. Pál, “Contextual Multi-Armed bandits,” International Conference on Artificial Intelligence and Statistics, pp. 485–492, Mar. 2010, [Online]. Available: http://proceedings.mlr.press/v9/lu10a/lu10a.pdf

[19] G. Yoon and J. Y. J. Chow, “Contextual Bandit-Based Sequential Transit Route Design under Demand Uncertainty,” Transportation Research Record, vol. 2674, no. 5, pp. 613–625, May 2020, doi: 10.1177/0361198120917388.

[20] “Contextual Bandits — VowpalWabbit latest documentation.” https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/python_Contextual_bandits_and_Vowpal_Wabbit.html

[21] VowpalWabbit, “Contextual Bandit algorithms,” GitHub. https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Contextual-Bandit-algorithms

[22] A. A. Zadjali, “Using Contextual Bandits to Improve Traffic Performance in Edge Network,” Ph.D. dissertation, Department of Computer Science and Engineering, University of Nebraska-Lincoln, 2021. [online]. Available: https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1238&context=computerscidiss

[23] “Open Data,” City of Toronto Open Data Portal. https://open.toronto.ca/

[24] Tripadvisor: Over a billion reviews & contributions for Hotels, Attractions, Restaurants, and more. https://www.tripadvisor.ca/

[25] “Toronto Travel Cost — Average Price of a Vacation to Toronto: Food & Meal Budget, Daily & Weekly Expenses | BudgetYourTrip.com,” Budget Your Trip, [Online]. Available: https://www.budgetyourtrip.com/canada/toronto

[26] “Trip Itinerary Planning — AI search algorithms for smart mobility.” https://smartmobilityalgorithms.github.io/book/content/PeopleMobilityProblems/trip_itinerary_planning/TIP.html

[27] T. Monteiro, “What is Google Colab?,” gHacks Technology News, Mar. 10, 2023. https://www.ghacks.net/2023/03/10/what-is-google-colab/

--

--