From Traditional Recommender Systems to Generative AI: Redefining Personalized Recommendations
Introduction
Machine learning in marketing enhances product and service discovery for consumers while playing a crucial role in the digital strategies of e-commerce platforms. When users shop online, recommendation systems guide them towards the products they are most likely to purchase. By tailoring suggestions to users’ preferences and purchase history, recommendation systems function as skilled salespeople, improving customer satisfaction and boosting profits. Recommender systems are applied in numerous areas, including:
- E-commerce: suggesting products based on past purchases or browsing history.
- Streaming services: recommending movies, shows, or music.
- Social media: promoting relevant content or connections.
The Rise of Generative AI in Data Marketing
In recent years, generative AI has transformed many industries, and data marketing is no exception. Gen AI, particularly large language models (LLMs), offers significant opportunities for brands to understand and connect with their customers in a more personalized and impactful way. By leveraging Gen AI’s ability to process and generate insights from vast amounts of data, marketers can optimize every stage of the customer journey — from initial engagement to loyalty-building strategies.
Unlocking New Possibilities
At the heart of this transformation lies the potential of Gen AI to enhance client knowledge. Businesses can not only gain deeper insights into their current customers but also anticipate future needs, preferences, and behaviors, thereby staying ahead of market trends.
What This Article Will Cover
This article first presents the added value of using Gen AI over traditional models in a recommender system context. Then, a use case example is provided to compare the performances of these two approaches.
Generative AI: Balancing Added Value with Risks and Limitations
Advantages of Generative AI in Recommender Systems
In our use case, Generative AI, and more precisely LLMs, offers added value through better contextual understanding, which enhances the quality of recommendations. Traditional recommender systems, such as collaborative filtering, often struggle to make connections between movies that share deeper characteristics beyond basic user-item interactions. By contrast, LLMs can interpret not only explicit information (e.g., ratings and movie titles) but also the surrounding context, including plot summaries, user reviews, director styles, or genre tags. This ability enables them to recommend movies with similar storytelling techniques or themes[1].
Collaborative filtering is particularly challenged by cold-start problems and sparse datasets, where user interactions are limited[2]. LLMs help mitigate these issues by leveraging vast knowledge from pretraining and external information, allowing for accurate recommendations even with minimal user data.
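One way to hand this contextual information to an LLM is simply to serialize it into the prompt. The sketch below is a hypothetical illustration of that idea; the function name, field names, and sample data are invented for this example and are not from the experiment described later.

```python
# Hypothetical sketch: enriching an LLM recommendation prompt with context
# beyond raw user-item interactions (titles, genres, ratings).
# All names and data here are illustrative assumptions.

def build_prompt(user_history, candidate_pool, n_recs=3):
    """Assemble a prompt asking an LLM for context-aware recommendations."""
    history_lines = "\n".join(
        f"- {m['title']} (genres: {', '.join(m['genres'])}, rating: {m['rating']}/10)"
        for m in user_history
    )
    pool_lines = "\n".join(f"- {title}" for title in candidate_pool)
    return (
        "You are a movie recommender. Based on the user's watch history,\n"
        f"suggest {n_recs} titles from the candidate list that match the\n"
        "user's themes and storytelling preferences.\n\n"
        f"Watch history:\n{history_lines}\n\n"
        f"Candidates:\n{pool_lines}"
    )

history = [
    {"title": "Spirited Away", "genres": ["Fantasy", "Adventure"], "rating": 9},
    {"title": "Your Name", "genres": ["Romance", "Drama"], "rating": 8},
]
prompt = build_prompt(history, ["Princess Mononoke", "Akira", "A Silent Voice"])
print(prompt)
```

In a real system the returned string would be sent to a chat-completion API; restricting the model to a candidate list also limits the hallucination risk discussed below.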
Despite these advantages, the use of Gen AI raises important considerations regarding potential risks and inherent limitations.
Challenges
- Ethical and Privacy Concerns: Recommender systems frequently handle sensitive customer data, such as purchase history, browsing behavior, and demographic information. This raises ethical concerns about data misuse and consumer privacy when leveraging Generative AI. The reliance on pre-trained LLMs often requires companies to use external APIs, which may expose sensitive data to third-party servers, heightening privacy risks.
- Hallucination Risks: LLMs are known to “hallucinate,” generating outputs that appear plausible but are factually incorrect or nonsensical. In the context of recommender systems, this may result in suggestions for nonexistent products or unavailable services. Such errors not only confuse users but may also harm a company’s reputation[3].
- Bias and Fairness Issues: Like most machine learning models, LLMs inherit biases from their training data. When used for recommendations, these biases can lead to discriminatory or unethical outputs — for example, disproportionately recommending certain products or services to specific demographic groups or reinforcing stereotypes. Mitigating these biases is particularly challenging since pre-trained models often lack transparency regarding how decisions are made[4].
Limitations
- Scalability and Cost: The cost of using LLMs as a recommender system can be significant, raising questions about scalability and long-term integration. For instance, GPT-4o costs $2.50 per million input tokens[5]. Depending on the prompt and the amount of client information provided to the model, a single recommendation might require 2,000 tokens. When scaled across a company’s entire user base, the cost can quickly outweigh the benefits of deploying an LLM.
- Generic Recommendations: LLMs trained on large, diverse datasets may overfit general trends, resulting in overly generic recommendations. For example, they may prioritize popular or widely available products instead of highlighting “niche” or high-margin items that hold greater strategic value for the business. This limitation can reduce their effectiveness in delivering highly personalized, context-specific recommendations.
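The cost concern above can be made concrete with a quick back-of-the-envelope estimate, using the figures cited in the text ($2.50 per million input tokens, roughly 2,000 tokens per recommendation). The user count and request frequency below are illustrative assumptions, and pricing may change over time.

```python
# Back-of-the-envelope cost estimate for LLM-based recommendations.
# Figures from the text: $2.50 per million input tokens (GPT-4o),
# ~2,000 tokens per recommendation. User counts are assumptions.

PRICE_PER_MILLION_INPUT_TOKENS = 2.50  # USD, cited gpt-4o input price
TOKENS_PER_RECOMMENDATION = 2_000      # rough prompt size, varies with context

def monthly_cost(num_users, recs_per_user_per_month=4):
    """Input-token cost of serving every user a few recommendations a month."""
    total_tokens = num_users * recs_per_user_per_month * TOKENS_PER_RECOMMENDATION
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

print(f"${monthly_cost(100_000):,.2f}")  # 100k users, 4 recommendations each
```

Even ignoring output tokens, a modest user base already generates a recurring four-figure monthly bill, which is the scalability trade-off the text describes.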
Use Case Example: Recommender system for Anime Movies
The aim of this section is to compare the outcomes achieved using a traditional recommender system (collaborative filtering) with those generated by Gen AI. The dataset comes from Kaggle and contains preference data from 73,516 users on 12,294 anime[6]. Each user can add an anime to their completed list and give it a rating; the dataset is a compilation of all this information.
In our use case, we applied a Neural Collaborative Filtering model as the traditional recommender system and GPT-4o for the Gen AI approach.
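As a rough illustration of the collaborative-filtering side, the sketch below implements plain matrix factorization in pure Python, a simplified stand-in for Neural Collaborative Filtering (the real NCF additionally feeds the user and item embeddings through a neural network). The toy ratings are invented for the example.

```python
# Simplified collaborative filtering via matrix factorization: learn user
# and item embeddings whose dot product approximates observed ratings.
# A stand-in sketch, not the NCF model used in the experiment.

import random

random.seed(0)

def train_mf(ratings, n_users, n_items, dim=4, lr=0.05, epochs=300):
    """Fit user matrix P and item matrix Q with plain SGD on squared error."""
    P = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][k] * Q[i][k] for k in range(dim))
            err = r - pred
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * err * qi
                Q[i][k] += lr * err * pu
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the dot product of the two embeddings."""
    return sum(P[u][k] * Q[i][k] for k in range(len(P[u])))

# Toy data: (user, item, rating on a 0-10 scale)
ratings = [(0, 0, 9), (0, 1, 8), (1, 0, 9), (1, 2, 2), (2, 1, 8), (2, 2, 3)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
# Unseen pair: user 0 resembles users 1 and 2, who both disliked item 2.
print(round(predict(P, Q, 0, 2), 1))
```

The model can only exploit these user-item co-ratings; it has no notion of plot, genre, or style, which is exactly the gap the LLM approach targets.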
Methodology to Evaluate the Recommender Systems’ Accuracy
To evaluate the accuracy of recommender systems, several assessment rules were established.
A “Good” recommendation is one that meets at least one of the following conditions:
- The recommended movie is already on the user’s “plan-to-watch” list
- The movie is in the top 10% of most popular or most planned-to-watch anime by users
- The movie has a similar genre to another movie the user has already watched.
On the other hand, a “Bad” recommendation occurs when the system suggests a movie that is in the user’s “dropped” list or a movie that does not satisfy any of the conditions mentioned above.
This framework ensures that recommendations are not only relevant but also aligned with the user’s preferences and viewing history.
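These rules are straightforward to express in code. The sketch below is illustrative only: the article does not publish its evaluation code, so the data structures, names, and the tie-break (a title on the “dropped” list is labeled bad even if it meets a condition) are assumptions.

```python
# Illustrative implementation of the "Good"/"Bad" rules above.
# Data structures and the dropped-list precedence are assumptions.

def classify_recommendation(rec, user, top_popular, genres_by_title):
    """Label a recommended title 'good' or 'bad' per the stated rules."""
    if rec in user["dropped"]:
        return "bad"   # dropped titles are always bad (assumed precedence)
    if rec in user["plan_to_watch"]:
        return "good"  # condition 1: already planned
    if rec in top_popular:
        return "good"  # condition 2: top 10% most popular / most planned
    watched_genres = {g for t in user["watched"] for g in genres_by_title.get(t, [])}
    if watched_genres & set(genres_by_title.get(rec, [])):
        return "good"  # condition 3: genre overlap with watched titles
    return "bad"       # satisfies no condition

user = {
    "plan_to_watch": {"Steins;Gate"},
    "dropped": {"School Days"},
    "watched": {"Cowboy Bebop"},
}
genres = {"Cowboy Bebop": ["Sci-Fi", "Action"], "Akira": ["Sci-Fi"]}
print(classify_recommendation("Steins;Gate", user, set(), genres))  # good
print(classify_recommendation("School Days", user, set(), genres))  # bad
print(classify_recommendation("Akira", user, set(), genres))        # good
```

Accuracy is then simply the share of recommendations labeled good across the sampled users.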
Results & Interpretation
The results represent the average outcomes across three randomly selected samples, each consisting of 100 users. On average, the LLM (GPT-4o) achieved an accuracy of 82.7%, while the collaborative filtering model reached 60.7%, a relative improvement of over 36%.
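The reported figure is a relative improvement, which can be checked directly:

```python
# Relative improvement of the LLM over collaborative filtering,
# using the accuracies reported above.
llm_acc, cf_acc = 82.7, 60.7
improvement = (llm_acc - cf_acc) / cf_acc * 100
print(f"{improvement:.1f}%")
```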
GPT-4o significantly outperformed the collaborative filtering approach. However, this result comes with important nuances that must be considered. The metric used to determine the quality of recommendations inherently favors the LLM, particularly due to the second evaluation condition. As previously mentioned, the diverse training datasets of GPT-4o often lead to more generic recommendations, which are more likely to fall into the “most popular” category. This tendency inflates the model’s performance under the given metric. When accuracy is evaluated solely on the first condition, where a recommendation is considered good only if it falls within the user’s “plan-to-watch” list, the LLM’s performance drops significantly, to an accuracy of 21%. While this is still an improvement over the collaborative filtering model, which averages around 10%, it highlights the critical role of metrics in evaluating the performance of LLMs in recommendation systems.
The performance of the LLM is heavily dependent on the prompt. The more clearly the prompt defines the task, the more likely the model is to generate a relevant response. Several iterations were necessary to improve the quality of the generated recommendations.
Conclusion
Both traditional recommender systems and generative AI have their strengths in enhancing user experiences and driving business outcomes. Traditional systems excel at delivering reliable, data-driven recommendations based on historical patterns or other users’ preferences. However, LLMs’ contextual understanding allows a deeper grasp of user intent and nuanced preferences, yielding better recommendations in this use case.
Generative AI presents exciting opportunities; however, it also comes with challenges, such as ethical considerations (including risks to consumer privacy when sensitive data is processed through external APIs, exposing it to third-party servers) and computational demands. Additionally, the cost of using LLMs, such as GPT-4o, raises concerns about scalability, with token-based pricing making large-scale deployments potentially expensive for businesses. These factors underscore the importance of carefully balancing the potential benefits of generative AI with its associated risks.
References
[1] A Reality Check of the Benefits of LLM in Business: https://arxiv.org/pdf/2406.10249
[2] The Cold Start Problem for Recommender Systems, Mark Milankovich, Medium
[3] Hallucination is Inevitable: https://arxiv.org/pdf/2401.11817
[4] Deconstructing the Ethics of Large Language Models: https://arxiv.org/abs/2406.05392
[5] GPT-4o pricing: https://openai.com/api/pricing/
[6] Anime Recommendations Database, Kaggle: https://www.kaggle.com/datasets/CooperUnion/anime-recommendations-database

