A Guide to Developing Similar Product Recommendations for Cosmetics

A prototype similar product recommendation engine for cosmetics developed in Python.

Published in

The Beta Labs Blog

7 min read · Jan 12, 2024

Similar product recommendations are essential in the cosmetics industry, aiding customers in discovering new products and enhancing their shopping experience. By suggesting relevant alternatives based on product attributes and customer preferences, these recommendations promote exploration and increase customer satisfaction.

This article is a step-by-step guide to creating a simple similar product engine for cosmetics products. You can find all the code in this repository.

Similar Product for Cosmetics

To start with, let us define “similar” within the cosmetics domain. A similar product is one that shares common characteristics or attributes with another product, making the two comparable or interchangeable to some extent. In the context of cosmetics, similarity can be established based on factors such as product category (e.g., lipsticks, face serums, shampoos), intended purpose (e.g., makeup or moisturising), ingredients (e.g., AHA, Vitamin E, Retinol), and effects (e.g., wrinkle reduction, hair enhancement).

The objective of identifying similar products is to provide customers with recommendations that align with their interests and preferences. Prior knowledge of customer preferences helps in determining the business requirements for developing effective similar product recommendations.

For the sake of simplicity, we will make the following assumptions:

Similar products are within the same class or category. For example, when comparing lipsticks, the comparison is made among all available lipsticks.
Similar products within a certain class share similar ingredients. For instance, perfumes within the same olfactory family could be considered similar.
All recommended products must be currently in stock and available for purchase.

In certain business contexts, specific requirements may exist regarding product similarity. For instance:

Recommend products that belong to the same brand or to similar brand groups.
Keep recommendations at similar price points, to discourage customers from simply opting for the cheapest option.
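
Rules like these can be written down as metadata filters before any search happens. Below is a minimal sketch, assuming the Chroma `where` filter syntax used in the query section of this article. The `build_where` helper, the `price` metadata field and the 20% tolerance are hypothetical, since the example schema has no price column.

```python
def build_where(query_class, query_price=None, price_tolerance=0.2, brand=None):
    """Build a Chroma-style `where` filter from simple business rules.

    `price` and `price_tolerance` are illustrative assumptions only.
    """
    conditions = [
        {"class": {"$eq": query_class}},  # same class as the query product
        {"stock_level": {"$gt": 0}},      # currently in stock
    ]
    if brand is not None:
        # optional rule: stay within the same brand
        conditions.append({"brand": {"$eq": brand}})
    if query_price is not None:
        # optional rule: keep candidates within a price band around the query product
        low = query_price * (1 - price_tolerance)
        high = query_price * (1 + price_tolerance)
        conditions.append({"price": {"$gte": low}})
        conditions.append({"price": {"$lte": high}})
    return {"$and": conditions}


where = build_where("Lip Care", query_price=200.0, brand="FRESH")
print(where)
```

Keeping the rules in one function makes it easy to switch individual requirements on or off per business context.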

Methodology

The outline of the methodology:

Get Product Data: Including features such as categories, product descriptions and ingredients.
Embedding Function and Similarity Function Definition: Define suitable embedding and similarity functions to effectively capture the similarity between products based on their features.
Vectorstore: The product details and corresponding embeddings are incorporated into a vectorstore, facilitating efficient similarity computations.
Business Filtering: The implementation of business-specific filtering mechanisms to ensure that the recommended similar products comply with specific business requirements.
Get Recommendation: Retrieve the top N most similar products using a product ID.

For the purpose of demonstration, dummy data is used to showcase the functionality and effectiveness of the similar product engine.

Code Implementation Walkthrough

Load data

Let’s load the product data. Below is the schema for reference:

root
 |-- product_id: string 
 |-- product_name: string
 |-- brand: string
 |-- category: string
 |-- class: string
 |-- subclass: string
 |-- product_description: string
 |-- stock_level: int

Feature Engineering

If your data already contains features that accurately describe the products for the purpose of determining the similarity between them, you can skip this part.

In the context of cosmetics, important features include ingredients, functions and scents. Regular expressions can be used to extract these features from the product description.

In this example, I have extracted fragrance notes for perfume products and stored them in an array. Here is the updated schema after extracting features:

root
 |-- product_id: string
 |-- product_name: string
 |-- brand: string
 |-- category: string
 |-- class: string
 |-- subclass: string
 |-- product_description: string
 |-- stock_level: int
 |-- features: array
 |    |-- element: string
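
As an illustration of the regular expression approach, here is a minimal sketch that pulls fragrance notes out of a description. The `KNOWN_NOTES` list, the `extract_notes` helper and the sample description are made up for this example; in practice the list of notes would come from domain knowledge or a reference table.

```python
import re

# Hypothetical vocabulary of fragrance notes to look for
KNOWN_NOTES = ["Neroli", "Jasmine Sambac", "Magnolia", "Bergamot", "Sandalwood"]

# Longest names first, so multi-word notes are preferred over shorter overlaps
NOTE_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(KNOWN_NOTES, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def extract_notes(description):
    """Return the known notes found in a description, in order of appearance, without duplicates."""
    canonical = {n.lower(): n for n in KNOWN_NOTES}
    found = []
    for match in NOTE_PATTERN.finditer(description):
        note = canonical[match.group(1).lower()]
        if note not in found:
            found.append(note)
    return found


description = "A floral scent opening with neroli and magnolia over a heart of Jasmine Sambac."
features = extract_notes(description)
print(features)
```

The same pattern could be applied to ingredients or product functions by swapping in a different vocabulary.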

Vectorstore

A vectorstore stores embedded data and performs vector search, offering efficient and rapid retrieval of nearest neighbours within an N-dimensional space. Vector databases have been around for quite some time, but they have become more popular recently, especially for similarity searches in Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs). To make things easier, I have chosen to use Chroma, a tool that is both user-friendly and free to use.

Create and configure a Chroma collection, choosing an embedding function and a distance function.

import chromadb
from chromadb.utils import embedding_functions

collection_name = "cosmetics_similar_product_db"
# choose an embedding function
embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(normalize_embeddings=True)

# choose a distance function
distance_function = "ip"

# create collection
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(
    name=collection_name,
    embedding_function=embedding_function,
    metadata={"hnsw:space": distance_function}
)
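
A note on the choice of "ip" (inner product): to my understanding, the HNSW index behind Chroma reports this distance as 1 minus the inner product, so for unit-length vectors it coincides with cosine distance. The small sketch below checks that relationship in plain Python; the helper names and the toy vectors are illustrative only.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ip_distance(a, b):
    """Inner product distance: 1 - sum(a_i * b_i)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance computed on the raw (unnormalised) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 1.0, 0.0]  # e.g. a product with features 1 and 2
b = [1.0, 0.0, 1.0]  # e.g. a product with features 1 and 3

print(ip_distance(normalize(a), normalize(b)))  # inner product distance after normalisation
print(cosine_distance(a, b))                    # cosine distance on the raw vectors
```

This is why the embeddings are normalised before being stored: an identical product then sits at a distance of approximately 0, and less similar products at larger distances.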

The next step is to embed all the features and then add them to the vectorstore for further processing.

Embeddings

In this example, the features are represented as lists of words. For instance, here are the olfactory notes of a perfume:

["Neroli", "Jasmine Sambac", "Magnolia"]

To embed bags of words, we can proceed with the following steps:

Create word frequency vectors using the features column (with MultiLabelBinarizer these are binary presence indicators, one dimension per distinct word).
Normalise the vectors.

import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# extract the complete vocabulary of unique words from the 'features' column of df
vocabulary = set(word for sublist in df["features"] for word in sublist)

# initialize MultiLabelBinarizer; sorting keeps the column order reproducible
mlb = MultiLabelBinarizer(classes=sorted(vocabulary))

# fit and transform the 'features' column into binary word-presence vectors
word_freq_vectors = mlb.fit_transform(df["features"])
words = mlb.classes_

# normalize each row to unit length (assumes every product has at least one feature)
normalized_word_freq_vectors = word_freq_vectors / np.linalg.norm(word_freq_vectors, axis=1, keepdims=True)
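
One edge case worth guarding against: a product with an empty feature list produces an all-zero row, and dividing by its zero norm yields NaN values. A minimal sketch of a safer normalisation follows; the `normalize_rows` helper and the toy matrix are my own illustration, not part of the original notebook.

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row to unit length, leaving all-zero rows unchanged."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # replace zero norms with 1 so that empty rows stay at zero instead of becoming NaN
    safe_norms = np.where(norms == 0, 1.0, norms)
    return matrix / safe_norms


vectors = np.array([
    [1, 1, 0],  # product with two features
    [0, 0, 0],  # product with no extracted features
    [0, 1, 0],  # product with one feature
])
normalized = normalize_rows(vectors)
print(normalized)
```

Products without any features carry no similarity signal, so it may be cleaner still to exclude them before adding embeddings to the vectorstore.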

Once the vectors have been generated and normalised, they can be added to the vectorstore. Make sure to add the necessary product information to the metadata if metadata filtering is needed.

# add embeddings to vectorstore
# (converted to plain lists, which Chroma accepts regardless of version)
collection.add(
    embeddings=normalized_word_freq_vectors.tolist(),
    metadatas=metadatas,
    ids=product_ids,
)
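
The `metadatas` and `product_ids` arguments are not defined in the snippet above. Here is one way they might be prepared, assuming the product rows are available as a list of dictionaries (for a pandas DataFrame, `df.to_dict("records")` produces this shape). The `to_chroma_inputs` helper and the choice of metadata fields are assumptions for illustration.

```python
# Two sample rows following the schema in this article
rows = [
    {"product_id": "AKW566", "product_name": "Sugar Peach Hydrating Lip Balm Limited Edition",
     "brand": "FRESH", "class": "Lip Care", "subclass": "Lip Balm", "stock_level": 33},
    {"product_id": "AKW569", "product_name": "Sugar Coconut Hydrating Lip Balm",
     "brand": "FRESH", "class": "Lip Care", "subclass": "Lip Balm", "stock_level": 60},
]

# Fields kept as metadata so that they can be used in filters and shown in results
METADATA_FIELDS = ["product_name", "brand", "class", "subclass", "stock_level"]


def to_chroma_inputs(rows):
    """Split product rows into the ids and metadatas lists expected by collection.add."""
    product_ids = [str(row["product_id"]) for row in rows]
    metadatas = []
    for row in rows:
        meta = {field: row[field] for field in METADATA_FIELDS}
        # store the stock level as a plain int so that "$gt" compares it numerically
        meta["stock_level"] = int(meta["stock_level"])
        metadatas.append(meta)
    return product_ids, metadatas


product_ids, metadatas = to_chroma_inputs(rows)
print(product_ids)
print(metadatas[0])
```

The ids, metadatas and embeddings must be in the same row order, since Chroma matches them by position.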

An alternative approach is to use the product description itself instead of going through the feature extraction process. If the descriptions alone capture similarity well enough, we can skip the effort required for feature extraction. The input product descriptions will be embedded using the embedding function of the vectorstore. Note that these embeddings have a different dimensionality from the word vectors above, so the two approaches should not be mixed within one collection.

# add documents to vectorstore
collection.add(
    documents=product_descriptions,
    metadatas=metadatas,
    ids=product_ids,
)

Metadata filters & Query

A great advantage of the vectorstore is its metadata filter feature, which is very useful for filtering on business requirements. Let’s set up the “same class” and “in stock” filtering criteria in the where argument of the query.

# query
# the filters are input in the `where` argument
results = collection.query(
    query_embeddings=query_embeddings,
    n_results=10, # get top 10 result
    # filters used: same class and stock level > 0
    where={
        "$and": [
            {
                "class": {
                    "$eq": query_class
                },
            },
            {
                "stock_level": {
                    "$gt": 0
                }
            },
        ]
    }  
)
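
The query above takes `query_embeddings` and `query_class` as given. Below is a minimal sketch of how they might be looked up from a product ID, assuming `collection` is a populated Chroma collection whose metadata includes `class` and `stock_level`. The `get_similar_products` function is my own illustration and not necessarily how the repository implements it.

```python
def get_similar_products(collection, product_id, n_results=10):
    """Return (product_id, distance) pairs for the products most similar to `product_id`."""
    # fetch the stored embedding and metadata of the query product
    record = collection.get(ids=[product_id], include=["embeddings", "metadatas"])
    query_embedding = [float(x) for x in record["embeddings"][0]]
    query_class = record["metadatas"][0]["class"]

    results = collection.query(
        query_embeddings=[query_embedding],
        # ask for one extra result, because the query product usually matches itself
        n_results=n_results + 1,
        where={
            "$and": [
                {"class": {"$eq": query_class}},
                {"stock_level": {"$gt": 0}},
            ]
        },
    )

    # results are nested per query; we sent one query, so take index 0
    pairs = zip(results["ids"][0], results["distances"][0])
    # drop the query product itself and keep at most n_results neighbours
    return [(pid, dist) for pid, dist in pairs if pid != product_id][:n_results]
```

With the collection built earlier, a call such as `get_similar_products(collection, "AKW566")` would return the neighbours of that product without the product itself.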

Sample Result

For product AKW566, the top 10 most similar items are listed below (row 0 is the query product itself, at distance 0):
    product_id  product_name                                       class     subclass       stock_level  brand            distance
0   AKW566      Sugar Peach Hydrating Lip Balm Limited Edition     Lip Care  Lip Balm       33           FRESH            0.000
1   AKW568      Sugar Chocolate Hydrating Lip Balm Limited Edi...  Lip Care  Lip Balm       89           FRESH            0.070
2   AKW565      Sugar Lemon Hydrating Lip Balm Limited Edition     Lip Care  Lip Balm       28           FRESH            0.076
3   AKW569      Sugar Coconut Hydrating Lip Balm                   Lip Care  Lip Balm       60           FRESH            0.089
4   AJB605      Sugar Lip Caramel Hydrating Balm                   Lip Care  Lip Balm       1            FRESH            0.164
5   AKW563      Sugar Dream Lip Treatment Advanced Therapy         Lip Care  Lip Treatment  99           FRESH            0.272
6   AJB603      Sugar Cream Lip Treatment – Baby                   Lip Care  Lip Treatment  42           FRESH            0.273
7   AJB609      Sugar Cream Lip Treatment – Gilt                   Lip Care  Lip Treatment  5            FRESH            0.276
8   AJB602      Sugar Cream Lip Treatment – Pearl                  Lip Care  Lip Treatment  13           FRESH            0.278
9   ALB647      Rose Petal Lip Balm 4.4g                           Lip Care  Lip Balm       47           SUBTLE ENERGIES  0.298
10  AKX969      Lip Exfoliator                                     Lip Care  Lip Treatment  27           TOM FORD         0.298

All the above functions are put into a class SimilarProductVectorDB. Check out the demo notebook for more details.

Use Cases

Here are some use cases where similar product recommendations can benefit the business.

Enhancing Customer Experience on e-commerce websites / apps: By incorporating similar product recommendations into the website or app, businesses can offer customers suggestions that align with their interests. This can potentially lead to increased engagement, longer browsing sessions, and ultimately, higher conversion rates.

Betty is browsing an online cosmetics website looking for a new foundation. She comes across a product that catches her eye, but she is unsure if it will provide the level of coverage she is looking for. The website suggests other foundations with similar coverage levels, providing Betty with alternative options to consider and helping her find a foundation that better aligns with her preferences.

In-store selling: By providing access to similar product recommendations, staff members can quickly identify suitable alternative products to suggest to customers. This helps to improve the customer experience and potentially increase sales.

Natalie enters a physical cosmetics store in search of a specific lipstick that she saw in a magazine. However, the sales associate informs her that the lipstick is currently out of stock. Using an in-store clienteling system, the sales associate looks up products similar to that particular lipstick and suggests alternatives with similar colours, finishes, and undertones that are currently available in the store. This enhances Natalie’s shopping experience and increases the likelihood of a successful sale.

Conclusion

In summary, this article presented a prototype similar product recommendation engine for cosmetics developed in Python. The engine has great potential for adding more features based on different business needs and across diverse domains.

It is worth noting that similar product recommendations can be achieved without needing customer data and can be implemented easily. This approach can serve as an initial step before progressing to more advanced and personalised recommendation systems.

Feel free to reach out on LinkedIn, GitHub, and Medium 🚀