A foundation of a great internal search engine: Product data

Product data is the keystone of a smooth, efficient and customer-friendly search experience.

Benjamin BRUNELIERE
Norauto International
Sep 25, 2023

As time goes by, the range offered on e-commerce websites keeps growing. In most cases, visitors cannot browse every product and service on display, all the more so on marketplaces, which can list several million products. As a result, the internal search engine has become an increasingly important part of the customer’s online journey: its purpose is to show customers what they want to see, in response to a desire or a need. To do so, it rests on one foundation: product data.

Key factors

The internal search engine relies on high-quality data: consistent, relevant, accurate, complete, standardized and up to date.
Data that meets all these criteria accounts for most of the search engine’s reliability. Data that falls short of this solid foundation damages the search experience and can even drive customers away to competitors.

Internal search usually indexes the product label and the main product characteristics (associated category, type of product, flaps, brand, color and other main technical attributes…). The results displayed therefore depend directly on whether these attributes are accurate.
Take a search for “veste en jean” (denim jacket) on a French website. A good practice would be a label such as “Veste en jean Homme Bleu clair Levi’s T38” (denim jacket, men, light blue, Levi’s, size T38): it states the nature of the product, followed by the most requested properties (gender, color, brand, size), which helps customers get an appropriate answer and makes it easier to compare and choose between the displayed articles. Conversely, a bad practice would be a label such as “Homme Levi’s veste en jean”: less differentiating, lacking useful information and poorly ordered, such articles will be less attractive and, above all, harder to find through search.
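The good-practice label above can be generated rather than typed by hand. Here is a minimal sketch, assuming the attributes are stored as a simple dictionary; the attribute names and the `build_label` helper are hypothetical and only illustrate the idea of a fixed word order.

```python
# Hypothetical attribute order for a French-language catalog: product type
# first, then the most requested properties (gender, color, brand, size).
LABEL_ORDER = ["product_type", "gender", "color", "brand", "size"]


def build_label(attributes: dict[str, str]) -> str:
    """Join the known attributes in a fixed order, skipping missing ones."""
    parts = [
        attributes[key].strip()
        for key in LABEL_ORDER
        if attributes.get(key, "").strip()
    ]
    return " ".join(parts)


# The attributes can be stored in any order; the label order stays the same.
jacket = {
    "brand": "Levi's",
    "product_type": "Veste en jean",
    "size": "T38",
    "gender": "Homme",
    "color": "Bleu clair",
}
print(build_label(jacket))  # Veste en jean Homme Bleu clair Levi's T38
```

Generating labels from attributes keeps every label in the same order, so a missing attribute shortens the label without reshuffling it.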

Products are the raw material of the e-merchandiser. This is why product enrichment, which gathers information for each item in the range, is a pillar of e-merchandising management. It is essential for adapting a selection of products to visitors’ search and buying journeys.

Relevance and accuracy

The internal search engine needs clean and precise product data. The most important elements are the labels, which must be as close as possible to the customer’s language, and the product’s characteristics.
For instance, if a visitor wants an electric bike, it is counterproductive to name products “electric-assisted bikes” when the great majority of customers call them “electric bikes”. Likewise, for cold accumulators, customers tend to think of the term “ice pack”. Keeping the more technical terms will only steer them away from products relevant to their search, and consequently from a possible purchase.

High-quality product enrichment makes the range more accessible to the end customer, especially by making filtering easier and by surfacing relevant products through internal search.

In some cases, incorrect data may even jeopardize the use of a product.
Take the search for a motor oil. Each vehicle is compatible with one or several products. If the compatibility data returned in the search results is not entirely accurate, the customer could purchase an oil that is not compatible with their engine. This can cause a great deal of dissatisfaction and drive customers away from your brand, not to mention the risk of engine damage and of an accident on the road.

Another problematic factor when searching for a product is poor handling of semantics. To overcome this relevance issue, we help the search engine understand the customer’s query by creating synonyms, which link the natural language of customers to the technical language of brands.
Synonyms are therefore the first optimization for the relevance of results, e.g. laptop = portable computer / tv = television / washing machine = washer… A search for “tv” will then also retrieve the results for “television”; without this synonym, a significant part of the range would be missing from the query results.
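The effect of a synonym can be sketched in a few lines. This is a simplified illustration, not how any particular search engine implements it: the synonym groups reuse the examples above, while the catalog labels, `expand_query` and `search` are hypothetical, and matching is a naive substring test.

```python
# Hypothetical synonym groups; a real list would be built from query logs
# and from the catalog vocabulary.
SYNONYM_GROUPS = [
    {"tv", "television"},
    {"laptop", "portable computer"},
    {"washing machine", "washer"},
]


def expand_query(query: str) -> set[str]:
    """Return the query plus every synonym from the group it belongs to."""
    normalized = query.strip().lower()
    expanded = {normalized}
    for group in SYNONYM_GROUPS:
        if normalized in group:
            expanded |= group
    return expanded


def search(query: str, labels: list[str]) -> list[str]:
    """Naive substring match of any expanded term against product labels."""
    terms = expand_query(query)
    return [
        label
        for label in labels
        if any(term in label.lower() for term in terms)
    ]


catalog = ["Samsung 55-inch Television", "LG OLED TV 48-inch", "Bosch Washer 8 kg"]
print(search("tv", catalog))
# ['Samsung 55-inch Television', 'LG OLED TV 48-inch']
```

Without the "tv"/"television" group, the same query would only return the second label, which is the loss of range described above.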

Standardization

For search engine relevance, product data must be made accurate through the standardization of attributes and their values.
Imagine browsing search results for televisions and trying to filter them by screen size when the values come in different formats such as centimetres, metres, inches… not easy!
Or suppose you are looking for a city bike: if some but not all of these bikes contain the word “city”, you will probably not get the whole range in the results. This lack of consistency reduces the visibility of part of the city bikes.
To retrieve and compare the full range of products, attribute values need a common basis.
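For the screen-size case, a common basis can be obtained by converting every raw value to a single unit before indexing. The sketch below assumes raw values arrive as free text and picks centimetres as the target unit; both are assumptions for illustration, and `screen_size_cm` is a hypothetical helper.

```python
import re

# Conversion factors to centimetres; the target unit is an arbitrary choice.
TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54, "inch": 2.54, "inches": 2.54, '"': 2.54}

# A number (dot or comma as decimal separator) followed by a known unit.
SIZE_PATTERN = re.compile(
    r'^\s*(\d+(?:[.,]\d+)?)\s*(cm|m|inches|inch|in|")\s*$', re.IGNORECASE
)


def screen_size_cm(raw: str) -> float:
    """Parse a raw size such as '55 in', '1,4 m' or '139.7cm' into centimetres."""
    match = SIZE_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"Unrecognized screen size: {raw!r}")
    value = float(match.group(1).replace(",", "."))
    unit = match.group(2).lower()
    return round(value * TO_CM[unit], 1)


raw_values = ["55 in", '55"', "1,4 m", "139.7cm"]
print([screen_size_cm(value) for value in raw_values])
```

Once every value is expressed in the same unit, a single size filter can cover the whole range. Values the pattern does not recognize raise an error instead of being silently indexed.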

This is even more significant for product labels: they must be clear and standardized, as the position of a word can be a decisive factor, especially for product ranking.
Indeed, in Latin languages, the beginning of a query generally carries the most important part, the type of product desired, before more details are added. Conversely, in Germanic languages such as German or Dutch, the first words are generally not the most generic ones but rather the details.
To illustrate, take the same example in both languages. In French, a towing bike carrier is “porte-vélos d’attelage”, whereas in German it is “Fahrradheckträger”. If we break the queries down, French gives the nature of the product (“porte-vélos”) followed by the details (“d’attelage”), while German puts the details (“heck”) in the middle of the nature of the product (“Fahrrad…träger”).
In other words, misplaced words within the label weaken the match between queries and products, potentially including best sellers. Product ranking will suffer and, given the importance of the first positions on the listing page, the impact on business will be substantial.

Labels should be designed around the customer’s language, not the seller’s. This can limit the need for synonyms and, above all, improves the match between customers’ searches and the range of products and services.

Completeness

Customers expect reliable and complete data, which has a major influence on the entire purchasing process, starting with the search.

In addition to quality, quantity matters: each product or service needs a sufficient number of attributes.
Consider a house search. On the one hand, you are selling your property, a beautiful detached house with 4 bedrooms. On the other hand, several people are looking for a 4-bedroom detached house in your area. The two should match, unless your for-sale ad does not specify that the house is detached and/or has 4 bedrooms. This is why it is so important to include at least all the major characteristics; otherwise product visibility drops, and the conversion opportunity with it.
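The house example can be turned into a small completeness check. This is a sketch under assumptions: the list of required attributes per category, the attribute names and both helper functions are hypothetical, and real filtering logic is usually richer than strict equality.

```python
# Hypothetical required attributes per category; the real list depends on
# the catalog and on the filters offered to customers.
REQUIRED_ATTRIBUTES = {
    "house": ["property_type", "bedrooms"],
    "laptop": ["screen_size", "processor", "storage"],
}


def missing_attributes(product: dict) -> list[str]:
    """List the required attributes that are absent or empty for this product."""
    required = REQUIRED_ATTRIBUTES.get(product.get("category"), [])
    return [name for name in required if product.get(name) in (None, "")]


def matches_filters(product: dict, filters: dict) -> bool:
    """A product matches only if every filtered attribute is present and equal."""
    return all(product.get(name) == value for name, value in filters.items())


# The seller forgot to fill in the number of bedrooms.
house_ad = {"category": "house", "property_type": "detached"}
buyer_filters = {"property_type": "detached", "bedrooms": 4}

print(missing_attributes(house_ad))              # ['bedrooms']
print(matches_filters(house_ad, buyer_filters))  # False

house_ad["bedrooms"] = 4
print(matches_filters(house_ad, buyer_filters))  # True
```

Running such a check before publication flags the incomplete ad early, instead of letting it disappear from filtered results.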

In addition, customers expect to find the main selection criteria, but it is always worth displaying additional characteristics that may be useful and that help them evaluate and differentiate products on listing pages and product pages.
For instance, when looking for a laptop, we expect at least the screen size, the processor model, the storage capacity… It is also worth displaying the weight, the connectors, the claimed battery life, the supported Wi-Fi and Bluetooth standards, etc.

Conclusion

To sum up, product data of high quality and in sufficient quantity allows the internal search engine to return a comprehensive, relevant and accurate answer to queries. It also helps visitors compare and choose products by giving them information that is as complete and consistent as possible, just as a sales assistant in a store would.
Conversely, beware of falling below a minimum quantity and quality of product data, as this makes the search engine less relevant and therefore less useful.
Ultimately, product data is a major key to the success of the internal search engine, in terms of both experience and performance.
Once you have this solid foundation, you will be able to start developing personalization, AI and more.

Reviewed by: Laure Hoffmann, Annalisa Sabatino, Johann Didelot
