From Pixels to Purchases: Exploring the Impact of Visual Search in E-Commerce

Yurii Laba
Published in Intelliarts AI
11 min read · Jun 30, 2023

Buyers often choose goods and products based on appearance, especially in the e-commerce industry. AI-based visual search and e-commerce came together relatively recently, yet companies have already started to put this area of AI technology to use.

The Intelliarts AI team of engineers took part in a visual product recognition challenge. We gained useful insights while building a high-performing ML model that suggests similar goods and products based on their images, and we share those insights throughout this article.

In this post, you’ll dig into some technical aspects of building visual search for e-commerce. If you are a C-level executive, you may also be interested in the concrete business advantages of investing in this technology, as well as in examples of strong market players and their experience with visual search.

Visual search in e-commerce: brief overview

Let’s start exploring how to use visual search in e-commerce. The first step is to get an insight into the technology and its current market.

From a technical perspective, visual search is a functionality enabled by a combination of AI-driven algorithms. The resulting solution can be composed of the following two technologies:

  • Object detection. It locates objects of interest within a digital image and identifies individual products based on visual attributes such as color, texture, pattern, and shape.
  • Visual search engine. This technology utilizes ML algorithms to match identified products with ones in a database and suggest related products based on detected visual similarities.
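
The matching step can be sketched as a nearest-neighbor lookup over feature vectors. The snippet below is a minimal illustration rather than a production engine. It assumes each product image has already been converted into a numeric vector by some model, and the catalog entries, the vector values, and the most_similar helper are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, catalog, top_k=3):
    """Return the top_k (product_id, score) pairs, best match first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional feature vectors; real models produce far more dimensions.
catalog = {
    "red_dress": [0.9, 0.1, 0.0],
    "red_skirt": [0.8, 0.3, 0.1],
    "blue_jeans": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # vector of the photo a shopper uploaded
results = most_similar(query, catalog, top_k=2)
print(results)
```

With these toy vectors, the two red items rank ahead of the jeans because their vectors point in nearly the same direction as the query.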

From an e-commerce business perspective, visual search technology is all about the user experience. It helps simplify the process of searching for a particular product, which increases the chances of a purchase. At the same time, visual search offers a form of entertainment as an extra benefit.

Visual search e-commerce market size

Now, let’s look at how visual search engines may affect e-commerce in the future. According to historical data and forecasts, the global visual search market is estimated to surpass $14 billion by the end of 2023 and to reach as much as $33 billion by 2028. For the forecast period of 2021–2028, the compound annual growth rate (CAGR) of the market is predicted to be 17.5%.
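
For reference, CAGR is the constant yearly growth rate that takes a start value to an end value over a given number of years. The sketch below applies that standard formula to the two market-size figures quoted above. The function name is ours. Because the cited 17.5% covers 2021–2028, it need not equal the rate implied by the 2023 and 2028 figures alone.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in billions of dollars; 2023 to 2028 is 5 years.
implied = cagr(14, 33, 5)
print(f"Implied CAGR 2023-2028: {implied:.1%}")
```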

This suggests that interest in this relatively new technology is high and that e-commerce businesses worldwide are likely to invest in visual search more and more. Within the next decade, most e-commerce businesses may adopt visual search as part of their platforms’ functionality.

The benefits of visual search in e-commerce

Adopting visual search technology brings a range of business benefits, including the following:

Improved search accuracy and efficiency

Search engines, including semantic ones, do a good job of helping customers find the products they are looking for. Yet, visual search technology takes the efficiency of the process to a whole new level. With visual search, a customer can take a picture of the item they desire, and the ML-driven engine will identify matching or similar products without the need for an exact name or description of the item. An easier search means a higher probability that the customer makes a purchase.

Reduced bounce rates

High bounce rates indicate that customers either have no interest in browsing the site or run into difficulties that make them leave for another one. Because visual search, as an additional feature of an e-commerce platform, can provide relevant, accurate, and personalized search results, users are more likely to find products they are interested in. As a result, customers stay more engaged in browsing the platform.

Enhanced online shopping experience

The online shopping experience is all about finding the necessary products on the online platform. Visual search engines enable customers to browse a wide range of products based on a picture of the item they desire. This saves time and effort, once again making a customer more likely to buy from you.

Check out the video below to see how visual search functionality may work:

Enhanced offline shopping experience

Fashion e-commerce and visual search are tightly connected even from the perspective of running an offline business, such as a clothing store. Customers can take pictures of items they see on the shop floor and then find them and similar products via the store’s online platform. It’s especially useful when the desired item is not available on-site in the needed size. Alternatively, visual search engines may help customers find a range of products they want to ask consultants about or try on if available on-site.

The creators of the challenge strongly believe that image-based search can be particularly handy when a customer sees the desired product in real life, in a movie, or in other media.

Product discovery

Since visual search technology not only finds exact matches but also suggests similar items, it plays a crucial role in product discovery. It can prompt customers to consider additional items from the same or similar categories as the one they were initially interested in. Helping customers discover items that they might not have found otherwise leads to additional sales for businesses.

Increased sales and revenue

Visual search in e-commerce can increase sales and revenue. Customers are more likely to find and purchase things they are interested in when they can locate desired products with ease and receive customized search suggestions at the same time. Furthermore, the accurate results that visual search provides help customers make more informed purchasing decisions, which leads to lower return rates.

Metrics for tracking visual search success in e-commerce

The effect of incorporating visual search for retail industry players can be measured and assessed.

Business metrics

Now, let’s proceed with reviewing common business metrics typically utilized to evaluate the impact of integrating new technology.

  • Conversion rate. This metric measures the percentage of visitors who took a desired action, such as making a purchase or filling out a form. The conversion rate is calculated as the number of conversions divided by the number of website visitors and multiplied by 100%.
  • Click-through rate (CTR). It’s the percentage of people who interacted with an ad and visited a website or landing page. CTR is calculated as the number of clicks divided by the number of ad impressions and multiplied by 100%.
  • Basket size. This refers to the average number of items in a customer’s shopping cart. This measure indicates how well the functionality of a platform performs in offering complementary products.
  • Bounce and exit rate. It’s the percentage of website visitors who left the website without taking any action. The bounce and exit rate is calculated as the number of visitors who left the website divided by the total number of visitors and multiplied by 100%.
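
The formulas above translate directly into code. The following is a minimal sketch with made-up traffic numbers. The function and parameter names are our own and do not come from any analytics tool.

```python
def percentage(part, whole):
    """part / whole as a percentage; 0.0 when the denominator is zero."""
    return 100.0 * part / whole if whole else 0.0


def conversion_rate(conversions, visitors):
    return percentage(conversions, visitors)


def click_through_rate(clicks, impressions):
    return percentage(clicks, impressions)


def bounce_rate(visitors_left_without_action, visitors):
    return percentage(visitors_left_without_action, visitors)


def average_basket_size(items_per_order):
    """Mean number of items across a list of orders."""
    return sum(items_per_order) / len(items_per_order) if items_per_order else 0.0


# Made-up numbers for one week of traffic.
print(conversion_rate(conversions=120, visitors=4000))                # 3.0
print(click_through_rate(clicks=250, impressions=10000))              # 2.5
print(bounce_rate(visitors_left_without_action=1800, visitors=4000))  # 45.0
print(average_basket_size([1, 3, 2, 2]))                              # 2.0
```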

Visual search technology, when integrated into a website as additional functionality, can affect the platform’s performance both directly and indirectly. This influence can be tracked by comparing the metric values from historical data with the values measured after the integration of visual search.
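
Such a before-and-after comparison can be expressed as a relative change per metric. The sketch below uses hypothetical metric values, and the dictionary keys are illustrative names only.

```python
def relative_change(before, after):
    """Percentage change from before to after (positive means an increase)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (after - before) / before


# Hypothetical metric values (in %) before and after launching visual search.
baseline = {"conversion_rate": 2.0, "bounce_rate": 50.0}
with_visual_search = {"conversion_rate": 2.5, "bounce_rate": 45.0}

changes = {
    name: relative_change(baseline[name], with_visual_search[name])
    for name in baseline
}
print(changes)  # {'conversion_rate': 25.0, 'bounce_rate': -10.0}
```

A raw comparison like this cannot separate the effect of visual search from seasonality or marketing campaigns, so a controlled A/B test is usually a more reliable setup when one is feasible.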

Technical metrics

Aside from business metrics, the success of a visual search engine in an e-commerce project can be assessed with technical ones. They help evaluate the performance of the individual models trained for visual search tasks. Examples include:

  • Accuracy. It’s the percentage of correct predictions. Accuracy is calculated as the number of true positive and true negative predictions divided by the total number of predictions and multiplied by 100%.
  • Precision. It estimates how reliable a prediction of a particular class is. Precision for class 1 is calculated as the number of true positive predictions divided by the number of all positive predictions (true positives plus false positives) and multiplied by 100%. Precision for class 2 is calculated in the same way but with negative values: true negatives divided by all negative predictions.
  • Recall. It estimates how well the model can find the items of a class. Recall for class 1 is calculated as the number of true positive predictions divided by the number of true positive and false negative predictions and multiplied by 100%. Recall for class 2 is calculated in the same way but with true negative and false positive values.
  • Mean average precision (mAP). This metric evaluates a model’s overall precision. mAP is calculated as the sum of the average precision values for each class divided by the number of classes.
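
As a quick illustration, the metrics above can be computed from four counts: true positives (tp), true negatives (tn), false positives (fp), and false negatives (fn). The counts below are made up, and the function names are ours.

```python
def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return 100.0 * (tp + tn) / total if total else 0.0


def precision(tp, fp):
    predicted_positive = tp + fp
    return 100.0 * tp / predicted_positive if predicted_positive else 0.0


def recall(tp, fn):
    actual_positive = tp + fn
    return 100.0 * tp / actual_positive if actual_positive else 0.0


def mean_average_precision(average_precisions):
    """Mean of the per-class average precision values."""
    if not average_precisions:
        return 0.0
    return sum(average_precisions) / len(average_precisions)


# Made-up counts for a binary "is this the same product?" classifier.
tp, tn, fp, fn = 40, 45, 10, 5
print(accuracy(tp, tn, fp, fn))  # 85.0
print(precision(tp, fp))         # class 1: 80.0
print(recall(tp, fn))            # class 1
# For class 2, the roles of positives and negatives are swapped.
print(precision(tn, fn))         # class 2: 90.0
print(recall(tn, fp))            # class 2
```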

In the visual search competition, the Intelliarts team of engineers used exactly this last metric, mAP, because our model was intended to find the 1000 most similar products based on a digital reference. Unlike the other three metrics, mAP takes the order of the returned results into account, which makes it well suited to evaluating such search results.
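
For ranked search results, average precision is typically computed per query and then averaged over all queries. The sketch below shows one common formulation of mAP@k. The exact formula used by the challenge is not given in this article, so the normalization by min(number of relevant items, k) is an assumption, as are the function names and the toy product IDs.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=1000):
    """AP@k for one query; assumes ranked_ids contains no duplicates."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item_id in enumerate(ranked_ids[:k], start=1):
        if item_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(all_ranked, all_relevant, k=1000):
    """mAP@k: AP@k averaged over all queries."""
    scores = [
        average_precision_at_k(ranked, relevant, k)
        for ranked, relevant in zip(all_ranked, all_relevant)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Two toy queries: ranked results and the set of truly matching products.
ranked_lists = [["a", "x", "b"], ["x", "c"]]
relevant_sets = [{"a", "b"}, {"c"}]
score = mean_average_precision_at_k(ranked_lists, relevant_sets)
print(round(score, 4))
```

The first query scores higher than the second because its correct products appear closer to the top of the list, which is exactly the behavior that accuracy, precision, and recall alone do not capture.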

The layout that shows what is meant by predicted and actual, as well as positive and negative values in the above formulas, is known as a confusion matrix. It’s shown in the image below.
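
The four cells of a confusion matrix can also be counted directly from lists of actual and predicted labels. This is a minimal sketch with made-up labels, where 1 stands for "match" and 0 for "no match". The confusion_counts helper is our own name.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, and TN with one label treated as the positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Made-up labels: 1 = "match", 0 = "no match".
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)

print("            predicted 1   predicted 0")
print(f"actual 1    TP = {counts['TP']}        FN = {counts['FN']}")
print(f"actual 0    FP = {counts['FP']}        TN = {counts['TN']}")
```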

Real examples of visual search for e-commerce

Well-established e-commerce businesses have already adopted visual search functionality and have been using it successfully for years. They include the following companies:

  • ASOS

This British online fashion and cosmetic retailer has been operating in the e-commerce industry since 2000. It offers a “Style Match” feature, which is the implementation of visual search technology on the ASOS platform.

Style Match lets a user take a picture of an object of interest with the phone’s camera or upload a picture from the photo library to start the search. The ML algorithms of the visual search engine and the IT infrastructure behind it do the rest. ASOS claims that it doesn’t store the image and only analyzes information from the visual, such as color, pattern, and type of clothing, to find a match and make personalized recommendations to customers.

  • Alibaba

Alibaba is a Chinese multinational company specializing in e-commerce. It’s regarded as one of the fastest-growing retail platforms in the world. Alibaba connects manufacturers and wholesalers with retailers and individual customers, and the range of products traded via its website is vast.

Alibaba offers the “Image Search” functionality, which provides the conventional capability of finding products based on uploaded images. As a complementary feature, the Image Search engine also accepts a link to an image to run the search.

  • Amazon

Amazon is an American multinational technology company. Its mission is to be Earth’s most customer-centric company. Amazon makes extensive use of innovative technologies in its IT environment, such as 1-click shopping, personalized recommendations, and other recent industry advancements.

In 2019, Amazon introduced visual search functionality in a feature called “StyleSnap,” available through the Amazon app. As with the other implementations, the technology lets users get the products most relevant to their search by simply uploading a photo and letting artificial intelligence perform the search.

  • BooHoo

BooHoo is a UK-based online fashion retailer founded in 2006. It became a popular online store that offers affordable and trendy clothing and has a presence in over 100 countries. BooHoo is considered to be one of the largest fast-fashion retailers in the world.

The visual search functionality that the BooHoo platform offers to its users is “Camera Search.” It was introduced back in 2017 with the goal of enhancing customer experience and, potentially, increasing the revenue from the business operation. The technology supports searching based on screenshots or pictures taken in real-time with the help of a phone’s camera.

The listed companies are well-established businesses that made a fortune by pioneering innovations. It’s safe to say that integrating visual search technology into their platforms played an essential role in their success.

Technical visual search requirements and principles

Visual search functionality is a complex solution. The infographic below shows which components are required to build it and what each of them is for.

In the visual search competition, the creation of the ML solution followed exactly this pattern. We had a database of product images separated into user and seller photos. The latter was our ground truth. At the same time, user photos, which were typically snapshots of products taken with a phone camera in cluttered scenes, were used for testing the ML model.
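
The split described above can be sketched as follows: seller photos form the searchable gallery, and user photos serve as test queries. The Photo record, its field names, and the simple hit-rate check are assumptions made for illustration and are not the competition’s actual data format. The search argument is a placeholder for any function that returns a ranked list of gallery photos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    photo_id: str
    product_id: str
    source: str  # "seller" or "user"


def split_gallery_and_queries(photos):
    """Seller photos form the gallery; user photos become test queries."""
    gallery = [p for p in photos if p.source == "seller"]
    queries = [p for p in photos if p.source == "user"]
    return gallery, queries


def hit_rate_at_k(queries, search, k):
    """Share of queries whose true product appears among the top k results."""
    if not queries:
        return 0.0
    hits = sum(
        any(result.product_id == q.product_id for result in search(q)[:k])
        for q in queries
    )
    return hits / len(queries)


photos = [
    Photo("s1", "shoe", "seller"),
    Photo("s2", "bag", "seller"),
    Photo("u1", "shoe", "user"),
    Photo("u2", "bag", "user"),
]
gallery, queries = split_gallery_and_queries(photos)


def dummy_search(query):
    """Stand-in for a real model: always returns the gallery in a fixed order."""
    return gallery


print(hit_rate_at_k(queries, dummy_search, k=1))  # 0.5
print(hit_rate_at_k(queries, dummy_search, k=2))  # 1.0
```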

The role of AI and machine learning in e-commerce visual search

Now, let’s review some of the best practices for publishing images on an e-commerce platform and optimizing a visual search solution:

  • Alt text. Include brief information about the product shown in an image. It helps search engines understand the context of the image, which raises the likelihood of finding the necessary product.
  • Numerous images. Include multiple images of the same product. This not only gives customers a better overview of the product but also increases the chances of a search engine finding the item.
  • Keyword usage. Descriptions should contain several long-tail keywords to help the image appear in search results.
  • Product detection first. Make product detection the first step of image processing in a visual search solution. It ensures that background elements are not treated as objects of interest, which could interfere with accurate matching.
  • Vision transformers (ViTs). It’s advised to base a visual search solution on the ViT family of models and to use a Contrastive Language-Image Pretraining (CLIP) network. The latter can predict the most relevant text snippet for a given image. Using this technology can considerably boost the accuracy and speed of the resulting search solution.
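
To illustrate the "product detection first" practice, the sketch below crops an image to a detected bounding box before it would be passed on for matching. The image is a toy grid of pixel values, and the bounding box is a hypothetical detector output. No real detection model is involved.

```python
def crop_to_box(image, box):
    """Crop a row-major image (list of pixel rows) to a bounding box.

    box = (left, top, right, bottom) in pixels, with right and bottom
    exclusive. The box is clamped to the image bounds.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = box
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]


# Toy grayscale image: zeros are background, other values are the product.
image = [
    [0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0],
    [0, 9, 6, 0, 0],
    [0, 0, 0, 0, 0],
]
detected_box = (1, 1, 3, 3)  # hypothetical output of an object detector
product_only = crop_to_box(image, detected_box)
print(product_only)  # [[7, 8], [9, 6]]
```

Only the cropped region would then be turned into a feature vector, so background pixels no longer influence the match.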

The ML engineers at Intelliarts AI have seen how useful these practices are over the course of our projects.

Our experience with visual search

While working on visual search for retail as part of the AI competition, we tried multiple approaches. We started with Convolutional Neural Network (CNN) architectures, the classical choice of neural network for building such an ML solution. Yet, we came to two important conclusions in the process:

  • Insight 1. The ViT family of models outperforms classical CNN architectures by a wide margin.
  • Insight 2. It’s crucial to use re-ranking techniques when building visual search solutions. Evaluating search outcomes and moving more similar objects higher in the list of results adds to the usefulness of the solution overall.
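
The article does not say which re-ranking technique the team used. As one illustration, the sketch below implements average query expansion, a common and simple re-ranking approach: the query vector is averaged with the vectors of its top first-pass results, and the search is run again with the averaged vector. The toy 2-D vectors and all function names are assumptions.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def rank(query, gallery):
    """Gallery ids sorted by cosine similarity to the query, best first."""
    q = normalize(query)
    scores = {
        item_id: sum(a * b for a, b in zip(q, normalize(vec)))
        for item_id, vec in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def rerank_with_query_expansion(query, gallery, top_m=2):
    """Search again with the mean of the query and its top_m first-pass hits."""
    first_pass = rank(query, gallery)
    neighbors = [normalize(gallery[item_id]) for item_id in first_pass[:top_m]]
    vectors = [normalize(query)] + neighbors
    expanded = [sum(dims) / len(vectors) for dims in zip(*vectors)]
    return rank(expanded, gallery)


def unit(angle_degrees):
    """Toy 2-D feature vector pointing in the given direction."""
    angle = math.radians(angle_degrees)
    return [math.cos(angle), math.sin(angle)]


gallery = {"a": unit(10), "b": unit(20), "c": unit(-25), "d": unit(30)}
query = unit(0)
print(rank(query, gallery))                         # ['a', 'b', 'c', 'd']
print(rerank_with_query_expansion(query, gallery))  # ['a', 'b', 'd', 'c']
```

In this toy example, item d moves above item c after re-ranking because it lies closer to the top two results, even though c was slightly closer to the original query.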

You can try an online demo, an interactive playground for the trained AI model. The demo shows how a visual search engine can suggest goods and products available on the web based on an input image or video frame.

Final take

Visual search is a trending technology in the e-commerce niche. Such large market players as ASOS, Amazon, Alibaba, and BooHoo already offer visual search capabilities to the users of their platforms. The enhanced shopping experience, lower bounce rates, and higher revenue from business operations are only a few of the advantages the technology may offer.

Yurii Laba
Intelliarts AI

Machine learning engineer at Intelliarts. Highly interested in the anomaly detection field.