Leveraging AWS Rekognition to Analyze Website Images: A Real-World Example of Spotting Broken or Missing Images

Boris Selivanov
SSENSE-TECH

Broken or missing images on a website can significantly degrade the user experience, leading to a loss of user trust and potential sales. However, traditional manual methods of identifying these images are time-consuming. Utilizing a combination of web scraping tools and AWS Rekognition provides an efficient, automated solution. In this blog post, we will explore a real-world project that harnesses the power of those tools to systematically detect and address broken images on a website. We will also look at the importance of maintaining intact images on websites, focusing on our experience with ssense.com.

Why Do Images Matter?

“The details are not the details. They make the design.” — Charles Eames

In the vast world of web development and design, it’s sometimes easy to overlook what might seem like a minor hiccup.

When running an e-commerce site, a broken image could translate into a missed sale. So, automating the detection of broken images isn’t just convenient; it’s imperative.

Let’s visualize this scenario for a moment. Imagine stepping into a store, perhaps a sophisticated boutique. Your welcome consists of vacant hangers, price tags lacking figures, or pairs of shoes that are nowhere to be found. Frustrating, isn’t it? Chances are, you wouldn’t consider visiting again. Transpose this encounter to the digital realm, and you’ll understand why a website with broken or missing images is similar to that in-store scenario.

Here’s why it’s crucial:

  • First Impressions Matter: The online realm is teeming with countless websites. If a visitor lands on your site and sees broken images, they’re getting an immediate negative impression. And as they say, you only get one chance to make a good first impression!
  • Building Trust and Credibility: Websites are the embodiment of a brand’s online identity. When users encounter missing images, it can erode their trust. They might question, “If they’re unable to oversee their images, can I trust them with my data or my money?”
  • SEO Impacts: Search engines strive to deliver the best experience for their users. Broken images can signal poor site health and hurt a site’s search rankings. Lower rankings mean fewer organic visitors and, in turn, fewer sales and less engagement.
  • Lost Opportunities: In the competitive landscape of e-commerce — especially in the luxury retail sector where SSENSE operates — visuals aren’t just persuasive; they are critical. High-value transactions hinge significantly on the alluring representation of products through images. If a product image fails to load, not only does it undermine the exclusivity that a brand like SSENSE portrays, but it could very well result in missed sales opportunities. After all, purchasing luxury items is as much about the emotional connection as it is about the physical product, and that connection is often forged through compelling visuals. Who would invest in an item they can’t see?
  • User Experience: The goal is always to create a seamless user experience; this is a universal aspiration for web designers. Yet, encountering a broken image disrupts this harmony — it’s like hitting a sudden roadblock in a video game. It interrupts the rhythm, diverts the user’s attention, and might even push them to click that dreaded back button.

Tackling broken images goes far beyond resolving a superficial issue. It’s about refining the user experience, boosting credibility, and ensuring that your website delivers on its promise. It’s these details, consistently addressed, that set great products apart from good ones.

The SSENSE Two-Step Strategy

Have you ever come across a default or generic image on a website where the actual product photo didn’t load? That’s a placeholder. As an example, take a look at the placeholder image below, taken from one of our sold-out product pages: it’s mostly white.

Now, let’s talk about a project we undertook on ssense.com.

Goal: Rapidly identify and rectify broken images on product pages.

How? In two simple steps.

Step 1: Extracting Images Through Web Scraping

Before Rekognition can do anything, it needs images to analyze. We’re developers, and collecting them by hand isn’t in our repertoire. So, let’s automate!

Using the powers of Node.js combined with axios (for fetching web pages) and cheerio (for parsing them), we compiled a comprehensive list of image URLs.

Let’s dissect each component of the code.
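As a rough illustration of this step, here is a minimal sketch of such a scraper. It assumes axios and cheerio are installed and leaves out any server wiring. The function names and the example URL are hypothetical, so treat it as a starting point rather than the production code.

```javascript
// Sketch only: axios and cheerio are the libraries named in the article;
// the function names and the example URL below are hypothetical.

// Resolve raw src values against the page URL, then drop duplicates,
// data: URIs, and anything that is not a valid http(s) URL.
function normalizeImageUrls(srcValues, pageUrl) {
  const unique = new Set();
  for (const src of srcValues) {
    if (!src || src.startsWith('data:')) continue;
    try {
      const url = new URL(src, pageUrl);
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        unique.add(url.href);
      }
    } catch {
      // Ignore values that cannot be parsed as a URL.
    }
  }
  return [...unique];
}

// Fetch a page and return the absolute URLs of its <img> elements.
async function scrapeImageUrls(pageUrl) {
  // Loaded here so normalizeImageUrls stays usable without these packages.
  const axios = require('axios');
  const cheerio = require('cheerio');

  const { data: html } = await axios.get(pageUrl, { timeout: 10000 });
  const $ = cheerio.load(html);
  const srcValues = $('img')
    .map((_, el) => $(el).attr('src'))
    .get();
  return normalizeImageUrls(srcValues, pageUrl);
}

// Usage (hypothetical URL):
// scrapeImageUrls('https://example.com/products').then(console.log);
```

Resolving every src against the page URL matters because product pages often mix relative paths, protocol-relative CDN links, and inline data: URIs, and only absolute http(s) URLs can be downloaded in the next step.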

Simple and neat, right?

With this setup, whenever the server is started, it will automatically scrape the image URLs from the given website. If you plan to use this regularly, remember to respect the website’s robots.txt file and usage policy. Scraping can consume a significant amount of resources, and abusing it can get you banned.

Step 2: Harnessing AWS Rekognition

AWS Rekognition is a managed service from Amazon Web Services (AWS) that lets developers integrate image and video analysis into their applications. Leveraging deep learning technology, Rekognition can identify objects, people, text, scenes, and activities, and can even detect potentially unsafe content.

DetectLabels is an API within Rekognition that analyzes an image and returns labels and other properties. When IMAGE_PROPERTIES is included among the requested features, the response also describes the image’s quality (sharpness, brightness, and contrast) and lists its dominant colors, each with the share of pixels it covers. That last part is exactly what we need.
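To make the request concrete, here is a hedged sketch that uses the AWS SDK for JavaScript v3 (@aws-sdk/client-rekognition). The helper names, the shape of the `source` argument, and the default region are assumptions made for illustration. The request fields (Image, Features, Settings) follow the DetectLabels API.

```javascript
// Build the input for a DetectLabels call that asks only for image properties.
// `source` is a hypothetical shape: either { bytes: Buffer } or { bucket, key }.
function buildDetectLabelsInput(source, maxDominantColors = 10) {
  const image = source.bytes
    ? { Bytes: source.bytes }
    : { S3Object: { Bucket: source.bucket, Name: source.key } };

  return {
    Image: image,
    Features: ['IMAGE_PROPERTIES'],
    Settings: { ImageProperties: { MaxDominantColors: maxDominantColors } },
  };
}

// Send the request and return the ImageProperties part of the response
// (quality metrics and dominant colors). The region default is an assumption.
async function detectImageProperties(source, region = 'us-east-1') {
  // Loaded here so buildDetectLabelsInput can be used without the SDK installed.
  const { RekognitionClient, DetectLabelsCommand } = require('@aws-sdk/client-rekognition');

  const client = new RekognitionClient({ region });
  const response = await client.send(
    new DetectLabelsCommand(buildDetectLabelsInput(source))
  );
  return response.ImageProperties;
}

// Usage (hypothetical bucket and key):
// detectImageProperties({ bucket: 'my-bucket', key: 'products/123.jpg' })
//   .then((props) => console.log(props.DominantColors));
```

Requesting only IMAGE_PROPERTIES keeps the response focused on colors and quality, which is all the white-image check needs.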

Why it’s Perfect for Spotting Broken or Placeholder Images

  1. Deep Learning Technology: At the heart of AWS Rekognition lies deep learning technology, which is exceptionally adept at recognizing intricate patterns. Given that broken or placeholder images often have consistent traits — such as large white spaces — this enables Rekognition to excel in their identification.
  2. Scalability: Managing websites with extensive visual assets — like e-commerce platforms — often involves analyzing thousands of images. Rekognition can handle this scale effortlessly, making the detection process both efficient and accurate.
  3. Integration With the AWS Ecosystem: If you’re already utilizing AWS services, the integration of Rekognition is seamless. This means less setup time and more efficiency. The synergy between AWS services can lead to even more advanced solutions, such as alerting web administrators in real-time using AWS Lambda and SNS when a broken image is detected.
  4. Pay-as-you-go Pricing: Instead of a hefty upfront cost, AWS Rekognition follows a pay-as-you-go pricing model. You only incur charges for the images and videos you analyze and the face metadata you store. This can be particularly cost-effective for businesses that don’t have a consistent flow of images to analyze.

How Does AWS Rekognition Work for Our Specific Issue?

  1. Image Input: Rekognition analyzes an image that is either stored in an Amazon S3 bucket or provided directly as bytes.
  2. Detecting Dominant Colors: One of the key features of Rekognition is its ability to detect dominant color schemes within images. For our problem, this is crucial. By determining the dominant color palette of an image and finding that a significant portion is white, we can deduce that the image might be broken or a placeholder.
  3. Threshold Setting: Not all images with large white spaces are broken or placeholders, so the threshold has to be chosen carefully. In our specific scenario, we flag an image if more than 95% of it is white. However, it is worth noting that this logic is case-specific. Depending on the unique requirements of your project, you might establish different criteria. For instance, you might decide to integrate logic that verifies whether the product image aligns with the product category, among other validations. Essentially, the threshold and the underlying logic can be customized to suit the specific needs and acceptable false-positive rates of different use cases.
  4. Output: Once AWS Rekognition analyzes an image, it returns a set of attributes. In our scenario, these attributes pertain to the colors within the image. Should the predominant color be white and surpass our established threshold, we have the option to mark it for subsequent examination.
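The threshold logic described in the steps above can be sketched as a small pure function. It assumes the DominantColors entries returned by DetectLabels, each carrying a SimplifiedColor name and a PixelPercent value, and it assumes that near-white shades are reported with the simplified color "white". The function names are hypothetical, and the 95% default mirrors the threshold used in this article.

```javascript
// Sum the pixel share of every dominant color that Rekognition
// simplifies to "white". Returns a percentage between 0 and 100.
function whiteCoverage(dominantColors) {
  return (dominantColors || [])
    .filter((color) => String(color.SimplifiedColor).toLowerCase() === 'white')
    .reduce((sum, color) => sum + color.PixelPercent, 0);
}

// Flag an image when more than `threshold` percent of it is white.
// 95 is the value used in this article; tune it for your own catalog.
function isLikelyPlaceholder(dominantColors, threshold = 95) {
  return whiteCoverage(dominantColors) > threshold;
}

// Usage, given the ImageProperties of a DetectLabels response:
// const flagged = isLikelyPlaceholder(response.ImageProperties.DominantColors);
```

Summing over all "white" entries, instead of checking only the single most dominant color, avoids missing images whose white area is split across several slightly different shades.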

Now for the fun part! Using our list of image URLs, we can let AWS Rekognition do its thing. The fundamental idea is to assess these images for significant white spaces, which can be a telltale sign of broken or placeholder images.

The script below illustrates how to harness AWS Rekognition for our specific needs.

Let’s look at the complete code:
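What follows is a condensed sketch of that flow rather than the exact production script. It assumes axios and the AWS SDK for JavaScript v3 are installed and that AWS credentials are configured in the environment. All function names are hypothetical. Images are sent as bytes, which Rekognition accepts for JPEG and PNG files up to 5 MB.

```javascript
const WHITE_THRESHOLD = 95; // percent, the threshold used in this article

// Download an image and return its bytes as a Buffer.
async function fetchImageBytes(imageUrl) {
  const axios = require('axios');
  const response = await axios.get(imageUrl, {
    responseType: 'arraybuffer',
    timeout: 10000,
  });
  return Buffer.from(response.data);
}

// Ask Rekognition for the dominant colors of one image.
async function detectDominantColors(client, bytes) {
  const { DetectLabelsCommand } = require('@aws-sdk/client-rekognition');
  const response = await client.send(
    new DetectLabelsCommand({
      Image: { Bytes: bytes },
      Features: ['IMAGE_PROPERTIES'],
      Settings: { ImageProperties: { MaxDominantColors: 10 } },
    })
  );
  return response.ImageProperties?.DominantColors ?? [];
}

// Analyze the URLs one at a time. A failure on one image is recorded
// in the results instead of aborting the whole run.
async function findMostlyWhiteImages(imageUrls, { fetchBytes, detectColors }) {
  const results = [];
  for (const url of imageUrls) {
    try {
      const colors = await detectColors(await fetchBytes(url));
      const whitePercent = colors
        .filter((color) => color.SimplifiedColor === 'white')
        .reduce((sum, color) => sum + color.PixelPercent, 0);
      results.push({ url, whitePercent, flagged: whitePercent > WHITE_THRESHOLD });
    } catch (error) {
      results.push({ url, error: error.message });
    }
  }
  return results;
}

// Usage (the region and the imageUrls list are assumptions):
// const { RekognitionClient } = require('@aws-sdk/client-rekognition');
// const client = new RekognitionClient({ region: 'us-east-1' });
// findMostlyWhiteImages(imageUrls, {
//   fetchBytes: fetchImageBytes,
//   detectColors: (bytes) => detectDominantColors(client, bytes),
// }).then(console.table);
```

Passing the download and detection functions in as arguments keeps the loop easy to test with stubs, and processing images sequentially is a simple way to stay well within the service’s request rate limits.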

Console Output:

Below is the screenshot of the console output after running the script.

With the scraping script from Step 1, we aggregated all image URLs from a specific webpage into a list. That list is the foundation of the analysis: each image is passed to AWS Rekognition, which reports its dominant colors, and any image where more than 95% of the content is white is flagged as potentially broken or a placeholder.

While we kept it straightforward in this article, it is possible to take this initiative a step further by gathering the final results into a JSON file. This approach can facilitate a systematic record-keeping of potentially broken images, serving as a valuable resource for website maintenance.
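One possible way to do that is sketched here. It assumes each result is an object with a `url` and a boolean `flagged` field. That shape, the function name, and the file name in the usage line are all hypothetical.

```javascript
const fs = require('fs');

// Write the flagged images to a JSON report and return the report object.
// `results` is assumed to be an array of { url, flagged, ... } objects.
function saveFlaggedImages(results, filePath) {
  const report = {
    generatedAt: new Date().toISOString(),
    checked: results.length,
    flagged: results.filter((result) => result.flagged === true),
  };
  fs.writeFileSync(filePath, JSON.stringify(report, null, 2));
  return report;
}

// Usage (hypothetical file name):
// saveFlaggedImages(results, 'flagged-images.json');
```

Keeping a timestamp and the total number of images checked alongside the flagged entries makes it easier to compare reports from one run to the next.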

Conclusion

Broken images impact more than the aesthetics of a website. When images break or go missing, the user experience and the website’s credibility are jeopardized. By skillfully merging web scraping techniques with AWS Rekognition’s capabilities, this project proposes a solution for a widespread digital challenge. With automated checks like these, web developers and site content managers can ensure a seamless visual experience for their visitors, enhancing overall site performance and credibility.

Editorial reviews by Catherine Heim, Gregory Belhumeur and Mario Bittencourt

Published in SSENSE-TECH

Ideas and research from the software, data & product teams behind the global fashion platform SSENSE.