Using Machine Learning to Improve Upholstery Fabric Discovery

Boosting online sales and engagement

Peter Avritch
The Startup
9 min read · Nov 12, 2018


I had previously cofounded a company named Inside Stores, and one of our niche websites, insidefabric.com, is a top seller of drapery and upholstery fabric.

This article details how we were able to leverage machine learning to boost visitor engagement and sales by greatly improving product discovery for nearly 500K drop-ship fabric patterns.

No Code Here

This is not a programming article and you do not need to be a developer or know anything about machine learning to understand the information presented in this post. My focus will be to describe what problems we faced and how we used ML to implement creative solutions.

There's no production source code or fancy math here; just simple concepts that can be easily understood by anyone running a typical online web store.

Background

InsideFabric.com is a typical e-commerce website which specializes in upholstery fabric. We list nearly 500K patterns for sale from around 50 different vendors.

We don’t stock anything. Fabric typically comes from the factory on long 54-inch-wide bolts (rolls) which are warehoused by our vendors and available for drop shipping. Our customers simply tell us which pattern and how many yards they want, it then gets cut from the bolt, and a few days later it shows up at their door.

Our data is gathered and refreshed daily from a combination of feeds, spreadsheets, emails, PDFs and screen scraping. It’s an arduous process because fabric vendors have only recently started to get on board with the established data automation processes that many of us in tech generally take for granted.

If you’re interested in learning more about our data collection process, I have included screenshots of our custom tooling on my personal website.

Our shopping cart software is a highly-customized version of an off-the-shelf commercial product based on ASP.NET. We purchased the source code many years ago as a starting point once it became clear our needs couldn’t be satisfied by anything available at that time.

Our servers have fast SSD drives, 96GB of RAM and 12-core CPUs. This ensures a high degree of caching and sub-second response times for all search and discovery operations.

Discovery Problem

When you visit a shopping site like Amazon, you generally have a good idea of what you’re looking for. But for fabric, users rarely know much more than that they’re looking for new drapes, or maybe pillows for their sofa. They seldom have any idea about what patterns or colors might work best — they’ll know it when they see it.

Our job, as store operators, is to make it easy and fun for visitors to just start typing, or clicking, and then go down the rabbit hole. Eventually, hopefully, they’ll find something that tingles their brain and we make a sale.

Nearly all of our traffic comes from organic search. Our success is completely dependent upon our SEO skills and having hundreds of thousands of pages in Google.

No matter what users type into Google as a starting point, we hope one of our pages shows up in the results and hooks them with our uniquely fun user experience. We know that, in the end, what they ultimately purchase is usually completely different from whatever they originally typed into Google to begin shopping.

Dirty Data

On the back end, the first problem we faced was dirty data. With data coming from over 50 different vendors in a myriad of formats, it was inevitable we’d face some import challenges. The depth of these challenges was a bit surprising at first, but in the end, by solving these problems through creative programming, we created a moat which meaningfully differentiated our web store from most of our competitors.

Descriptions and Metadata
We knew going in we’d encounter many common problems such as abbreviations, misspellings, synonyms, number formats and missing data. However, the big eye-opener was that, more often than not, vendors didn’t provide any visual descriptions or categories for their products.

Why would they tell us a product was 50% cotton, fire retardant and rated for heavy use, but not tell us it was striped, or predominantly blue and white? The simple answer was that for over a century, you “just looked at it” when browsing in person at the store. If you found a pattern you liked, on the back was a sticker with some of the non-obvious information such as durability, flammability and material composition.

Of course, on a web store, visitors expect to be able to type in search phrases like “blue stripes” or “green flowers”; and rarely do they type in anything related to the metadata directly supplied by the vendors, like durability rating in double rubs (a common metric used by industry insiders).

It quickly became clear we’d need to use machine learning to augment our data and fill in crucial missing descriptors.

Images
The images provided by our vendors come in many shapes and sizes. Some have borders, some don’t. Some have embedded text captions. The list goes on.

The eventual solution we homed in on was to create an image ingest pipeline that starts with the best available image for a given product and then applies a number of tests and transforms, such as detecting and removing borders, to morph the images into the common formats and sizes used by our website.
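To make the idea concrete, here's a minimal sketch of one such transform, trimming a uniform border. It's illustrative only, not our production code: a real pipeline would use an imaging library, while here an image is modeled as a plain grid of pixel values so the sketch stays self-contained.

```python
# Illustrative sketch of one ingest step: trimming a uniform border.
# An "image" here is just a grid of pixel values; a real pipeline
# would operate on actual image files via an imaging library.

def trim_border(img: list[list[int]]) -> list[list[int]]:
    """Remove outer rows/columns that match the corner pixel (the border color)."""
    border = img[0][0]
    rows = [i for i, row in enumerate(img) if any(p != border for p in row)]
    cols = [j for j in range(len(img[0])) if any(row[j] != border for row in img)]
    if not rows:  # image is entirely border-colored; nothing to trim
        return img
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

framed = [
    [9, 9, 9, 9],
    [9, 1, 2, 9],
    [9, 3, 4, 9],
    [9, 9, 9, 9],
]
print(trim_border(framed))  # [[1, 2], [3, 4]]
```

The same detect-then-crop pattern generalizes to the other transforms in the pipeline, such as cropping out embedded text captions.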

Machine Learning

The discovery problems we needed to solve were perfectly suited to machine learning, and we didn’t even need to get it exactly right. For our specific application, a fuzzy-close result on visual searches is actually better than an exact match, because it shows users a variety of choices.

Embeddings

The first step after ingesting and transforming new images was to reduce them to a vector — commonly called an embedding.

Unlike most of the open-source examples on GitHub that use TensorFlow to create mathematically-intensive 2048-element floating-point vectors, we opted for much speedier 54-byte binary embeddings based on Compact Composite Descriptors. These allowed us to cache our entire dataset in memory and perform sub-second distance comparisons using Tanimoto calculations for various combinations of colors and patterns.

These embeddings could then be used for classification (stripes vs checkers), text augmentation (append color and pattern groups to product records to support full text searches), and real-time visual searches for matches by color, pattern or both.

We can also search for exact matches if users upload a photo of a pattern they’re trying to find. We just use the same logic to create an embedding for the uploaded image and then search memory for the closest match.
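As a rough sketch (not the article's production code), Tanimoto similarity over fixed-length binary embeddings boils down to bitwise operations. The 54-byte vectors and catalog entries below are made-up stand-ins for the Compact Composite Descriptors described above:

```python
# Illustrative sketch: Tanimoto similarity over fixed-length binary embeddings.
# The 54-byte vectors stand in for Compact Composite Descriptors; real
# descriptors encode color/texture information, not hand-picked bit patterns.

def tanimoto(a: bytes, b: bytes) -> float:
    """Ratio of shared set bits to total set bits (1.0 = identical)."""
    both = sum(bin(x & y).count("1") for x, y in zip(a, b))
    either = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return both / either if either else 1.0

def closest(query: bytes, catalog: dict[str, bytes]) -> str:
    """Linear scan of an in-memory catalog; fast enough when it all fits in RAM."""
    return max(catalog, key=lambda sku: tanimoto(query, catalog[sku]))

catalog = {
    "blue-stripe":  bytes([0b11110000] * 54),
    "green-floral": bytes([0b00001111] * 54),
}
print(closest(bytes([0b11100000] * 54), catalog))  # blue-stripe
```

The photo-upload search is the same operation: embed the uploaded image, then scan the cached embeddings for the nearest neighbor.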

Training Data

For our supervised machine learning to work correctly (for classification), we needed a number of labeled datasets. But, as previously mentioned, most products weren’t tagged with any of their visual attributes.

The solution was to create an in-house tool which displayed groups of products in various categories based on conventional programming, and then use human curators to review the products and click to remove any which didn’t fit the group (which overlaid an X on the image).

In the end, we had beautifully-labeled data for each of the key visual classifications we’d need for improving the user experience on our site.

Classification

Once we had training data for popular groupings like stripes, florals and animal prints, it became easy to take each new image we ingested and determine which groups it belonged in (frequently more than one) using common binary classifiers.

This classification step was used to build out the product sets displayed from our website menus and search dialogs, as well as to add text tags and attributes directly to product records to make the products discoverable in text searches (otherwise not possible with the limited descriptions provided by the vendors).
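A hypothetical sketch of that tagging step: compare each new embedding against a per-group prototype and keep every group above a similarity cutoff. The prototypes, tiny bitmasks and 0.5 threshold here are invented for illustration; the real classifiers were trained on the curated datasets described above.

```python
# Illustrative sketch: tagging a product with every visual group whose
# prototype embedding it sufficiently resembles. Prototypes and the 0.5
# cutoff are made-up placeholders, not production values.

def similarity(a: int, b: int) -> float:
    """Tanimoto similarity on integer bitmasks."""
    either = bin(a | b).count("1")
    return bin(a & b).count("1") / either if either else 1.0

def tags_for(embedding: int, prototypes: dict[str, int], cutoff: float = 0.5) -> list[str]:
    """A product frequently lands in more than one group."""
    return [name for name, proto in prototypes.items()
            if similarity(embedding, proto) >= cutoff]

prototypes = {"stripes": 0b111100, "floral": 0b001111, "animal": 0b110011}
print(tags_for(0b011110, prototypes))  # ['stripes', 'floral']
```

The resulting tags were then appended to product records, which is what made "blue stripes"-style text searches possible.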

Search Experience

Our search experience combines the usual text search bar at the top of the page along with a full assortment of categorical filters (facets) to help users fine tune their results.

The dialog pictured below is available on every page which displays groups of products. These pages include both standard text search results, and category pages for brands, colors, designers, price points and numerous other predefined groups which were curated with the help of supervised machine learning.

Using the search filters progressively builds up a list of requirements (facets) at the top of the screen which can be further tuned with a single click to change the product results in real time.
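In spirit, the progressive narrowing works like the sketch below. The product fields and sample data are invented for illustration; the real store applies the same idea server-side across brands, colors, designers, price points and the rest of its facets.

```python
# Illustrative sketch of progressive facet filtering. Field names and
# sample products are invented; each added facet narrows the current
# result set, and removing one widens it again.

def apply_facets(products: list[dict], facets: dict) -> list[dict]:
    """Keep only products matching every selected facet."""
    return [p for p in products
            if all(p.get(field) == want for field, want in facets.items())]

products = [
    {"name": "Cabana Stripe", "pattern": "stripes", "material": "cotton"},
    {"name": "Royal Damask",  "pattern": "damask",  "material": "silk"},
    {"name": "Ticking Blue",  "pattern": "stripes", "material": "linen"},
]

hits = apply_facets(products, {"pattern": "stripes"})   # narrowed to 2
hits = apply_facets(hits, {"material": "cotton"})       # narrowed to 1
print([p["name"] for p in hits])  # ['Cabana Stripe']
```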

The goal was to make it so that no matter how you landed on a page of products, it was easy to take that as a new starting point for engagement and apply any number of new filters. Perhaps you loved those stripes, but now you need to see them in cotton, from Ralph Lauren, for under $100 per yard — easy!

Fabric Suggestions

On every product page we used machine learning to create a matrix of similar patterns based on key visual attributes of the currently-displayed product.

Maybe you liked the stripes, but not the color. Or, you liked one of the colors, and you want to see it in checkers. Or, maybe these stripes are too wide and you want something similar, but thinner, or at a different price point.

We covered all the angles. Just click.

The key point was to make it easy and fun to keep clicking down some random path of somewhat-related products; offering interesting twists and turns along the way.

Product Galleries

The final component we added to improve engagement and discoverability was to randomly add eight product gallery listings to the bottom of nearly every page of the website.

The thinking was that since most visitors really didn’t know what they wanted and were just clicking around until they saw something that tingled, it was just as important to show them patterns outside of their current search path as it was to show them products which closely matched their search.

We programmatically created well over 200K of these “long tail” galleries which also had the side benefit of adding many thousands of pages to our Google footprint — for improved SEO.

Result

The cumulative effect of all of the changes described in this post was that engagement went through the roof. People just kept clicking. It was fun.

Visitors could either keep going down some path, or suddenly see something totally different and jump over to a new rabbit hole. And they’d repeat this process over and over again until eventually they came across the perfect pattern for their new chair, or pillow, or drapes.

The average time spent on our website per visit more than tripled, and so did sales. It was a superb application for machine learning and a lot of fun for my first real venture into this space.

Thanks for reading. If you enjoyed this article, feel free to hit that clap button 👏 to help others find it.




Co-Founder and CTO at Hello Gloss. Hooked on startups, cycling, cloud stacks, deep learning and building stuff. Started with Lego.