How Machines Learn Their Stripes

AI that knows its animal print from its camouflage, its paisley from its plaid.

Joe Berry
Inside EDITED
5 min read · Nov 7, 2017

--

At EDITED we can’t sit still for long. We have millions of apparel products arriving in our app every day.

So we’re always working on new tools to help our customers (retailers and brands) filter through that wealth of data, so they can drill down on the stuff that really matters to them.

Most recently we released a machine learning model to understand and classify patterns in retail.

Patterns are a key way that our users — merchandisers, suppliers, and designers — stay on top of what’s on trend in the fashion industry.

From a human perspective, patterns are easily identifiable from an image, but for a computer it’s much more challenging. A computer must first learn what we mean by a pattern and then understand the thousands of nuanced details that differentiate between them.

And they have to understand those nuances not only visually, but in the way they’re described too. That’s why we’re super excited to have released a state-of-the-art model that does exactly that!

Computers have to not only learn what we mean by a pattern, but understand the thousands of nuanced details that differentiate between them.

Here’s how we did it, combining deep learning with natural language processing to build a highly performant model that’s capable of distinguishing between your Aztecs and your animals.

Putting the 👁️ in AI

You’ve probably seen those magical AI-fuelled apps that, from a picture, recognize objects, reveal your age, find your celebrity doppelgänger or any other number of life-changing revelations.

They all use a deep neural network model called a convolutional neural network, which is the cool kid in town for image recognition tasks right now.

Examples of how neural networks learn to identify images. Credit to TensorFlow.

In a nutshell, these models apply small filters to an image (an operation known as convolution) to pick out its fundamental characteristics.

This happens at different levels. First, the machine learns to find the important edges in an image, known as low-level features.

It then goes on to understand how those edges interact to form shapes, how the shapes combine into more complex structures, and so on, up to what we call high-level features.

Each of these stages is called a layer, and connecting layers of filters into a network is a tentative way of mimicking how our brains work.

Put those key terms together (convolution, many stacked layers forming a deep network) and you can see exactly why they are called deep convolutional neural networks.
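To make the convolution step concrete, here’s a minimal numpy sketch (not our production code) of a single filter sliding over a toy image. The vertical-edge kernel and the striped test image are illustrative inventions; real networks learn their filters from data.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter over the image and record its response
    at every position (valid padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A vertical-edge filter: responds strongly where brightness
# changes from left to right -- a classic low-level feature.
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

# A toy "striped" image: dark on the left, bright on the right.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

response = convolve2d(image, vertical_edge)
# The response is near zero over flat regions and large (in magnitude)
# right at the boundary between the dark and bright stripes.
```

A real network stacks many such filters in successive layers, so later layers respond to combinations of the edges found by earlier ones.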

Credit to MathWorks.

The high-level features can be represented as what we call a feature vector: a chain of about 2,000 numbers that summarises the content of the image.

With these high-level features in hand, we can classify images into a fixed number of categories.

We did that with about 35,000 images covering 15 different patterns, so the neural network learned to identify each of them.

In machine learning, convolutional neural networks are all the rage. Alongside other deep learning techniques, they are super powerful when it comes to interpreting imagery.

However, they are limited by the quality of the image, just as humans would be. Is the image clear? Does it focus on the item we’re interested in? Below are some examples of hard-to-classify images:

What to focus on? Tops? Bottoms? Handbags? Lots of different patterns and products in one image make it difficult to give an accurate prediction.

Adding Natural Language Processing to the mix

For this reason we decided to bring in another well-known area of machine learning: natural language processing (NLP). This is where we use computational models, some simple, others highly complex, to learn and understand text.

This allowed us to train a model that looked at the name, description and other text-based features of a product to predict its pattern.

On its own this model would not perform especially well, but our goal was to combine it with the convolutional image model, so it could use the power of both image and text to make an educated prediction about what type of pattern a product has.
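One simple way to turn product text into something a classifier can use is to map each word to a vector and average them. This numpy sketch uses a tiny made-up vocabulary and random vectors purely for illustration; real systems learn the embedding table so that similar words end up with similar vectors.

```python
import numpy as np

# Tiny hypothetical word-embedding table (real models learn these
# vectors from data; the words and dimensions here are invented).
rng = np.random.default_rng(1)
vocab = ["striped", "cotton", "shirt", "floral", "dress", "print"]
embeddings = {word: rng.standard_normal(8) for word in vocab}

def embed_text(text):
    """Average the vectors of known words: a crude but common way
    to turn a product description into one fixed-size vector."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return np.zeros(8)
    return np.mean(vectors, axis=0)

description = embed_text("Striped cotton shirt")
```

The resulting vector can then be fed to a classifier alongside the image features.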

Simplified example of how our patterns model makes a prediction using both NLP and CNNs. Text is converted to numerical representations by a learned embedding, which aims to preserve the semantic nature of the text; a classification model then makes predictions based on these embeddings and the output of the image-based CNN.

Training a model that understands image and text in conjunction gave us a much more powerful model, one that accurately predicts patterns across our 150 million products.
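The simplest way to combine the two signals, and the one this sketch assumes, is to concatenate the CNN's feature vector with the text embedding and train one classifier over the joint vector. The dimensions and random weights below are placeholders, not EDITED's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical outputs of the two models for one product:
image_features = rng.standard_normal(2048)  # from the image CNN
text_embedding = rng.standard_normal(300)   # from the NLP model

# Concatenate them so a single classifier can weigh both signals:
# when the image is ambiguous, the text can still carry the prediction.
combined = np.concatenate([image_features, text_embedding])

# A linear layer over the joint vector stands in for the learned
# classification model; in practice its weights come from training.
n_patterns = 15
W = rng.standard_normal((n_patterns, combined.size)) * 0.01
scores = W @ combined
predicted_pattern = int(np.argmax(scores))
```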

This ensemble approach means we can rely on the power of convolutional image classification while reducing the error rate caused by low-quality images, by also considering the text used to describe products.

A trained model that understands both image and text resulted in accurate pattern prediction across 150 million products.

This wasn’t just about testing the capabilities of technology. Innovation for us means breaking ground so that our customers can get the best possible insights from data.

In creating this model, EDITED now lets retailers analyze patterns with a sophistication and accuracy that was previously out of reach. They can understand exactly when floral prints should arrive in store to get the longest full-price run, or how many striped shirts to order versus polka-dotted.

This kind of technology means retailers stock into the right trends, at the right time. And it helps them avoid costly mistakes.

And what’s even more exciting is the more it learns from our customers, the better it gets. Which is just like us!

Joe Berry is a data scientist at EDITED specialising in statistical modelling and natural language processing. Aside from taming data — and being really tall — Joe enjoys making pretty pictures and shooting hoops. Wanna join the all-star team? Come work with us.

Paulo Sampaio is also a data scientist at EDITED and expert in image-based machine learning. He’s also in a band, which makes him very cool. He might even play you the guitar.
