An Introduction to Hierarchical Recurrent Neural Networks Applied to the Fashion Industry

Michael Triska
AMARO
Apr 1, 2020


Written by Michael Triska — Machine Learning Architect at AMARO

Session-Based Fashion Item Recommendation with AWS Personalize — Part 1

Rahel by © Michael Triska.

Common item recommendation systems rely on item-to-item similarity approaches of the form “people who bought this, also bought”. Such methods have proven effective; however, providing personalized fashion recommendations poses unique challenges. The key problem with the collaborative filtering methods commonly applied to fashion recommendation is that they only consider a user’s last click rather than long-term preferences (in other words, their “style”). Another serious disadvantage is that they can lead to a poor customer experience, e.g. by recommending products that don’t fit the user’s body shape or color taste. On top of that, managing the full life cycle of a machine learning product requires expertise in different areas of science and engineering.

This blog post explains our vision for a personalized recommendation system and builds an intuition for the solution we applied: a recurrent neural network (RNN) algorithm from AWS Personalize. In this series of blog posts, we aim to share our learnings, experiences, pitfalls, and results in building a recommendation system architecture based on user interactions with AWS Personalize. The series is structured as follows:

  • Part 1 — Introduction, Challenges and the beauty of Session-Based Hierarchical Recurrent Networks 📍
  • Part 2 — Technical Implementations and Pitfalls
  • Part 3 — Creating a User Interaction Dataset
  • Part 4 — Experiment Results, Evaluation, and Discussion

Purpose

Everybody needs clothing, and each piece is chosen personally. Our fashion style reflects how we want to be seen and what we want to emphasize. Fashion has the power to remind us that our fascination with our thoughts, the opinions we hold, the definitions we give ourselves, and the bodies we live in are just temporary. Fashion asks us to follow our own unique path, to attune to our inner creativity, to listen to our inner knowing, and to step out of other people’s expectations. As RuPaul, an American drag queen, says:

“We’re all born naked and the rest is drag; meaning everything you put on after you get out of the house is, in essence, a construct. It is something that was built. My glamazon drag is obviously drag, because it is clearly a man wearing traditional feminine, glamorous clothing. For the everyday person, they are in drag too, but it is not as obvious. It’s a way of putting the focus on the vision you put forth.”

At AMARO, we put our customers at the heart of everything we do. Each customer is unique in style, taste, personal situation, and body shape. With these ideals as the foundation of the project, we aim to build a fashion item recommendation system with a genuine interest in personality — one that combines our passion, professional knowledge, and state-of-the-art technology to recognize and reveal what makes each of our customers unique and special.

AWS Personalize Recipes

AWS Personalize is a fully managed service with which you can develop and deploy personalized recommendation systems within weeks. AWS provides predefined machine learning algorithms for model training, called recipes, which take user interaction data to create a generalized solution. Here, we will focus on the hierarchical recurrent neural network (HRNN) recipe, which models a simple user-item dataset containing only user IDs, item IDs, and timestamps. AWS also offers recipes that add user metadata (like location, age, etc.) or item metadata to the HRNN, which sounds intuitive but comes with the risk of overfitting.
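To make this concrete, here is a minimal sketch of the parameters you would assemble to train an HRNN solution with Personalize. The solution name and the dataset-group ARN below are placeholders, not values from our project; the actual call is left commented out so the snippet stands on its own.

```python
# Sketch of the parameters for an AWS Personalize HRNN solution.
# "amaro-hrnn-solution" and the dataset-group ARN are placeholders.
params = {
    "name": "amaro-hrnn-solution",
    "datasetGroupArn": "arn:aws:personalize:REGION:ACCOUNT:dataset-group/NAME",
    "recipeArn": "arn:aws:personalize:::recipe/aws-hrnn",
}

# The actual training call would look like this:
# import boto3
# personalize = boto3.client("personalize")
# response = personalize.create_solution(**params)

print(params["recipeArn"])
```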

© Monty Python’s Flying Circus 1969–1974.

The HRNN recipe models the RNN as proposed in this research paper. Unlike collaborative filtering methods, which generally ignore the information in past sessions or clickstreams, the authors try to answer the question:

“How can we combine long-term (historical) and short term (session) intent of a user?”

Intuition

RNNs (see image 1) are designed to handle any kind of sequential data.

Image 1: Representation of an RNN. https://bit.ly/2JsjJp7

Language is a great example for explaining the intuition behind RNNs. Imagine you want to train a character-based RNN model to build a word completer for your search bar. In this example (see image 2), we train on the word “hello”. The RNN predicts the next character h(N) given an input character X(N), encoded as a one-hot representation over the characters occurring in the dataset (here “h”, “e”, “l”, and “o”).

Image 2: Character level training for “hello” with an RNN. https://bit.ly/2JsjJp7
  1. In the first step, we input “h” at X(0), which should output “e” at h(0), as the RNN calculates the likelihood of every character being the next one.
  2. In the second step, we feed those probability scores together with the new input character X(1), “e”, into the next state. Imagine the model predicts a higher probability for “o” being the next character. This wrong prediction is then used to calculate the cost and update the weights and biases.
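A single forward step of this toy setup can be sketched in a few lines of NumPy. The weights here are random for illustration (a real model would learn them through the cost-and-update loop described above), so the probabilities are close to uniform rather than favoring “e”.

```python
import numpy as np

# Toy character-level RNN step for "hello" (vocabulary: h, e, l, o).
# Weights are random for illustration; training would shape them.
rng = np.random.default_rng(0)
vocab = ["h", "e", "l", "o"]
V, H = len(vocab), 8                      # vocabulary size, hidden size

Wxh = rng.normal(0, 0.1, (H, V))          # input  -> hidden
Whh = rng.normal(0, 0.1, (H, H))          # hidden -> hidden
Why = rng.normal(0, 0.1, (V, H))          # hidden -> output

def one_hot(ch):
    x = np.zeros(V)
    x[vocab.index(ch)] = 1.0
    return x

def step(x, h):
    """One RNN step: new hidden state plus a probability per character."""
    h = np.tanh(Wxh @ x + Whh @ h)
    logits = Why @ h
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax
    return h, probs

h = np.zeros(H)
h, probs = step(one_hot("h"), h)          # X(0) = "h"
print(dict(zip(vocab, probs.round(3))))   # likelihood of each next character
```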

The Short-Term Memory Problem

Language generation and understanding entail not only the ability to produce words but also the ability to understand the relationships between them. Simple RNNs suffer from a short-term memory problem (Nguyen 2018):

“If a sequence is long enough, they’ll have a hard time carrying information from earlier time steps to later ones. So if you are trying to process a paragraph of text to do predictions, RNN’s may leave out important information from the beginning.”

To address the short-term memory problem, LSTM and GRU cells were created. They have internal mechanisms that regulate the flow of information by throwing away what is unnecessary and adding new information when needed. Consider the following sentences:

“AMARO is here for the change that consumers demand today: it takes a more honest, accessible and clever approach to customer experience. We are designed to fit new generations by serving them with high-quality sustainable products at accessible prices with unprecedented convenience.”

An LSTM or GRU cell would learn that in the first sentence “AMARO” and “it” require the singular, outputting “is” and “takes”. In the second sentence, however, which now talks about “we”, the cell would drop the singular from its state and add the plural.
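The gating mechanism that makes this keep-or-drop behavior possible can be sketched with a minimal GRU cell. This is the standard GRU formulation with random weights, not the Personalize implementation: the update gate z decides how much of the old state to keep, and the reset gate r decides how much of the old state to use when proposing new content.

```python
import numpy as np

# Minimal GRU cell illustrating the gating mechanism described above.
rng = np.random.default_rng(1)
H, X = 4, 3                                  # hidden size, input size

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

Wz, Uz = rng.normal(0, 0.1, (H, X)), rng.normal(0, 0.1, (H, H))
Wr, Ur = rng.normal(0, 0.1, (H, X)), rng.normal(0, 0.1, (H, H))
Wh, Uh = rng.normal(0, 0.1, (H, X)), rng.normal(0, 0.1, (H, H))

def gru_step(x, h_prev):
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate: keep vs. replace
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate: how much past to use
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate new content
    return (1 - z) * h_prev + z * h_tilde          # mix old and new information

h = np.zeros(H)
for x in rng.normal(size=(5, X)):                  # a short input sequence
    h = gru_step(x, h)
print(h)
```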

https://bit.ly/3dHhBI8

Hierarchical Recurrent Neural Networks — The Model’s Architecture

A user’s interaction history is a sequence of events, like clicks and impressions on the website within a given timeframe, organized into sessions. Click events in a product recommendation dataset consist of interaction event types that indicate a user’s interest in a product, like zooming into a picture or clicking on product details.

HRNNs are RNN models designed to capture hierarchical structure in sequential data. In the context of session-based recommendation, individual user interactions form the lowest level of this hierarchy; they combine into session-level representations, which in turn combine into a user-level representation. The network employs GRU cells to learn the temporal correlations between sessions.

Image 3: Graphical representation of the proposed Hierarchical RNN model.

The initial input of the GRU network is the first item ID a user clicked on. The output is a likelihood score for every item in the catalog of being the next item in the clickstream. In the following steps, we have two sources of input: the new item ID and the previously predicted likelihood scores, which are compared against the true values to measure the prediction error.

The key idea is that after a user session ends, the session’s final state updates the user-level GRU representation. This information is then used to initialize the next session’s representation and to reinforce each cell state with the user’s preferences.

This implies that we capture the evolution of users’ taste, but also that long-term preferences are “robust” to small temporary changes, like buying a present for a friend or buying a winter jacket. It also accords with the need to provide each user with personalized and unique recommendations, as each session will output different recommendations based on past user behavior.
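The two-level flow above — a session-level network reading clicks, and a user-level network carrying state across sessions — can be sketched with plain tanh cells instead of GRUs. The random “item embeddings” and shared hidden size are simplifications for illustration, not the paper’s actual parameterization.

```python
import numpy as np

# Sketch of the hierarchical idea: a session-level RNN reads the item
# clicks within a session; when the session ends, its final state updates
# a user-level RNN, which in turn seeds the next session's hidden state.
rng = np.random.default_rng(2)
H = 4                                          # shared hidden size for brevity

Ws = rng.normal(0, 0.1, (H, H))                # session cell: input weights
Us = rng.normal(0, 0.1, (H, H))                # session cell: recurrent weights
Wu = rng.normal(0, 0.1, (H, H))                # user cell: input weights
Uu = rng.normal(0, 0.1, (H, H))                # user cell: recurrent weights

def cell(W, U, x, h):
    return np.tanh(W @ x + U @ h)

user_state = np.zeros(H)
# Two sessions of 3 and 2 clicks; rows stand in for learned item embeddings.
sessions = [rng.normal(size=(3, H)), rng.normal(size=(2, H))]

for session in sessions:
    h = user_state.copy()                      # user state seeds the session
    for item in session:
        h = cell(Ws, Us, item, h)              # session-level step per click
    user_state = cell(Wu, Uu, h, user_state)   # session summary updates the user

print(user_state)
```

Note how a short anomalous session (the present for a friend) only nudges `user_state` through one user-level step, which is exactly the robustness described above.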

https://gph.is/2czJHHX

Conclusion

Session-based approaches that follow an RNN architecture have been found to outperform traditional collaborative filtering approaches.

In this blog post, we set out to establish an intuition for HRNNs as a machine learning solution of AWS Personalize. Stay tuned to see how HRNN models prove capable of recognizing individual style preferences in a real-world fashion example.

For those who are further interested in LSTMs and GRUs, I can recommend this blog post; for a super intuitive introduction to neural networks in general, this video.


Machine Learning Architect at AMARO. German 🇩🇪 based in São Paulo 🇧🇷. Information Science Master at Humboldt-Universität zu Berlin. Get Dirty with the Data.