Product Review Tagging at THG

May 14 · 11 min read

Section 1 Background Story

Product tag extraction is already becoming a trend in the world of e-commerce. The world’s largest e-commerce companies like Amazon and Taobao already have this feature in place.

Review Tags on Amazon
Review Tags on Taobao

Section 2 Literature Review

Section 2.1 Frequency-based Method

Section 2.2 Graph-based Method

Section 2.3 Neural Topic model

Section 3 Methodology Overview

Process of the product review tagging
Live demo of tagging system

Section 4 Detailed Implementation

Section 4.1 Candidate Selection

An example of the dependency parsing tree

For example, when the review “I use wrist wrap for gym exercise.” is fed to the dependency parser, the tree-structured relation shown in the figure is obtained. We then apply predefined rules to find the valid bigrams “use wrist”, “wrist wrap”, and “gym exercise”, which are extracted as candidates. For unigrams, we only pick nouns, verbs, and adjectives as candidates. As we iterate over all reviews, we track each candidate’s position within its sentence and count its occurrences. The output of this phase is the set of candidates, each with its number of occurrences and its positions across all reviews.
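The candidate selection step can be sketched roughly as follows. This is a minimal illustration, not the production code: the parse is written by hand instead of coming from a real dependency parser, and `BIGRAM_RELATIONS`, `extract_candidates`, and `collect` are hypothetical names. The post does not list the actual rules, so the relation set here is an assumption and the bigrams it yields may differ from the ones quoted above.

```python
from collections import Counter, defaultdict

# Hand-written parse of "I use wrist wrap for gym exercise."
# Each token: (lemma, POS tag, index of head token, dependency label).
# In practice these fields would come from a dependency parser.
PARSE = [
    ("I", "PRON", 1, "nsubj"),
    ("use", "VERB", 1, "ROOT"),
    ("wrist", "NOUN", 3, "compound"),
    ("wrap", "NOUN", 1, "dobj"),
    ("for", "ADP", 1, "prep"),
    ("gym", "NOUN", 6, "compound"),
    ("exercise", "NOUN", 4, "pobj"),
]

UNIGRAM_POS = {"NOUN", "VERB", "ADJ"}            # stated in the post
BIGRAM_RELATIONS = {"compound", "amod", "dobj"}  # assumed rule set


def extract_candidates(parse):
    """Return (candidate text, token position) pairs for one review."""
    found = []
    for i, (lemma, pos, head, dep) in enumerate(parse):
        if pos in UNIGRAM_POS:
            found.append((lemma, i))
        if dep in BIGRAM_RELATIONS and head != i:
            # Emit the pair in sentence order, keeping content words only.
            first, second = sorted((i, head))
            if parse[first][1] in UNIGRAM_POS and parse[second][1] in UNIGRAM_POS:
                found.append((f"{parse[first][0]} {parse[second][0]}", first))
    return found


def collect(reviews):
    """Count candidates and record (review id, position) for each one."""
    counts = Counter()
    positions = defaultdict(list)
    for review_id, parse in enumerate(reviews):
        for text, position in extract_candidates(parse):
            counts[text] += 1
            positions[text].append((review_id, position))
    return counts, positions


counts, positions = collect([PARSE])
```

Keeping positions alongside counts means later stages can look up where each candidate appeared without parsing the reviews again.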

Section 4.2 Candidate Ranking in 5 dimensions

1 — Informativeness

Informativeness measures how representative a tag is of this product and of the customers who reviewed it. TF-IDF is a straightforward metric for informativeness, since it reflects both the number of customers who mention a term (Term Frequency) and the importance of the term to this particular product (Inverse Document Frequency). However, this approach views each word separately without considering the relations between words, and the performance of simple IDF degrades when the corpus spans a wide range of domains. Instead, we propose a new informativeness ranking strategy called dependency relation based TextRank Domain Relevance Score (dr-TRDR), which is covered in the following sections.
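For reference, the TF-IDF baseline described above might look like the sketch below. It assumes one particular variant: term frequency is the number of reviews of a product that mention the term, and document frequency is counted over products. The function name `tfidf` and the toy data are invented for illustration only.

```python
import math


def tfidf(term, product_reviews, all_products):
    """TF = reviews of this product containing the term; IDF over products."""
    tf = sum(term in review for review in product_reviews)
    df = sum(
        any(term in review for review in reviews)
        for reviews in all_products.values()
    )
    idf = math.log(len(all_products) / df) if df else 0.0
    return tf * idf


# Toy data: each review is a set of lemmas (made up for this example).
all_products = {
    "vitamin_c": [{"good", "vitamin"}, {"vitamin", "taste"}, {"good", "price"}],
    "wrist_wrap": [{"good", "wrap"}, {"wrap", "gym"}],
}

score_vitamin = tfidf("vitamin", all_products["vitamin_c"], all_products)
score_good = tfidf("good", all_products["vitamin_c"], all_products)
```

In this toy data “good” appears under every product, so its IDF is zero even though it is frequent, while “vitamin” keeps a positive score.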

2 — Phrase-ness

Phrase-ness measures how likely a bigram is to be a valid phrase. For example, “make feel” might have a high informativeness score, since people frequently write something like “this vitamin makes me feel good”, but it is not a good standalone tag because readers cannot understand it on its own. We measure phrase-ness by co-occurrence and positive pointwise mutual information (PPMI), both calculated on a large corpus. If the co-occurrence and PPMI of a bigram fall below a threshold, the tag gets a phrase-ness score of zero. The notion of phrase-ness is also incorporated into our dr-TRDR algorithm.
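A minimal sketch of the PPMI-based filter is shown below. It uses the standard definition PPMI(x, y) = max(0, log2(p(x, y) / (p(x) p(y)))) over adjacent word pairs. The thresholds `min_count` and `min_ppmi`, the rule that failing either threshold zeroes the score, and the three-sentence corpus are all assumptions for illustration; the post does not state the actual values.

```python
import math
from collections import Counter


def ppmi_table(sentences):
    """Compute PPMI for every adjacent word pair in a tokenised corpus."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        table[(x, y)] = max(0.0, math.log2(p_xy / (p_x * p_y)))
    return table, bigrams


def phraseness(bigram, table, counts, min_count=2, min_ppmi=1.0):
    """Return the PPMI of a bigram, or zero if it fails either threshold."""
    if counts[bigram] < min_count or table.get(bigram, 0.0) < min_ppmi:
        return 0.0
    return table[bigram]


corpus = [
    ["wrist", "wrap", "helps"],
    ["good", "wrist", "wrap"],
    ["makes", "me", "feel", "good"],
]
table, counts = ppmi_table(corpus)
wrist_wrap = phraseness(("wrist", "wrap"), table, counts)
feel_good = phraseness(("feel", "good"), table, counts)
```

On a corpus this small the numbers mean little; a real deployment would compute the counts once on a large external corpus and cache them.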

3 — Semantic-ness:

If a tag is important, other tags with similar meanings are likely to be important as well. Under this assumption, we incorporate the semantic similarity between tags into our dr-TRDR algorithm.

4 — Diversity:

Ideally, we want our displayed tags to cover a wide range of topics rather than cluster around a single aspect. A classical approach called Maximal Marginal Relevance (MMR) is applied at the tag selection stage, which is discussed later.

5 — Coverage:

Ideally, we want our displayed tags to cover a wide range of reviews rather than cluster in a small proportion of them. Since these tags are built for users to explore other customers’ opinions, the impact of the tags will shrink if they only cover 10% of all the reviews. Coverage is considered together with diversity in the tag selection stage.

Overall, in this stage, each tag is ranked by dr-TRDR with the following formula.

Section 4.3 Dependency relation based TextRank (dr-TR)

There are dozens of variations of the TextRank algorithm, differing in how they define the vertices and how they measure the edge weights, but most of them follow the same structure. Here we propose a new variant called dependency relation based TextRank (dr-TR). Words with the same lemma are modeled as a single vertex in the graph. There is an edge between two vertices if they are linked by one of the dependency relations (dp-relations) predefined in the candidate selection phase. For edge weights, the classical TextRank algorithm mainly uses the number of co-occurrences. We also take phrase-ness and semantic-ness into account, since when a word “votes” for its neighbours, it should give more weight to neighbours that are more similar to itself or that form a phrase with it. The formula is defined as follows:

The formula of dr-TR

freq(.) is how often a word occurs in all reviews of a product, and PPMI is calculated on an external corpus. The attraction score defines how strongly two words attract each other, modeled on the formula for gravity. We iterate through all reviews to set up the vertices, edges, and weights. An illustrative example is shown below. Once the graph is constructed, a classical PageRank algorithm is run for 1000 iterations; the final score of each vertex is called its TextRank score.

More formally, the iteration process is defined as follows, where d is the damping factor (usually 0.85).

The formula of WordRank iteration
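The iteration can be sketched with the standard weighted update from the TextRank paper, in which each vertex receives (1 - d) plus d times the weighted votes of its neighbours. This is an assumption about the general shape of the update, not necessarily the exact formula in the figure. The edge weights are passed in as plain numbers here, whereas in dr-TR they would combine co-occurrence, phrase-ness, and semantic-ness. The function name `weighted_pagerank`, the early-stopping tolerance, and the toy weights are hypothetical.

```python
def weighted_pagerank(weights, d=0.85, iterations=1000, tol=1e-9):
    """Run weighted PageRank on an undirected graph.

    weights maps (u, v) to the positive weight of the edge between u and v.
    """
    neighbours = {}
    for (u, v), w in weights.items():
        neighbours.setdefault(u, {})[v] = w
        neighbours.setdefault(v, {})[u] = w
    # Total edge weight leaving each vertex, used to normalise its votes.
    out_sum = {u: sum(nbrs.values()) for u, nbrs in neighbours.items()}
    scores = {u: 1.0 for u in neighbours}
    for _ in range(iterations):
        new_scores = {}
        for u, nbrs in neighbours.items():
            vote = sum(w / out_sum[v] * scores[v] for v, w in nbrs.items())
            new_scores[u] = (1 - d) + d * vote
        delta = max(abs(new_scores[u] - scores[u]) for u in scores)
        scores = new_scores
        if delta < tol:  # stop early once the scores have settled
            break
    return scores


# Toy graph: the weights are invented for illustration.
edge_weights = {
    ("wrist", "wrap"): 3.0,
    ("use", "wrap"): 1.0,
    ("gym", "exercise"): 2.0,
}
scores = weighted_pagerank(edge_weights)
```

In this toy graph “wrap” ends up with the highest score because it collects votes from two neighbours, and “wrist” outranks “use” because its edge to “wrap” is heavier.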

Section 4.4 Domain Relevance Score (DR)

The formula of Relative domain relevance

w_tj is a TF-IDF-like weight of candidate t in document j, and N is the number of documents in domain D. R(t, D) combines two measures that reflect the salience of a candidate in D. The first part reflects how frequently a term is mentioned in a particular document, where W_j denotes the number of words in document j. The second part quantifies how significantly a term is mentioned across all documents in D.

All of the ranking calculations above are based on unigrams. For bigrams, we average the scores of the two unigrams. This can cause a problem: a very important word has too much influence on the final result. For instance, if “vitamin” has a very high score, vitamin-related tags such as “good vitamin”, “vitamin supplement”, and “vitamin taken” will take all the top positions. In the tag selection stage, we therefore try to avoid semantic overlap between selected tags.
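The averaging step, and the dominance problem it creates, can be seen in a few lines. The unigram scores below are made up purely to illustrate the effect, and `bigram_score` is a hypothetical helper name.

```python
def bigram_score(bigram, unigram_scores):
    """Score a two-word tag as the mean of its unigram scores."""
    first, second = bigram.split()
    return (unigram_scores[first] + unigram_scores[second]) / 2


# Invented scores: "vitamin" is far more important than any other word.
unigram_scores = {
    "vitamin": 0.9,
    "good": 0.3,
    "supplement": 0.4,
    "quick": 0.5,
    "delivery": 0.5,
}

tags = ["good vitamin", "vitamin supplement", "quick delivery"]
ranked = sorted(
    tags, key=lambda tag: bigram_score(tag, unigram_scores), reverse=True
)
```

Both tags containing “vitamin” outrank “quick delivery”, even though each pairs it with a weaker word, which is why the selection stage has to penalise overlap.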

Section 4.5 Tag Selection

The original MMR from information retrieval and text summarisation is based on the set of all initially retrieved documents, R, for a given input query Q, and on an initially empty set S representing documents that are selected as good answers for Q. S is iteratively populated by computing MMR as described in the formula, where Di and Dj are retrieved documents, and Sim_1 and Sim_2 are similarity functions.

MMR for selecting tags dynamically

When λ = 1, MMR computes a standard, relevance-ranked list, whilst when λ = 0 it computes a maximal diversity ranking of the documents in R. In our setting, Sim_1 is our ranking score, Sim_2 is the cosine similarity between word embedding vectors, and λ is set to 0.5 to weight relevance and diversity equally.
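A greedy MMR selection loop under these settings might look like the sketch below. It follows the standard MMR criterion, λ · Sim_1 minus (1 - λ) times the maximum Sim_2 against the already selected tags. The function names, the two-dimensional embeddings, and the scores are assumptions made for illustration; real embeddings would come from a trained word embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_tags(scores, embeddings, k, lam=0.5):
    """Greedily pick k tags, trading ranking score against redundancy."""
    selected = []
    remaining = set(scores)
    while remaining and len(selected) < k:

        def mmr(tag):
            # Similarity to the closest tag that is already selected.
            redundancy = max(
                (cosine(embeddings[tag], embeddings[s]) for s in selected),
                default=0.0,
            )
            return lam * scores[tag] - (1 - lam) * redundancy

        best = max(sorted(remaining), key=mmr)  # sorted for stable ties
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy ranking scores and embeddings, invented for this example.
scores = {"good vitamin": 0.9, "vitamin supplement": 0.85, "quick delivery": 0.6}
embeddings = {
    "good vitamin": [1.0, 0.1],
    "vitamin supplement": [0.9, 0.2],
    "quick delivery": [0.1, 1.0],
}

diverse = select_tags(scores, embeddings, k=2, lam=0.5)
by_score = select_tags(scores, embeddings, k=2, lam=1.0)
```

With λ = 0.5 the second pick is “quick delivery”, because “vitamin supplement” is nearly parallel to the tag already chosen. With λ = 1 the selection falls back to the plain score ranking.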

Section 4.6 Additional Features

Aspect-level sentiment analysis predicts the polarity of every aspect within a review, while classical sentiment analysis predicts only one polarity for the whole review. For example, given “I like the food here but the environment is terrible”, aspect-level sentiment analysis gives separate sentiments for the food and the environment, making the analysis more precise.

The tag hierarchy is established by combining neural topic modeling with graph-based keyword extraction. For example, “good quality” and “bad quality” describe the same property of a product, so they are grouped together as “quality related”. “Quick delivery” and “terrible packaging” both belong to the topic of “customer service”, so they are grouped under this topic. By combining different extraction methods at multiple granularities, a three-tier tag hierarchy is established.

Section 5 Conclusion

We’re recruiting



THG Tech Blog

THG is one of the world’s fastest growing and largest online retailers. With a world-class business, a proprietary technology platform, and disruptive business model, our ambition is to be the global digital leader.


