Using LLMs to Auto-Label Shopify Content

Published in

The Deep Hub

3 min read · Feb 15, 2024

An Effortless Way to Auto-Label Content with Hugging Face: Easy and Data-Free

Introduction

How can one obtain multi-label outputs for a text? Traditionally, the process involves gathering a substantial amount of data, annotating it, training a model on a GPU, testing it, fine-tuning, and re-testing until the results are good enough. That is a tedious process when you just want a quick check on tasks such as classifying a Shopify shop, sentiment analysis, or song genre detection.

However, there is now a solution that streamlines this entire process, making it significantly more efficient and hassle-free. With just a simple Colab or Jupyter Notebook setup, users can input data and receive instant output. Hugging Face comes to the rescue with its zero-shot models, eliminating the need for extensive training and providing a straightforward way to classify data.

Background

In simple terms, a zero-shot classification model can classify the provided data even if it has never seen the classes before. It does this by drawing on the knowledge it acquired during training.
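To make this a bit more concrete, here is a minimal sketch of the idea behind NLI-based zero-shot classifiers such as facebook/bart-large-mnli. Each candidate label is turned into a hypothesis sentence, and the model judges whether the text entails it. The template below matches the default of the Hugging Face zero-shot pipeline; the shop text, the helper names, and the logits are all made up for illustration, so no model is loaded here.

```python
import math

# Default hypothesis template of the Hugging Face zero-shot pipeline.
TEMPLATE = "This example is {}."


def build_pairs(text, labels, template=TEMPLATE):
    """Pair the text (premise) with one hypothesis sentence per label."""
    return [(text, template.format(label)) for label in labels]


def multi_label_scores(logits):
    """Score each label on its own: softmax over (contradiction, entailment)."""
    scores = []
    for contradiction, entailment in logits:
        e = math.exp(entailment)
        c = math.exp(contradiction)
        scores.append(e / (e + c))
    return scores


def single_label_scores(logits):
    """Score labels against each other: softmax over the entailment logits."""
    exps = [math.exp(entailment) for _, entailment in logits]
    total = sum(exps)
    return [x / total for x in exps]


labels = ["jewelry", "earring", "scooter"]
pairs = build_pairs("Handmade silver hoops and studs", labels)

# Hypothetical (contradiction, entailment) logits, one pair per label.
fake_logits = [(-2.0, 3.0), (-1.5, 2.5), (3.0, -2.0)]

multi = multi_label_scores(fake_logits)    # independent scores, need not sum to 1
single = single_label_scores(fake_logits)  # competing scores, sum to 1

for label, m, s in zip(labels, multi, single):
    print(f"{label}: multi_label={m:.3f} single_label={s:.3f}")
```

With independent scoring, several labels can each score close to 1, which is why a high cut-off only makes sense together with multi_label=True. With competing scores, the labels share a total of 1, so two equally good labels would pull each other down.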

Use-Case

I thought it would be really helpful to share a real example with the community. This way, it’s easier to relate to instead of me just providing code directly.

I have content from various Shopify shops. It includes assorted details such as shop names, shop descriptions, and lists of collections within each shop. A bonus is that the approach also supports multiple languages, eliminating the need for a translator.

Let's get digging

Code walkthrough

Let's pick whichever device is available, GPU or CPU.

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Next, import the pipeline from transformers and initialize the zero-shot classification pipeline with Facebook's BART model, running on the device selected above.

from transformers import pipeline
classifier = pipeline("zero-shot-classification", device=device,
                      model="facebook/bart-large-mnli")

shop_labels is the list of candidate labels. Think of them as choices: the model looks at this list and decides which categories each shop’s details belong to.

shop_labels = ['eye', 'gel', 'sea', 'box', 'print', 'food', 'appliance', 'earring', 'fitness', 'scooter', 'baby', 'hoodies', 'stone', 'car', 'bandana', 'pant', 'hand', 'single', 'treatment', 'battery', 'pond', 'sustainable', 'supplement', 'footwear', 'iphone', 'jewelry', 'blanket', 'bottom', 'bath',  'gadget', 'toy', 'cable', 'gift', 'camera', 'bag', 'tool', 'puzzle', 'electric', 'tee', 'fire', 'wireless', 'audio', 'leather', 'hat', 'bracelet', 'short', 'bridal', 'deck', 'para', 'security', 'aquarium', 'board', 'ceramic', 'denim', 'home', 'bbq', '3d', 'book', 'kid', 'sock', 'incense', 'australian', 'face', 'outdoor', 'apparel', 'patio', 'festival', 'decor', 'business', 'dog', 'indoor', 'boutique', 'hair', 'pit', 'wooden', 'wall', 'clothing', 'bike', 'shirt', 'party', 'handmade', 'red', 'mask', 'grill', 'wheel', 'mirror', 'vegan', 'filter', 'art', 'fragrance', 'spray', 'kit', 'furniture', 'lighting', 'charger', 'nail', 'chocolate']

Here’s an overview of the different columns in the dataframe. The code below reads three of them: store_name, store_description, and store_collection.

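The classifier returns a score for every label, but I’m only interested in the strongest matches, so I keep only the labels that score above 0.95. Because multi_label=True scores each label independently, several labels can pass this threshold for the same shop.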

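def top_tags(row):
  # Combine the shop's name, description, and collections into one text
  name_desc_collection = row.store_name + ' ' + row.store_description + ' ' + row.store_collection
  # multi_label=True scores every label independently
  result = classifier(name_desc_collection, shop_labels, multi_label=True)
  # Keep only the labels that score above 0.95
  return [label for label, score in zip(result['labels'], result['scores']) if score > .95]

# df is the dataframe holding the shop data
df['output'] = df.apply(top_tags, axis=1)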
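One thing to watch out for: if a shop has a missing description or collection, pandas stores it as NaN (a float), and adding it to a string raises a TypeError. Below is a rough sketch of how the same steps could be made tolerant of that. The helpers join_fields and labels_above, the sample shop, and fake_classifier are all hypothetical; fake_classifier only imitates the shape of the pipeline's result (a dict with 'sequence', 'labels', and 'scores', sorted by score) so the filtering can be tried without downloading the model.

```python
def join_fields(*fields):
    """Join text fields with spaces, skipping None, NaN, and empty strings."""
    parts = []
    for value in fields:
        # NaN is a float and None is not a str, so both are skipped here.
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return " ".join(parts)


def labels_above(result, threshold=0.95):
    """Keep the labels whose score is strictly greater than the threshold."""
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score > threshold
    ]


def fake_classifier(text, candidate_labels, multi_label=True):
    """Hypothetical stand-in that returns a result shaped like the pipeline's."""
    scores = [0.99 if label in text.lower() else 0.10 for label in candidate_labels]
    ranked = sorted(zip(candidate_labels, scores), key=lambda p: p[1], reverse=True)
    return {
        "sequence": text,
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


# Made-up shop with a missing description (NaN), as pandas would store it.
text = join_fields("Paws & Co", float("nan"), "Dog toy and leather collar shop")
result = fake_classifier(text, ["dog", "toy", "scooter", "leather"])
tags = labels_above(result)
print(tags)
```

To use it with the real model, top_tags could call join_fields on the three columns and pass the result to classifier instead of fake_classifier.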

The output looks relevant for the provided data and labels. I’ve kept only the top-scoring labels, and they seem interesting and on point. I’ve also uploaded the input and output files to GitHub, so you can refer to them for more detail on the content and the keywords.

Conclusion

Well done, zero-shot classifier! The results are fascinating. Overall, the labels look spot-on for this data, and the best part is that all the resources we used are completely free. Feel free to give it a try yourself, and don’t hesitate to share your comments if you encounter any issues with the code.