Analyzing Text Classification Model Outputs In Python

Published in

Geek Culture

3 min read · Mar 27, 2022

This post continues the previous one, where we looked at two ways to build a text classifier based on pre-trained open-source models. In practice, although good model accuracy is essential, understanding the model outputs is equally crucial.

This post will explore how we can obtain some insights from the pre-trained ‘distilbert-base-uncased’ model. We will use this checkpoint for both the tokenizer and the text classifier for the dataset.

Building Two Simple Text Classifiers With Python

In my previous post, we looked at ways to utilize the open-source Huggingface transformers for various Natural Language…

medium.com

Let’s begin by loading the Huggingface dataset, setting the device on which the torch tensors will be allocated, tokenizing the dataset, and loading the pre-trained model.

# Loading a HF dataset
from datasets import load_dataset
dataset = load_dataset('poem_sentiment')

# Tokenizing the dataset
from transformers import AutoTokenizer
model_ckpt = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
def tokenize(batch)…
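The snippet above is cut off, so the following is only a minimal sketch of how the four setup steps could fit together. It is not the original code. The helper names pick_device and load_classifier, the explicit tokenizer argument, and the 'verse_text' column name are assumptions made for illustration.

```python
def pick_device():
    """Return 'cuda' when PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to place on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def tokenize(batch, tokenizer, text_column="verse_text"):
    """Tokenize one batch of examples.

    Assumption: the text lives in a 'verse_text' column. Padding and
    truncation are enabled so every sequence in the batch has equal length.
    """
    return tokenizer(batch[text_column], padding=True, truncation=True)


def load_classifier(model_ckpt, num_labels, device):
    """Load a sequence-classification head on top of the checkpoint."""
    # Imported lazily so the helpers above work without transformers.
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        model_ckpt, num_labels=num_labels
    )
    return model.to(device)


# Possible usage, continuing from `dataset`, `tokenizer` and `model_ckpt`
# defined earlier (needs network access to download the checkpoint):
#
#   device = pick_device()
#   encoded = dataset.map(
#       lambda batch: tokenize(batch, tokenizer),
#       batched=True, batch_size=None,
#   )
#   num_labels = dataset["train"].features["label"].num_classes
#   model = load_classifier(model_ckpt, num_labels, device)
```

Passing the tokenizer in explicitly, rather than reading it from a global, keeps the function easy to reuse with a different checkpoint. Reading num_labels from the dataset features avoids hard-coding the number of classes.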

Written by LZP Data Science