
How to Use Pre-Trained Language Models for Regression

Why and how to convert mT5 into a regression metric for numerical prediction

Aden Haussmann
Towards Data Science
7 min read · Jan 18, 2025

Screenshot of https://huggingface.co/google/mt5-large

Introduction

My undergraduate honours dissertation was a Natural Language Processing (NLP) research project focused on multilingual text generation in under-represented languages. Existing metrics performed very poorly at evaluating the outputs of models trained on my dataset, so I needed to train a learned regression metric.

Regression is useful for many text tasks, such as:

  • Sentiment analysis: Predict the strength of positive or negative sentiment instead of a simple binary class.
  • Writing quality estimation: Predict a continuous quality score for a piece of writing.

For my use case, I needed the model to score how good another model’s prediction was for a given task. Each row of my dataset consisted of a text input and a binary label: 0 (bad prediction) or 1 (good prediction).

  • Input: Text
  • Label: 0 or 1
  • The task: Predict a numerical probability between 0 and 1
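To make the setup concrete, here is a minimal sketch of how a model's unbounded raw score can be turned into a probability and compared against a 0/1 label. It assumes a sigmoid output and a binary cross-entropy loss, which is one common choice for this kind of target; the excerpt does not say which loss or architecture was actually used. The raw scores below are made-up values standing in for model outputs.

```python
import math


def sigmoid(z: float) -> float:
    """Squash an unbounded raw score into the open interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    """Loss for one example: small when prob is close to the 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Hypothetical (raw_score, label) pairs; in practice the raw score would
# come from the model and the label from the dataset.
examples = [(2.0, 1), (-1.5, 0), (0.3, 0)]

probs = [sigmoid(z) for z, _ in examples]
losses = [binary_cross_entropy(p, y) for p, (_, y) in zip(probs, examples)]
mean_loss = sum(losses) / len(losses)

for (z, y), p, loss in zip(examples, probs, losses):
    print(f"raw={z:+.1f}  prob={p:.3f}  label={y}  loss={loss:.3f}")
print(f"mean loss: {mean_loss:.3f}")
```

The third example has label 0 but a predicted probability above 0.5, so it contributes the largest loss of the three. Minimising this loss pushes predicted probabilities towards the labels while keeping every output strictly between 0 and 1.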

But transformer-based models are usually used for generation tasks. Why would you use a pre-trained LM for…




