
How to use Large Language Models to tag your data: A complete tutorial

Using Mistral for Data tagging

Research Graph
7 min read · May 13, 2024
Source: Generated using DeviantArt’s DreamUp

Author

· Xuzeng He (ORCID: 0009-0005-7317-7426)

Introduction

Data tagging, in simple terms, is the process of assigning labels (tags) to your data so that it is easier to retrieve or analyse. For example, if you manage a database of scientific articles, you may want to tag each document with its relevant topics so that users can later find the articles they are interested in with a simple filter, without much effort. Better still, with the recent surge of Large Language Models (LLMs) such as ChatGPT, you can now use these models to tag large amounts of data, provided you can deploy a model on your local computer.
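To make the idea concrete, here is a minimal sketch of what tagged data looks like and how a topic filter uses it. The `Document` class, the `filter_by_tag` helper and the example titles are hypothetical, invented for illustration; they are not part of the tutorial's code.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A hypothetical record: a title plus a set of topic tags."""
    title: str
    tags: set[str] = field(default_factory=set)


def filter_by_tag(docs: list[Document], tag: str) -> list[Document]:
    """Return the documents carrying the given tag (what a filter button would do)."""
    return [doc for doc in docs if tag in doc.tags]


if __name__ == "__main__":
    # Invented example records, already tagged by topic.
    library = [
        Document("Deep learning for tumour segmentation", {"AI", "oncology"}),
        Document("Dietary fibre and gut health", {"nutrition"}),
        Document("Language models for clinical notes", {"AI", "NLP"}),
    ]
    for doc in filter_by_tag(library, "AI"):
        print(doc.title)
```

Running it prints the two titles tagged `AI`. Storing tags as a set keeps membership checks cheap and prevents the same tag from being attached to one record twice.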

In this post, we will show you how to use a popular large language model called Mistral to identify which documents in a list of PubMed records (in JSON format) are related to Artificial Intelligence (AI), by inspecting their titles and abstracts.
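Concretely, each document becomes one question to the model. The sketch below assumes the JSON file holds a list of objects with `title` and `abstract` keys and that the model is asked for a one-word Yes/No answer. The field names, the prompt wording and the helper names are hypothetical, and the tutorial's own prompt may differ.

```python
import json

# Hypothetical prompt wording; the tutorial's actual prompt may differ.
PROMPT_TEMPLATE = (
    "You are tagging scientific papers. Answer with a single word, Yes or No.\n"
    "Is the following paper related to Artificial Intelligence?\n\n"
    "Title: {title}\n"
    "Abstract: {abstract}\n"
)


def load_documents(path: str) -> list[dict]:
    """Read a JSON file assumed to hold a list of objects with 'title' and 'abstract' keys."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_prompt(doc: dict) -> str:
    """Fill the template from one document; missing fields become empty strings."""
    return PROMPT_TEMPLATE.format(
        title=doc.get("title", ""), abstract=doc.get("abstract", "")
    )


def parse_answer(reply: str) -> bool:
    """Treat a reply whose first word is 'yes' (case-insensitive) as an AI tag."""
    words = reply.strip().split()
    return bool(words) and words[0].strip(".,!:").lower() == "yes"
```

For example, `parse_answer("Yes, it is.")` returns `True`. Asking for a one-word answer and checking only the first word keeps parsing simple, at the cost of misreading replies that put the answer later in a sentence.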

Installation Guide

In this work, we use Ollama to install and run our LLMs locally, and LangChain to interact with them from a Python environment.
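Under the hood, LangChain's Ollama integration sends requests to the local Ollama server, which listens on `http://localhost:11434` by default. As a dependency-free sketch of that round trip, the code below calls Ollama's `/api/generate` REST endpoint directly with the standard library instead of going through LangChain. It assumes Ollama is running and the model was fetched beforehand with `ollama pull mistral`; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port and the model
# was downloaded beforehand with `ollama pull mistral`.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the model's full text reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object whose "response" holds the text.
    return body["response"]
```

With the server running, `generate("Is Mistral a language model? Answer in one word.")` returns the model's reply as a string. The LangChain route used in this tutorial is roughly equivalent to `Ollama(model="mistral").invoke(prompt)`, with `Ollama` imported from `langchain_community.llms` (newer releases provide `OllamaLLM` in the `langchain_ollama` package).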
