Getting started with semantic workflows

A guide on when to use small and large language models

David Mezzetti
NeuML


Semantic workflows transform and find data driven by user intent. Workflows combine machine learning models together to create powerful transformation and processing functions.

With ChatGPT bursting onto the scene, the potential of large language models (LLMs) has captured the public’s imagination. There has been heavy interest in discovering ways LLMs can be applied towards a multitude of tasks. This has also inspired new projects and frameworks to connect LLMs with other data sources and tasks to build AI-driven applications.

There is a rich ecosystem of effective machine learning models, big and small. This article explores when it’s best to chain small to medium-sized models together and when large language models (LLMs) are a better fit. It also introduces how to join models together with workflows and how to host workflows as an API service.

Examples are backed by txtai, an open-source framework for building semantic search applications.

Transformer models to transform data

The release of BERT in 2018 kicked off a number of rapid advancements in Natural Language Processing (NLP). Fine-tuned versions of BERT were able to label text, label tokens and run extractive question-answering.

A flurry of models were released over the next couple of years to summarize text, translate text, transcribe speech to text and even understand images.

GPT-3 was released in 2020, but it was just one player in a crowded field of models. ChatGPT brought the power of LLMs to a much wider audience.

While LLMs are versatile and can accomplish many tasks, that doesn’t mean they’re always the best choice. Consider summarization: an LLM can summarize text, but there are smaller models built specifically for that task that do it well. Time to explore with code!

Summarization

The first task we’ll look at is summarization. See the code below that uses a summary pipeline to run a machine learning model and shorten the text into a summary.

from txtai.pipeline import Summary

# Create summary model
summary = Summary("philschmid/flan-t5-base-samsum")

text = """
Search is the base of many applications. Once data starts to pile up, users
want to be able to find it. It’s the foundation of the internet and an
ever-growing challenge that is never solved or done. The field of Natural
Language Processing (NLP) is rapidly evolving with a number of new
developments. Large-scale general language models are an exciting new
capability allowing us to add amazing functionality quickly with
limited compute and people. Innovation continues with new models and
advancements coming in at what seems a weekly basis. This article
introduces txtai, an AI-powered search engine that enables Natural
Language Understanding (NLU) based search in any application.
"""

summary(text)

The section above prints:

txtai is an AI-powered search engine that enables Natural Language 
Understanding (NLU) based search in any application.

Now let’s do the same thing with a large language model. The sequences pipeline runs text to text generation. In other words, it takes input text, passes it to a model and creates transformed output text.

In this article, we’re going to use FLAN-T5-LARGE as our large language model. This can easily be substituted with an OpenAI GPT-3, Cohere or Hugging Face API call.

from txtai.pipeline import Sequences

sequences = Sequences("google/flan-t5-large")

prompt = """
Summarize the following paragraph into a single sentence.
"""

sequences(f"{prompt}\n{text}")

This section prints:

txtai is an AI-powered search engine that enables Natural Language
Understanding (NLU) based search in any application.

Same output as before. Notice both models are from the same family but one is a smaller model fine-tuned specifically for summarization while the other is a generic larger model. Let’s see how many parameters each model has.

def modelsize(model):
    # Count trainable parameters, reported in units of 1024 * 1024 (2^20)
    params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return int(params / 1024 / 1024)

params = modelsize(summary.pipeline.model)
print(f"Number of parameters in Summary model: {params}M")

params = modelsize(sequences.pipeline.model)
print(f"Number of parameters in Sequences model: {params}M")

This prints:

Number of parameters in Summary model: 236M
Number of parameters in Sequences model: 746M

The prompt-based approach requires a model about three times as big, yet both produce the same output! And by LLM standards, 746M is a small model. The smallest InstructGPT model, for example, has 1.3 billion parameters, FLAN-T5-XL has 3 billion and GPT-3 has 175 billion.
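Here is a quick worked check of the numbers above. `modelsize` divides by 1024 × 1024 (2^20) rather than by one million, so its "M" figures sit roughly 5% below true millions. The size ratio is unaffected because both models use the same divisor. The 250,000,000 count below is a hypothetical round number, chosen only to show the gap between the two units.

```python
MEBI = 1024 * 1024  # 2**20, the divisor used by modelsize()
MILLION = 1_000_000


def as_mebi(params: int) -> int:
    """Truncate a raw parameter count to units of 2**20, like modelsize()."""
    return int(params / MEBI)


def as_millions(params: int) -> float:
    """Express a raw parameter count in true millions."""
    return params / MILLION


# Hypothetical raw count, used only to show the difference between units
raw = 250_000_000
print(as_mebi(raw), round(as_millions(raw)))  # 238 250

# Ratio of the two printed sizes (both already in the same 2**20 units)
summary_size, sequences_size = 236, 746
print(round(sequences_size / summary_size, 2))  # 3.16
```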

Summarize and Translate

For the next task, we’ll once again summarize text but this time also translate the summary to French. We’ll reuse the summary pipeline we created earlier.

from txtai.pipeline import Translation

# Summarize text using summary pipeline
output = summary(text)

# Translate to French
translate = Translation()
translate(output, "fr")

This section prints:

txtai est un moteur de recherche alimenté par l'IA qui permet une recherche 
basée sur la compréhension du langage naturel (NLU) dans n'importe quelle
application.

This is an accurate translation. Let’s try the same thing with our LLM. As with the summary pipeline, we’ll reuse the sequences pipeline.

prompt = """
Summarize the following paragraph into a single sentence summary.
Translate the single sentence summary to French.
"""

sequences(f"{prompt}\n{text}")

Running this prints:

txtai, un moteur de recherche en utilisant les méthodes de l'AI, permet de
trouver les données à l'aide de l'analyse de langues naturelles (LN) dans
toutes les applications.

This is also a reasonably accurate translation, although the dedicated translation model appears to do a better job while requiring significantly fewer resources.

Let’s calculate the total number of parameters.

# The Translation pipeline caches loaded models in a dict; take the model from the first entry
translation = next(iter(translate.models.values()))[0]

params = modelsize(summary.pipeline.model) + modelsize(translation)
print(f"Number of parameters in Summary + Translation models: {params}M")

params = modelsize(sequences.pipeline.model)
print(f"Number of parameters in Sequences model: {params}M")

This prints:

Number of parameters in Summary + Translation models: 307M
Number of parameters in Sequences model: 746M

Combined, the summary and translation models still have less than half as many parameters as the single LLM. Parameter count isn’t the only factor, but it does show that multiple smaller models are the better option in this case. On top of that, in this example they also produced the more accurate translation.

Workflows

Having to continually manage and connect multiple pipelines can get messy. txtai has a robust workflow framework that makes managing this easy.

Workflows are a simple yet powerful construct that takes a callable and returns elements. Workflows are streaming by nature and work on data in batches, allowing large volumes of data to be processed efficiently.
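The following hypothetical, pure-Python sketch makes "streaming" and "batches" concrete. It is not txtai's Workflow implementation: `toy_workflow` is an illustrative name, and the two stand-in functions take the place of the summary and translation pipelines. Elements are buffered into fixed-size batches, and each batch runs through every action in order. Results are yielded as soon as a batch finishes, so input is consumed lazily.

```python
from typing import Callable, Iterable, Iterator, List

Action = Callable[[List[str]], List[str]]


def toy_workflow(actions: List[Action], batch: int = 2) -> Callable[[Iterable[str]], Iterator[str]]:
    """Hypothetical mini workflow: each action maps a batch of elements to a batch of outputs."""

    def process(chunk: List[str]) -> List[str]:
        # Apply each action, in order, to the whole batch
        for action in actions:
            chunk = action(chunk)
        return chunk

    def run(elements: Iterable[str]) -> Iterator[str]:
        buffer: List[str] = []
        for element in elements:
            buffer.append(element)
            if len(buffer) == batch:
                yield from process(buffer)
                buffer = []
        # Flush the final partial batch
        if buffer:
            yield from process(buffer)

    return run


def shorten(batch: List[str]) -> List[str]:
    """Stand-in for a summary step: keep only the first sentence."""
    return [text.split(".")[0] for text in batch]


def shout(batch: List[str]) -> List[str]:
    """Stand-in for a translation step: uppercase the text."""
    return [text.upper() for text in batch]


workflow = toy_workflow([shorten, shout], batch=2)
print(list(workflow(["a b. c", "d. e", "f g. h"])))  # ['A B', 'D', 'F G']
```

Because `run` is a generator, nothing is processed until results are requested. That is what lets a streaming design handle inputs too large to hold in memory at once.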


Workflows can be created in either Python or YAML. For this article, we’ll use YAML configuration, saved to a file named app.yml.

summary:
  path: philschmid/flan-t5-base-samsum
translation:
workflow:
  summary:
    tasks:
      - action: summary
      - action: translation
        args:
          - fr

This workflow YAML configures a summary pipeline and translation pipeline. It then builds a workflow that connects the pipelines together.

from txtai.app import Application

app = Application("app.yml")
print(next(app.workflow("summary", [text])))

Running this prints:

txtai est un moteur de recherche alimenté par l'IA qui permet une recherche
basée sur la compréhension du langage naturel (NLU) dans n'importe quelle
application.

Just like the code we ran in Python!

Indexing Workflow

Now let’s build an indexing workflow. This workflow will read the front page of Hacker News and create a vector index for the titles. From there, we’ll run embeddings-guided and prompt-driven search over those titles.

Here’s the workflow definition. Save it as app.yml, replacing the previous configuration.

embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
  content: true
extractor:
  path: google/flan-t5-large
tabular:
  idcolumn: url
  textcolumns:
    - title
workflow:
  index:
    tasks:
      - batch: false
        extract:
          - hits
        method: get
        params:
          tags: null
        task: service
        url: https://hn.algolia.com/api/v1/search?hitsPerPage=50
      - action: tabular
      - action: index
writable: true

This workflow has a bit more going on than the last one. First it defines an embeddings index for vector search. Then it defines an extractor pipeline which will be used for prompt-driven search. It also defines a tabular pipeline which is used to parse JSON data.
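The tabular pipeline turns each JSON record into something the embeddings index can ingest. The sketch below is a simplified, hypothetical stand-in (`to_rows` is not txtai's Tabular implementation) that mirrors the configuration above. The `url` field becomes the id and the `title` field becomes the indexed text. The sample records are made up.

```python
from typing import Any, Dict, List, Tuple


def to_rows(records: List[Dict[str, Any]], idcolumn: str, textcolumns: List[str]) -> List[Tuple[Any, str]]:
    """Hypothetical stand-in for a tabular step: map JSON records to (id, text) pairs."""
    rows = []
    for record in records:
        # Join the configured text columns, skipping missing or empty values
        text = " ".join(str(record[c]) for c in textcolumns if record.get(c))
        # In this sketch, records without an id or without text are dropped
        if record.get(idcolumn) is not None and text:
            rows.append((record[idcolumn], text))
    return rows


# Made-up records shaped like Hacker News search hits
hits = [
    {"url": "https://example.com/a", "title": "First story", "points": 10},
    {"url": "https://example.com/b", "title": "Second story", "points": 5},
    {"url": None, "title": "Ask HN: no link", "points": 3},
]

print(to_rows(hits, idcolumn="url", textcolumns=["title"]))
```

Skipping records without an id is a choice made for this sketch only. Posts such as "Ask HN" entries often have no URL, and how txtai itself treats them isn't covered here.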

After that, an indexing workflow is defined. This workflow reads the front page of Hacker News, parses the content and loads the titles into the embeddings index.
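In the code below, this workflow runs on the element "front_page", while the service task leaves `tags` set to null. As we understand txtai's service task, a null parameter is filled in with the input element, and the `extract` list then selects the `hits` key from the JSON response. The sketch imitates that logic with the standard library only. `build_url` and `extract_path` are hypothetical helpers. The code builds the URL and parses a made-up response body rather than calling the live Algolia API.

```python
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def build_url(url: str, params: Dict[str, Optional[str]], element: str) -> str:
    """Fill null params with the input element and append them to the URL."""
    filled = {k: element if v is None else v for k, v in params.items()}
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(filled)


def extract_path(payload: Any, path: List[str]) -> Any:
    """Walk nested keys, e.g. ["hits"], to pull records out of a JSON response."""
    for key in path:
        payload = payload[key]
    return payload


url = build_url("https://hn.algolia.com/api/v1/search?hitsPerPage=50", {"tags": None}, "front_page")
print(url)  # https://hn.algolia.com/api/v1/search?hitsPerPage=50&tags=front_page

# Made-up response body, trimmed to the fields used in this article
body = json.loads('{"hits": [{"title": "Example", "url": "https://example.com"}], "nbHits": 1}')
print(extract_path(body, ["hits"]))
```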

Now let’s write the code. Like last time, we’ll load the configuration with an Application, then run the indexing workflow to build the index.

from txtai.app import Application

app = Application("app.yml")
results = list(app.workflow("index", ["front_page"]))

Now let’s ask some questions. The next section asks a series of questions using prompt-driven search.

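def prompt(question):
    # Prompt template ending in "Context: "; the retrieved context goes after it
    return f"""Answer the following question using the context below. Say 'no answer' when the question can't be answered.
Question: {question}
Context: """

def ask(question):
    # Each queue entry has a name, a search query and the full question prompt
    queue = [{"name": "question", "query": question, "question": prompt(question)}]
    return question, app.extract(queue, None)[0]["answer"]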

question = "What happened to the Chinese balloon?"
print(ask(question))

question = "How low can you go on RAM for Windows 11?"
print(ask(question))

question = "What microcontroller is being discussed?"
print(ask(question))

question = "What is the temperature in Virginia?"
print(ask(question))

Running this prints:

('What happened to the Chinese balloon?', 'U.S. military shoots down suspected Chinese surveillance balloon')
('How low can you go on RAM for Windows 11?', '2GB')
('What microcontroller is being discussed?', 'ESP32')
('What is the temperature in Virginia?', 'No answer')

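Very interesting! Notice how the level of detail in each answer follows how the question is asked. The open-ended balloon question returns a full headline, while the specific questions return short values. And when the question can’t be answered, the model says so.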

The ad hoc nature of user questions makes a prompt-based approach the right tool for the job. LLMs have a great deal of language understanding and flexibly handle a wide variety of user-generated questions.
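Prompt-driven search is a two-step process. First, retrieve the most relevant indexed text for the query. Then hand that text to the LLM as context inside the prompt. The sketch below shows the shape of that flow under simplifying assumptions. It swaps the embeddings index for simple word-overlap scoring and stops at the assembled prompt instead of calling a model. The titles are made up, and `retrieve` and `build_prompt` are hypothetical helpers, not txtai functions.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word set; a crude stand-in for an embeddings vector."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, titles: List[str], limit: int = 1) -> List[str]:
    """Rank titles by word overlap with the query (swapped in for vector search)."""
    scored = sorted(titles, key=lambda t: len(tokens(query) & tokens(t)), reverse=True)
    return scored[:limit]


def build_prompt(question: str, context: List[str]) -> str:
    """Mirror the prompt() template, with the retrieved context appended."""
    return (
        "Answer the following question using the context below. "
        "Say 'no answer' when the question can't be answered.\n"
        f"Question: {question}\n"
        f"Context: {' '.join(context)}"
    )


titles = [
    "New ESP32 board announced",
    "Running Windows 11 with 2GB of RAM",
    "Notes on a static site generator",
]

question = "How low can you go on RAM for Windows 11?"
print(build_prompt(question, retrieve(question, titles)))
```

A real embeddings index matches on meaning rather than shared words. The LLM then only has to read a few retrieved titles instead of the whole index.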


Running as an API Service

Workflows can also run as an API service. Docker files are available to get up and running quickly; see the txtai documentation for more.

The following command starts an API service for the app.yml file defined in the previous section.

$ CONFIG=app.yml uvicorn "txtai.api:app"

txtai has API libraries for Go, Java, JavaScript and Rust. To keep things simple, we’ll use the requests library to run the workflow and a query.

import requests

requests.post(
"http://localhost:8000/workflow",
json={"name": "index", "elements": ["front_page"]}
)

The workflow has finished, so now it’s time to run a query. We’ll reuse the prompt function to build the request parameters.

question = "What happened to the Chinese balloon?"
queue = [{
"name": question,
"query": question,
"question": prompt(question)
}]

# Run query
response = requests.post(
"http://localhost:8000/extract",
json={"queue": queue})

print(response.json())

Running this prints:

[{
"name": "What happened to the Chinese balloon?",
"answer": "U.S. military shoots down suspected Chinese surveillance balloon"
}]

Same result, this time it’s with an API call 🚀
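If requests isn't available, the same two calls can be made with Python's standard library. This minimal sketch assumes the service started above is listening on localhost:8000. `make_request` and `post_json` are hypothetical helpers. The snippet only builds the requests, so it runs without a live server.

```python
import json
from urllib.request import Request, urlopen

BASE = "http://localhost:8000"  # assumes the uvicorn service above is running here


def make_request(path: str, payload) -> Request:
    """Build a JSON POST request for an API endpoint."""
    return Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload):
    """Send the request and decode the JSON response (needs a running server)."""
    with urlopen(make_request(path, payload)) as response:
        return json.loads(response.read())


question = "What happened to the Chinese balloon?"
# Swap in prompt(question) from the earlier section for the "question" field
queue = [{"name": question, "query": question, "question": question}]

index_request = make_request("/workflow", {"name": "index", "elements": ["front_page"]})
extract_request = make_request("/extract", {"queue": queue})

# With the server running:
# post_json("/workflow", {"name": "index", "elements": ["front_page"]})
# print(post_json("/extract", {"queue": queue}))
```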

Wrapping up

This article evaluated use cases for small and large language models. For tasks such as summarization and translation, there are specialized smaller models that work well. These models can be connected together with workflows and still have a smaller footprint than an LLM. These tasks also have little variation: every input is summarized and/or translated the same way.

Large language models are great when working with user inputs or other ad hoc commands that are more abstract in nature. In this article, we paired a small embeddings model with an LLM for embeddings-guided, prompt-driven search.

The ability to instruct LLMs to perform tasks with natural-language prompts is amazing. But that flexibility comes at a cost, whether the LLM runs locally or is called through an API. Used in the right places, smaller models can be just as effective, if not more so.

Founder/CEO at NeuML. Building easy-to-use semantic search and workflow applications with txtai.