Published in NeuML
Getting started with semantic workflows

A guide on when to use small and large language models

Transformers models to transform data

Summarization

from txtai.pipeline import Summary

# Create summary model
summary = Summary("philschmid/flan-t5-base-samsum")

text = """
Search is the base of many applications. Once data starts to pile up, users
want to be able to find it. It’s the foundation of the internet and an
ever-growing challenge that is never solved or done. The field of Natural
Language Processing (NLP) is rapidly evolving with a number of new
developments. Large-scale general language models are an exciting new
capability allowing us to add amazing functionality quickly with
limited compute and people. Innovation continues with new models and
advancements coming in at what seems a weekly basis. This article
introduces txtai, an AI-powered search engine that enables Natural
Language Understanding (NLU) based search in any application.
"""

summary(text)
txtai is an AI-powered search engine that enables Natural Language 
Understanding (NLU) based search in any application.
from txtai.pipeline import Sequences

sequences = Sequences("google/flan-t5-large")

prompt = """
Summarize the following paragraph into a single sentence.
"""

sequences(f"{prompt}\n{text}")
txtai is an AI-powered search engine that enables Natural Language
Understanding (NLU) based search in any application.
def modelsize(model):
    params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return int(params / 1024 / 1024)

params = modelsize(summary.pipeline.model)
print(f"Number of parameters in Summary model: {params}M")

params = modelsize(sequences.pipeline.model)
print(f"Number of parameters in Sequences model: {params}M")
Number of parameters in Summary model: 236M
Number of parameters in Sequences model: 746M
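Note that modelsize counts only trainable parameters and divides by 1024² rather than 10⁶ to get the "M" figure. A minimal illustration of that logic using stub objects in place of real torch tensors (the Stub classes here are invented for this sketch, not part of txtai or torch):

```python
# Stub stand-ins for torch parameters: expose numel() and requires_grad
class StubParam:
    def __init__(self, count, requires_grad=True):
        self.count = count
        self.requires_grad = requires_grad

    def numel(self):
        return self.count

class StubModel:
    def __init__(self, params):
        self.params = params

    def parameters(self):
        return iter(self.params)

def modelsize(model):
    # Sum only trainable parameters, report in mebi-units
    params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return int(params / 1024 / 1024)

# A model with 200M trainable and 100M frozen parameters
model = StubModel([
    StubParam(200 * 1024 * 1024),
    StubParam(100 * 1024 * 1024, requires_grad=False)
])
print(modelsize(model))  # 200 - frozen parameters are excluded
```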

Summarize and Translate

from txtai.pipeline import Translation

# Summarize text using summary pipeline
output = summary(text)

# Translate to French
translate = Translation()
translate(output, "fr")
txtai est un moteur de recherche alimenté par l'IA qui permet une recherche 
basée sur la compréhension du langage naturel (NLU) dans n'importe quelle
application.
prompt = """
Summarize the following paragraph into a single sentence summary.
Translate the single sentence summary to French.
"""

sequences(f"{prompt}\n{text}")
txtai, un moteur de recherche en utilisant les méthodes de l'AI, permet de
trouver les données à l'aide de l'analyse de langues naturelles (LN) dans
toutes les applications.
params = modelsize(summary.pipeline.model) + \
         modelsize(translate.models[list(translate.models.keys())[0]][0])
print(f"Number of parameters in Summary + Translation models: {params}M")

params = modelsize(sequences.pipeline.model)
print(f"Number of parameters in Sequences model: {params}M")
Number of parameters in Summary + Translation models: 307M
Number of parameters in Sequences model: 746M

Workflows

summary:
  path: philschmid/flan-t5-base-samsum
translation:
workflow:
  summary:
    tasks:
      - action: summary
      - action: translation
        args:
          - fr
from txtai.app import Application

app = Application("app.yml")
print(next(app.workflow("summary", [text])))
txtai est un moteur de recherche alimenté par l'IA qui permet une recherche
basée sur la compréhension du langage naturel (NLU) dans n'importe quelle
application.
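The YAML above chains the summary and translation pipelines into a single callable workflow. Conceptually, a workflow is just an ordered list of tasks where each task transforms the batch of elements and hands it to the next. A minimal pure-Python sketch of that chaining, with plain stub functions standing in for the real pipelines:

```python
# Stub functions stand in for the real summary and translation pipelines
def summary(texts):
    # Keep only the first sentence of each element
    return [t.split(".")[0] + "." for t in texts]

def translation(texts, target):
    # Tag with the target language instead of actually translating
    return [f"[{target}] {t}" for t in texts]

def workflow(tasks, elements):
    # Each task transforms the full batch and feeds the next task
    for task in tasks:
        elements = task(elements)
    return elements

tasks = [summary, lambda texts: translation(texts, "fr")]
print(workflow(tasks, ["txtai is an AI-powered search engine. It runs anywhere."]))
# ['[fr] txtai is an AI-powered search engine.']
```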

Indexing Workflow

embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
  content: true
extractor:
  path: google/flan-t5-large
tabular:
  idcolumn: url
  textcolumns:
    - title
workflow:
  index:
    tasks:
      - batch: false
        extract:
          - hits
        method: get
        params:
          tags: null
        task: service
        url: https://hn.algolia.com/api/v1/search?hitsPerPage=50
      - action: tabular
      - action: index
writable: true
from txtai.app import Application

app = Application("app.yml")
results = list(app.workflow("index", ["front_page"]))
def prompt(question):
    return f"""Answer the following question using the context below. Say 'no answer' when the question can't be answered.
Question: {question}
Context: """

def ask(question):
    return question, app.extract([{"name": question, "query": question, "question": prompt(question)}], None)[0]["answer"]

question = "What happened to the Chinese balloon?"
print(ask(question))

question = "How low can you go on RAM for Windows 11?"
print(ask(question))

question = "What microcontroller is being discussed?"
print(ask(question))

question = "What is the temperature in Virginia?"
print(ask(question))
('What happened to the Chinese balloon?', 'U.S. military shoots down suspected Chinese surveillance balloon')
('How low can you go on RAM for Windows 11?', '2GB')
('What microcontroller is being discussed?', 'ESP32')
('What is the temperature in Virginia?', 'No answer')
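Behind app.extract, the query retrieves the best-matching rows from the embeddings index and the retrieved text roughly fills the Context: slot of the prompt before it reaches the model. A sketch of what the final model input looks like (the context string below is invented for illustration, not an actual retrieval result):

```python
def prompt(question):
    return f"""Answer the following question using the context below. Say 'no answer' when the question can't be answered.
Question: {question}
Context: """

# Hypothetical retrieved context, for illustration only
context = "U.S. military shoots down suspected Chinese surveillance balloon"

model_input = prompt("What happened to the Chinese balloon?") + context
print(model_input)
```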

Running as an API Service

$ CONFIG=app.yml uvicorn "txtai.api:app"
import requests

requests.post(
    "http://localhost:8000/workflow",
    json={"name": "index", "elements": ["front_page"]}
)
question = "What happened to the Chinese balloon?"
queue = [{
    "name": question,
    "query": question,
    "question": prompt(question)
}]

# Run query
response = requests.post(
    "http://localhost:8000/extract",
    json={"queue": queue})

print(response.json())
[{
  "name": "What happened to the Chinese balloon?",
  "answer": "U.S. military shoots down suspected Chinese surveillance balloon"
}]
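Since the API speaks plain JSON over HTTP, clients in any language can make the same /extract call. A sketch of serializing the same payload with only the Python standard library, with the prompt text inlined:

```python
import json

question = "What happened to the Chinese balloon?"

# Same payload shape as the requests example above
payload = {
    "queue": [{
        "name": question,
        "query": question,
        "question": f"Answer the following question using the context below. Say 'no answer' when the question can't be answered.\nQuestion: {question}\nContext: "
    }]
}

body = json.dumps(payload)
print(body[:50])
```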

Wrapping up

David Mezzetti

Founder/CEO at NeuML — applying machine learning to solve everyday problems. Previously co-founded and built Data Works into a successful IT services company.