💡 What’s new in txtai 8.0

Agents come to txtai

David Mezzetti
NeuML
4 min read · Nov 18, 2024


txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

The 8.0 release brings a major new feature: Agents 🚀

Agents automatically create workflows to answer multi-faceted user requests. They iteratively prompt an LLM and/or call tools, stepping through a process until they arrive at an answer.

This release also adds support for Model2Vec vectorization.

This article will briefly cover all the changes. A comprehensive version of this article with install instructions and corresponding notebooks can be found below.

Agents

The biggest change, and the reason this is a major release, is the addition of agents. The following code defines a basic agent.

from datetime import datetime

from txtai import Agent

# Embeddings database tools, loaded from the Hugging Face Hub
wikipedia = {
    "name": "wikipedia",
    "description": "Searches a Wikipedia database",
    "provider": "huggingface-hub",
    "container": "neuml/txtai-wikipedia"
}

arxiv = {
    "name": "arxiv",
    "description": "Searches a database of scientific papers",
    "provider": "huggingface-hub",
    "container": "neuml/txtai-arxiv"
}

# A plain Python function can also be a tool; its docstring documents what it does
def today() -> str:
    """
    Gets the current date and time

    Returns:
        current date and time
    """

    return datetime.today().isoformat()

# "websearch" is a built-in tool referenced by name
agent = Agent(
    tools=[today, wikipedia, arxiv, "websearch"],
    llm="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    max_iterations=10,
)

This agent has access to a date/time function, two embeddings databases (Wikipedia and ArXiv) and web search. Given the user's request, the agent decides which tool is best suited to solve the task.
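Both tool styles above carry a name and a short description: the dicts state them explicitly, and `today` has a docstring. A reasonable assumption is that this text is what the LLM relies on when choosing a tool. The sketch below uses a hypothetical helper, `describe_tool`, which is not part of txtai's API, to normalize a config dict and a plain function into the kind of tool list that could be placed in a prompt. The built-in `"websearch"` tool is left out of this sketch.

```python
import inspect
from datetime import datetime
from typing import Callable, Dict, List, Union

# A tool is either a plain function or a config dict (as in the agent example)
Tool = Union[Callable, Dict[str, str]]


def today() -> str:
    """
    Gets the current date and time

    Returns:
        current date and time
    """
    return datetime.today().isoformat()


wikipedia = {
    "name": "wikipedia",
    "description": "Searches a Wikipedia database",
    "provider": "huggingface-hub",
    "container": "neuml/txtai-wikipedia",
}


def describe_tool(tool: Tool) -> Dict[str, str]:
    """Hypothetical helper: reduce a tool to the name and description an LLM would see."""
    if isinstance(tool, dict):
        return {"name": tool["name"], "description": tool["description"]}
    # For functions, use the function name and the first line of the cleaned docstring
    doc = inspect.getdoc(tool) or ""
    return {"name": tool.__name__, "description": doc.splitlines()[0] if doc else ""}


def tool_catalog(tools: List[Tool]) -> str:
    """Render one '- name: description' line per tool, e.g. for inclusion in a prompt."""
    return "\n".join(
        f"- {t['name']}: {t['description']}" for t in map(describe_tool, tools)
    )


print(tool_catalog([today, wikipedia]))
```

Under this assumption, a short, specific first docstring line matters: it is the main signal a model has for telling `today` apart from the search tools.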

agent(
"Which city has the highest population, Boston or New York?",
maxlength=16000
)
======== New task ========
Which city has the highest population, Boston or New York?
=== Agent thoughts:
Thought: I will use the tool 'web_search' to find the population of both cities.
>>> Calling tool: 'web_search' with arguments: {'query': 'Population of Boston and New York'}
=== Agent thoughts:
Thought: The results from the web search indicate that the population of Boston is 7,948 and the population of New York is 653,833. However, it seems that the question is asking about the city of Boston in New York, not the city of Boston in Massachusetts. I will use the tool 'wikipedia' to find the population of the city of New York.
>>> Calling tool: 'wikipedia' with arguments: {'query': 'Population of New York City'}
=== Agent thoughts:
Thought: The results from the Wikipedia search indicate that the population of New York City is approximately 8,804,190. However, this is the population of the entire city, not just the city of New York. I will use the tool 'wikipedia' to find the population of the city of New York, not the state of New York.
>>> Calling tool: 'wikipedia' with arguments: {'query': 'Population of New York City (borough)'}
=== Agent thoughts:
Thought: The results from the Wikipedia search indicate that the population of New York City is approximately 8,804,190, and the population of Brooklyn is approximately 2,592,149. Since Brooklyn is one of the five boroughs of New York City, I will assume that the population of the city of New York is approximately 8,804,190. However, this is still not the correct answer to the original question. I will use the tool 'wikipedia' to find the population of the city of Boston in Massachusetts, which is the other city mentioned in the original question.
>>> Calling tool: 'wikipedia' with arguments: {'query': 'Population of Boston, Massachusetts'}
=== Agent thoughts:
Thought: The results from the Wikipedia search indicate that the population of Boston, Massachusetts is approximately 675,647. This is the correct answer to the original question, which was to determine which city has the highest population, Boston or New York. Since the population of Boston is significantly lower than the population of New York City, which is approximately 8,804,190, the answer to the original question is New York City.
>>> Calling tool: 'final_answer' with arguments: {'answer': 'New York City'}
'New York City'

How about that 🔥?

Notice how the agent worked through the problem systematically: it ran a series of searches to gather the information it needed and then determined the final answer!
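The log follows a simple loop: pick a tool, run it, add the result to the agent's memory, and repeat until a final answer is produced or `max_iterations` runs out. Below is a minimal sketch of that loop in plain Python. It is an illustration, not txtai's implementation: a hard-coded `scripted_policy` (hypothetical) stands in for the LLM, and a dictionary lookup stands in for the Wikipedia search, using the population figures reported in the log above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A step is ("call", tool_name, argument) or ("final_answer", answer)
Step = Tuple[str, ...]


def run_agent(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 10,
) -> Optional[str]:
    """Think-act-observe loop: pick a step, run the tool, record the observation."""
    memory: List[str] = []
    for _ in range(max_iterations):
        step = policy(memory)
        if step[0] == "final_answer":
            return step[1]
        _, name, argument = step
        observation = tools[name](argument)
        memory.append(f"{name}({argument!r}) -> {observation}")
    # No final answer within the iteration budget
    return None


# Hypothetical stand-in for the wikipedia tool, using the figures from the log
populations = {"Boston": 675647, "New York City": 8804190}


def wikipedia(query: str) -> str:
    return str(populations[query])


def scripted_policy(memory: List[str]) -> Step:
    # A real agent would prompt the LLM with the task, tool list and memory here
    if len(memory) == 0:
        return ("call", "wikipedia", "Boston")
    if len(memory) == 1:
        return ("call", "wikipedia", "New York City")
    boston, nyc = (int(entry.split("-> ")[1]) for entry in memory)
    return ("final_answer", "New York City" if nyc > boston else "Boston")


print(run_agent(scripted_policy, {"wikipedia": wikipedia}))
```

The iteration cap is what keeps a confused policy (like the detours in the log) from looping forever; when it is reached, this sketch returns `None` instead of an answer.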

Vectorization with Model2Vec

Model2Vec is a technique that turns any sentence transformer into a very small static model, reducing model size by 15x and making models up to 500x faster, with a small drop in performance.
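A static model is fast because encoding text needs no transformer forward pass: each token maps to a precomputed vector, and the text embedding is pooled from those vectors. The toy sketch below shows that inference step under stated assumptions: a hypothetical vocabulary of random vectors, whitespace tokenization (Model2Vec uses a real subword tokenizer and vectors distilled from a sentence transformer), mean pooling and L2 normalization.

```python
import math
import random
from typing import Dict, List

DIM = 8
random.seed(0)

# Hypothetical static vocabulary: one fixed vector per token.
# In Model2Vec these vectors are distilled from a sentence transformer; here they are random.
vocab: Dict[str, List[float]] = {
    token: [random.gauss(0, 1) for _ in range(DIM)]
    for token in "climate change ice shelf collapsed lottery ticket".split()
}


def encode(text: str) -> List[float]:
    """Embed text with table lookups and a mean: no neural network forward pass."""
    vectors = [vocab[t] for t in text.lower().split() if t in vocab]
    if not vectors:
        return [0.0] * DIM
    # Mean-pool the token vectors column by column
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    # L2-normalize so dot products behave like cosine similarity
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


print(encode("Climate change"))
```

One consequence of pooling static vectors is that word order is ignored: "change climate" and "climate change" get the same embedding, which is part of the accuracy traded for speed.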

This release adds support for Model2Vec models.

from txtai import Embeddings

# Data to index
data = [
"US tops 5 million confirmed virus cases",
"Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
"Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
"The National Park Service warns against sacrificing slower friends in a bear attack",
"Maine man wins $1M from $25 lottery ticket",
"Make huge profits without work, earn up to $100,000 a day"
]

# Create an embeddings index using a Model2Vec model
embeddings = Embeddings(method="model2vec", path="minishlab/M2V_base_output")
embeddings.index(data)

# Get the id of the top search result and look up its text
uid = embeddings.search("climate change")[0][0]
data[uid]
"Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"
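The `[0][0]` indexing works because, in this configuration, the search results are `(id, score)` tuples ordered best match first, and each id is the position of the text in `data`. The sketch below reproduces that result shape with an exact brute-force cosine scan over hypothetical 3-dimensional vectors; it is not the approximate nearest neighbor index txtai uses under the hood.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query: Sequence[float], vectors: List[Sequence[float]], limit: int = 3) -> List[Tuple[int, float]]:
    """Brute-force similarity search returning (id, score) pairs, best first."""
    scores = [(uid, cosine(query, v)) for uid, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical vectors standing in for three indexed documents
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
results = search([0.0, 0.9, 0.1], vectors)

# Same pattern as the article: first result, first element = id of the top hit
uid = results[0][0]
print(uid, results)
```

Here the query points mostly along the second axis, so document 1 ranks first, followed by the mixed document 2.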

Wrapping up

This article gave a quick overview of txtai 8.0. Updated documentation and more examples will be forthcoming. There is much to cover and much to build on!

See the following links for more information.

Written by David Mezzetti

Founder/CEO at NeuML. Building easy-to-use semantic search and workflow applications with txtai.
