NeuML — 2022 Year in Review

Recapping 2022 and looking ahead to 2023

David Mezzetti
NeuML
9 min read · Jan 4, 2023

NeuML builds open-source, easy-to-use, semantic workflow applications. We bridge the gap between research and production. We also provide consulting services around our open-source stack.

NeuML continued to build a robust open-source semantic search stack in 2022. We also strengthened our projects via a series of consulting efforts. This article will recap progress made in 2022 and look ahead to 2023.

txtai

txtai is an open-source platform for semantic search and workflows powered by language models. This is the foundational piece of software that all of our work stands on.

Highlights for txtai in 2022:

  • 1,748 stars on GitHub to bring the total to ⭐3,130
  • 385 total commits on GitHub
  • 179 total issues resolved on GitHub
  • 12 releases. Entered the year at v3.7.0 and finished at v5.2.0
  • 16 articles and example notebooks added

Let’s recap the major functionality released.

txtai 4.0

txtai 4.0 was released in January 2022. This release added content storage, SQL queries, object storage, reindexing and index compression.

Content storage saves additional metadata alongside vector indexes. This enables expressive queries with SQL and pulling back additional fields at query time. Up until 4.0, txtai queries strictly ran vector searches for a snippet of text. With this release, SQL statements filtering on other fields are now possible. Reindexing is also possible when content storage is enabled, leading to faster rebuilds with new settings.
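To make the idea concrete, here is a minimal sketch of combining similarity scoring with SQL filtering over stored metadata. It is a toy built on the Python standard library, not txtai's implementation or API: the bag-of-words `vectorize` function, the `content` table, the `topic` and `year` fields and the `search` helper are all assumptions made for illustration.

```python
import math
import sqlite3
from collections import Counter


def vectorize(text):
    """Toy bag-of-words vector; a real system would use a language model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical documents: (id, text, topic, year)
documents = [
    (0, "vector search finds similar text", "search", 2021),
    (1, "sql queries filter on metadata fields", "database", 2022),
    (2, "semantic search with sql filters", "search", 2022),
]

# Content store: metadata lives in SQLite next to the (toy) vector index
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, text TEXT, topic TEXT, year INTEGER)")
db.executemany("INSERT INTO content VALUES (?, ?, ?, ?)", documents)

# Vector index: id -> vector
index = {uid: vectorize(text) for uid, text, _, _ in documents}


def search(query, where="1=1", params=(), limit=3):
    """Apply the SQL filter, then rank the surviving rows by similarity."""
    qvec = vectorize(query)
    scores = {uid: cosine(qvec, vec) for uid, vec in index.items()}
    # The WHERE clause is trusted input in this sketch; values go through params
    rows = db.execute(f"SELECT id, text, topic, year FROM content WHERE {where}", params).fetchall()
    ranked = sorted(rows, key=lambda row: scores[row[0]], reverse=True)
    return [(row, round(scores[row[0]], 3)) for row in ranked[:limit]]


# Similarity search restricted to documents from 2022
results = search("semantic search", where="year = ?", params=(2022,))
```

Because the metadata sits next to the vectors, a query like this can both filter on `year` and return the stored `text` and `topic` fields in a single call.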

In addition to content storage, object storage was also added. This enables storage of binary content (audio/images/video/documents). Compressed indexes (zip, gzip and bzip2) were also added, which significantly reduces storage requirements.
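As a rough illustration of index compression, the sketch below serializes a made-up index and compresses it with gzip and bzip2 from the Python standard library. The index contents and the JSON serialization are assumptions for this example only. The data is artificially repetitive, so the size reduction here is not representative of real embeddings.

```python
import bz2
import gzip
import json

# Hypothetical serialized index: ids mapped to small vectors
index = {str(i): [round((i * j) % 7 / 7, 3) for j in range(32)] for i in range(200)}
raw = json.dumps(index).encode("utf-8")

# Compress the same bytes with two of the formats mentioned above
compressed = {
    "gzip": gzip.compress(raw),
    "bzip2": bz2.compress(raw),
}

# Compare sizes in bytes against the uncompressed payload
sizes = {name: len(data) for name, data in compressed.items()}

# Round trip to confirm nothing is lost
restored = json.loads(gzip.decompress(compressed["gzip"]).decode("utf-8"))
```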

It’s hard to believe that this functionality didn’t exist until 2022 given how much is built on it now.

A full recap on 4.0 is available in the article below.

txtai 5.0

txtai 5.0 was released in September 2022. This release added semantic graphs and external integrations.

Semantic graphs, also known as knowledge graphs or semantic networks, build a graph network with semantic relationships connecting the nodes. In txtai, they can take advantage of the relationships inherently learned in an embeddings index. This opens exciting possibilities for exploring relationships, such as topics and interconnections in a dataset. Semantic graphs in txtai can be used for topic modeling, graph traversal and analysis.
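The following is a minimal sketch of the concept, assuming a toy bag-of-words similarity in place of a real embeddings index. Nodes are documents, edges connect pairs whose similarity clears a threshold, and connected components stand in for topics. The threshold value, the sample texts and the use of connected components are choices made for this illustration, not a description of how txtai builds its graphs.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Toy bag-of-words vector; a real system would use a language model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_graph(vectors, threshold=0.25):
    """Connect every pair of nodes whose similarity meets the threshold."""
    graph = {uid: {} for uid in vectors}
    for a, b in combinations(vectors, 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            graph[a][b] = score
            graph[b][a] = score
    return graph


def components(graph):
    """Group nodes into connected components with a depth-first traversal."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return groups


# Hypothetical sample data
texts = {
    0: "dog chases the ball in the park",
    1: "puppy plays with a ball in the park",
    2: "stock market falls on rate fears",
    3: "investors sell stock as market drops",
}

vectors = {uid: vectorize(text) for uid, text in texts.items()}
graph = build_graph(vectors)
topics = components(graph)
```

With this sample data, the two park sentences end up in one group and the two market sentences in another, which is the kind of structure topic modeling and graph traversal build on.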

External integrations added hooks to plug in vector, content, graph and vectorization backends. For example, Weaviate can be used to store vectors instead of Faiss, PostgreSQL instead of SQLite, Neo4j instead of NetworkX and OpenAI instead of local Transformers models. This adds the flexibility to scale out different parts of txtai as your system grows.
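One common way to structure this kind of pluggability is an abstract interface plus a registry keyed by configuration. The sketch below shows that pattern in plain Python. The names `VectorBackend`, `MemoryBackend`, `BACKENDS` and `create_backend` are hypothetical and are not txtai classes or settings.

```python
from abc import ABC, abstractmethod


class VectorBackend(ABC):
    """Hypothetical interface a vector store must satisfy."""

    @abstractmethod
    def add(self, uid, vector):
        """Store a vector under an id."""

    @abstractmethod
    def search(self, vector, limit):
        """Return the top (id, score) pairs for a query vector."""


class MemoryBackend(VectorBackend):
    """Default local backend: brute-force dot product over a dict."""

    def __init__(self):
        self.vectors = {}

    def add(self, uid, vector):
        self.vectors[uid] = vector

    def search(self, vector, limit):
        scored = [(uid, sum(x * y for x, y in zip(vector, v))) for uid, v in self.vectors.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]


# Registry maps a config value to a backend class; an external store
# would register its own class here without changing calling code.
BACKENDS = {"memory": MemoryBackend}


def create_backend(config):
    """Instantiate the backend named in the config, defaulting to memory."""
    name = config.get("backend", "memory")
    if name not in BACKENDS:
        raise ValueError(f"Unknown backend: {name}")
    return BACKENDS[name]()


backend = create_backend({"backend": "memory"})
backend.add("a", [1.0, 0.0])
backend.add("b", [0.0, 1.0])
hits = backend.search([0.9, 0.1], limit=1)
```

The calling code only depends on the interface, so swapping the local store for an external one becomes a configuration change rather than a rewrite.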

A full recap on 5.0 is available in the article below.

Community

txtai has a wonderful and growing community. One of the best parts of doing this is seeing others use txtai in new and innovative ways. Everyone in software development has benefited from open-source and it’s great to be able to give something back.

Highlights from around the community in 2022.

Community articles, projects and presentations

Shared by Philip Vollet on LinkedIn and Twitter

How to explain the prediction of semantic search? (Mohit Mayank)

Open source smart glasses (Cayden Pierce)

OpenBBTerminal language translation

txtai used by two of the winning solutions in a hackathon

Video tutorial for txtai (Dr. WJB Mattingly)

Search Shakespeare with txtai (Dr. WJB Mattingly)

Meetup presentation at PyBerlin (Gerald Schuller)

Semantic search system with txtai and Qdrant (Kacper Lukawski)

Community maintained Conda package for txtai (Matt Chan)

Machine translation for Medical Chat (Keith Trnka)

Submission to EACL (Allen Roush)

Build A Semantic Search System With txtai And Weaviate (Mohd S)

Neural Search Frameworks (Dmitry Kan)

Trending posts

Hacker News: July, November, and November

Reddit: July, November and November

GitHub: January and April

Other: LinkedIn and Twitter

Thank you to everyone who contributed this year, including those who submitted pull requests and issues on GitHub, reached out on Slack and anyone else we missed!

Plan for 2023

Looking ahead to 2023, txtai will undertake a number of exciting initiatives.

Going small
In the type A tech world, the mantra is often go big or go home. Many aspire to, or think they need to, develop solutions at FAANG scale. Yet investing time in analyzing datasets often shows that only a small percentage of the data is even relevant.

Many don’t need this scale, and txtai wants to make it easy to build small to medium systems with easy-to-enable options to scale up as needed. Add complexity when necessary, not by default.

Micromodels
Same theme here: going small. This time, going small means running models on resource-constrained systems like microcontrollers. Some work was done in 2022, with Docker images added for ARMv8 that can be run on larger single-board devices such as the Raspberry Pi 4.

Can we create vector and similarity search models under 1MB for resource-constrained devices? Stay tuned!

Generative semantic search
The progress made in generative machine learning in 2022 was phenomenal. Stable Diffusion, DALL-E 2 and ChatGPT led the way. Semantic search has a place here, especially in generating knowledge-grounded content. This will be a focus in the coming year.

Multimodal indexing
Work will be done this year to add flexibility in how txtai indexes content. This will make multimodal indexes easier to build.

Cloud offering
While installing txtai is as simple as pip install txtai and an easy-to-use Docker model is available, it still requires a time investment. Adding a cloud offering enables rapid development, especially for those with small and/or overloaded technical teams.

This will also be beneficial to NeuML’s consulting services, giving an additional option to host deliverables.

paperai

paperai is a semantic search and workflow application for medical/scientific papers. It helps automate tedious literature reviews, allowing researchers to focus on their core work.

paperai had the following highlights for 2022:

  • 125 stars on GitHub to bring the total to ⭐820
  • 44 total commits on GitHub
  • 15 total issues resolved on GitHub

The most popular project that uses txtai is paperai. On top of that, the majority of our consulting efforts to date are centered around paperai. paperetl is a companion project and has the raw processing logic for extracting content from medical/scientific papers.

In 2022, paperai and paperetl both had a major 2.0 release. This release added support for new datasources and better report generation.

Looking ahead to 2023, paperai and paperetl will continue to have a couple of releases a year. There will also be work to make it easier to get started.

Other Projects

Most of our efforts in 2022 consolidated around txtai and paperai. But there were contributions to other downstream txtai projects as discussed below.

codequestion

codequestion is a semantic search application for developer questions.

Developers typically have a web browser window open while they work and run web searches as questions arise. With codequestion, this can be done from a local context. This application executes similarity queries to find similar questions to the input query.

codequestion had the following highlights for 2022:

  • 122 stars on GitHub to bring the total to ⭐383
  • 50 total commits on GitHub
  • 8 total issues resolved on GitHub

codequestion had a major 2.0 release in 2022. This release took full advantage of the features available in txtai 5.x, including content storage and semantic graphs. The default model is now a Transformers model rather than a word vectors model.

Looking ahead to 2023, the only work expected is to integrate major new features from txtai.

tldrstory

tldrstory is a semantic search application for headlines and text content related to stories.

tldrstory had the following highlights for 2022:

  • 54 stars on GitHub to bring the total to ⭐298

tldrstory wasn’t a priority and no releases were made in 2022. Looking ahead to 2023, there is still an open task to upgrade the project to take full advantage of txtai 5.x features, like what was done with codequestion.

neuspo

neuspo was started with a commitment to discovering objective, descriptive and real-time sports information. neuspo is running and receives routine maintenance. No major updates are planned for 2023.

Consulting Services

NeuML provides consulting services around our open-source stack as follows:

  • Custom Semantic Applications: Build custom semantic applications with the txtai stack
  • Model Development: Create AI, Machine Learning and/or NLP models that excel in industry-specific domains
  • Advisory and Strategy Support: Leverage our expertise to plan your data and AI strategy
  • Cloud-native Workflows: Scalable NLP workflows for semantic search, summarization, translation and more
  • AI-driven Literature Review: Automate reviews of large-scale unstructured medical, scientific and research literature
  • Media Analytics: Sentiment analysis, event discovery and trend detection

Our efforts in 2022 continued to be centered around txtai and paperai. Consulting work is symbiotic with our open-source projects, each helping to push the other ahead. These projects have been very interesting and fulfilling. It’s great to see txtai solving important real-world problems in the medical domain. After all, our tagline is “Applying machine-learning to solve everyday problems”.

Looking ahead to 2023, we will continue to build on the success of our current projects. We will also ramp up our outreach efforts to demonstrate how txtai can be applied to other domains such as finance and marketing. We’ve just scratched the surface of what is possible!

Ways to find NeuML

The full list of ways to interact with NeuML is shown below.

Contact us
Website | Email | Slack

Code
GitHub | Docker Hub | HF Spaces

Social Media
LinkedIn | Twitter | Facebook | YouTube

Articles
Medium | Hashnode | dev.to | Newsletter

Slack, Docker Hub, YouTube, Hashnode and the Newsletter (Substack) were all added in 2022!

Wrapping up

This article covered the state of NeuML’s projects and efforts in 2022 and plans for 2023.

As expected, our open-source work in 2022 consolidated around txtai. But there were new releases of downstream txtai projects. This pattern is expected to continue in 2023. There will be a number of exciting txtai initiatives in the coming year. The machine learning community is evolving fast and it’s guaranteed that other focus areas will sprout up!

Our consulting efforts are important to prove that our projects can solve real-world problems. These efforts are crucial in building a well-rounded company with a sustainable business model. We will continue to seek out ways to solve important everyday problems using our stack.

Thank you for reading. Please follow along and check in on how we’re doing over the course of 2023!

David Mezzetti
NeuML

Founder/CEO at NeuML. Building easy-to-use semantic search and workflow applications with txtai.