Find answers with codequestion 2.0

Semantic search for developers

David Mezzetti
NeuML
4 min read · Oct 6, 2022

We’re excited to announce the release of codequestion 2.0! This comes on the heels of the txtai 5.0 release. Both of these projects are open-source (Apache 2.0) and available on GitHub.

codequestion is a semantic search application for developer questions. Developers typically have a web browser window open while they work and run web searches as questions arise. With codequestion, searches run locally and no network connection is required. The application runs similarity queries to find existing questions that are similar to the input query.

codequestion 2.0 brings a number of exciting new features and changes. This article will summarize it all.

New models

There have been a lot of changes in the machine learning space over the last two years. When codequestion 1.0 was released, word vector models were the best fit given their performance, speed and accuracy. The article below is a recap of that.

Transformers models have taken off 🚀 since. Back when codequestion 1.0 was released, Hugging Face Transformers had 20K stars. As of October 2022, it has 70K+ stars! Generating high quality sentence embeddings is now possible thanks to the great models built with the Sentence Transformers library. Look at all these models available on the Hugging Face Hub for sentence similarity.

With 2.0, the Mean Reciprocal Rank (MRR) improved from 77.1 to 85.0 for the Stack Exchange dataset. This is a significant increase, leading to more relevant results.

https://github.com/neuml/codequestion#model-accuracy
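For readers unfamiliar with the metric, here is a minimal sketch of how Mean Reciprocal Rank can be computed. The function name and the toy data are made up for illustration and this is not the codequestion evaluation script. Each query contributes 1/rank of its correct result (0 if the correct result is missing), and the scores are averaged. The scores in this article appear to be on a 0-100 scale, so the sketch multiplies by 100 when printing.

```python
def mean_reciprocal_rank(ranked_results, answers):
    """
    ranked_results: one ranked list of ids per query, best match first
    answers: the single correct id for each query
    """
    total = 0.0
    for results, target in zip(ranked_results, answers):
        for rank, uid in enumerate(results, start=1):
            if uid == target:
                # Reciprocal of the position where the correct id was found
                total += 1.0 / rank
                break
    return total / len(answers)

# Hypothetical toy data: three queries with three ranked results each
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
answers = ["a", "e", "x"]  # found at rank 1, rank 2 and not found

mrr = mean_reciprocal_rank(ranked, answers)
print(round(100 * mrr, 1))
```

A higher MRR means the correct question tends to show up closer to the top of the result list.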

On top of that, the models have a broader and more general knowledge base as shown with the Pearson Correlation scores for the STS dataset.

https://github.com/neuml/codequestion#model-accuracy

In fairness, the word vector models were only trained on a small dataset. Given that, their score is actually impressive, but it is still far lower than the score of the Sentence Transformer model.

Semantic Graph Integration

This release integrates txtai 5.0, which added a new component, the semantic graph. Semantic graphs add topic modeling and graph path traversal capabilities.

Typing in a .topics command displays the top N topics found. This command can also filter using a query. Running .topics python shows the top topics covering Python.

Additionally, path traversal is available with the .path command. This command takes two arguments, a start id and end id. Path traversal is a useful technique to evaluate the connectivity of a dataset. It’s also highly interesting! Running .path 219570 167341 shows how the graph’s relationships can be traversed to connect two different questions.
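To give a feel for what path traversal does, here is a small sketch that finds a path between two nodes with a plain breadth-first search. It is a stand-in for illustration only: the graph, the ids and the `find_path` helper are hypothetical, and txtai's semantic graph may take edge weights (similarity scores) into account rather than just hop count.

```python
from collections import deque

def find_path(graph, start, end):
    """Returns the shortest list of node ids connecting start to end, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical toy graph: each id is a question, each edge links similar questions
graph = {
    1: [2, 3],
    2: [1, 4],
    3: [1],
    4: [2, 5],
    5: [4],
}

path = find_path(graph, 1, 5)
print(path)
```

Each step in the returned path is a question that is similar to the one before it, which is how two seemingly unrelated questions end up connected.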

Visual Studio Code

A codequestion prompt can be started within Visual Studio Code. This enables asking coding questions right from your IDE.

Press Ctrl+` to open a new terminal, then type codequestion to get started.

API Service

codequestion builds a standard txtai embeddings index. As such, it supports hosting the index via a txtai API service. This service returns results as JSON and can be integrated into an external application. Check this link out to learn more.
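As a rough sketch of what integrating with the service could look like, the snippet below builds a request for the txtai API `search` endpoint and parses a JSON response. The host and port (`localhost:8000`), the helper names and the sample response are assumptions for illustration. The fields returned by a real service depend on how the index was built.

```python
import json
from urllib.parse import urlencode

def build_search_url(query, limit=3, base="http://localhost:8000"):
    """Builds a GET url for the search endpoint (base url is an assumption)."""
    return f"{base}/search?{urlencode({'query': query, 'limit': limit})}"

def parse_results(body):
    """Extracts (id, score) pairs from a JSON list of results."""
    return [(row["id"], row["score"]) for row in json.loads(body)]

url = build_search_url("python list", 3)

# Made-up sample response, used here so the sketch runs without a live service
sample = '[{"id": "0", "score": 0.75}, {"id": "1", "score": 0.5}]'
results = parse_results(sample)

print(url)
print(results)
```

With a service running, the sample string would be replaced by the body of an HTTP GET to `url`, for example via `urllib.request.urlopen(url).read()`.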

The Journey Here

NeuML, the company behind txtai and codequestion, was founded in January 2020, right when codequestion 1.0 was released. This release is the culmination of the last two years of development on txtai, and it brings things full circle.

At that time, neuspo was also released.

Then, as we all know, COVID-19 hit and the whole world changed. NeuML applied the ideas from codequestion and neuspo to COVID-19 research. This led to multiple awards in the Kaggle COVID-19 challenge.

Semantic graphs were first dreamed up with neuspo, semantic search with codequestion. The work on the COVID-19 challenge led to paperai.

txtai 1.0 was released in August 2020 and work has been ongoing since to consolidate the ideas from all these projects into a single framework for building semantic search applications. codequestion 2.0 is a major milestone on that journey.

Wrapping up

This article gave an overview of codequestion 2.0 along with the journey here. Thank you for reading! See the following links for more information.

David Mezzetti
NeuML

Founder/CEO at NeuML. Building easy-to-use semantic search and workflow applications with txtai.