Relevant and Helpful AI Agents Leveraging Interaction Data

Pratyush · Published in ThirdAI Blog · 3 min read · Apr 25, 2024

Most AI agents today cannot leverage all of the information available to them. For instance, organizations hoping to adopt GenAI almost always have records of past interactions between customers and their human representatives, in the form of interaction logs.

We will see how to leverage all of this interaction data to build the most relevant and helpful GenAI agent using ThirdAI.

A simple use case: Stack Overflow

Let's explore a straightforward scenario utilizing ThirdAI alongside user-friendly APIs. We'll work with a dataset of 20,000 answers sourced from Stack Overflow, organizing them for efficient retrieval based on user queries.

Below is an excerpt from our dataset. As we can see, the data contains a rich set of information: the titles of Stack Overflow posts, queries, answers, and scores indicating user upvotes.
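Since the excerpt itself is rendered as an image in the original post, here is an illustrative row in the same shape. The field values are invented; only the column names are taken from the text and the NeuralDB calls that follow:

```python
# Hypothetical sample row; values are made up, but the columns
# (id, title, query, answer, score) match those referenced in this post.
sample_row = {
    "id": 0,
    "title": "How do I merge two dictionaries in Python?",
    "query": "combine two dicts into one",
    "answer": "In Python 3.9+ you can use the | operator: merged = a | b.",
    "score": 128,
}
```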

Key Metrics for Comprehensive Assessment

We’ll use two metrics to assess our model: helpfulness and relevance.

Helpfulness represents a model’s ability to predict the highest scoring answer from a pool of correct answers. This metric is pivotal for applications where delivering the best possible answer directly impacts user engagement. A high helpfulness score indicates that the model is adept at recognizing and prioritizing the answers that users find most valuable.

Relevance, on the other hand, assesses the model’s capability to provide answers that are related to the input query, without necessarily being the top-scored answer. This distinction is crucial because an answer can be relevant but not the most helpful one according to the scores. Relevance ensures that the model understands the query’s context and can match it with appropriate answers.

By distinguishing between helpfulness and relevance, we can better understand a model’s strengths and areas for improvement. A model might be excellent at identifying relevant answers but fall short in discerning which of those answers is most helpful.
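To make the distinction concrete, here is a minimal sketch, in plain Python and independent of NeuralDB, of how the two metrics can be computed from a model's top-1 predictions. The function name and data layout are hypothetical, not part of the ThirdAI API:

```python
def evaluate(predictions, relevant_ids, best_id_by_query):
    """Compute relevance and helpfulness from top-1 predictions.

    predictions: {query: predicted_answer_id}
    relevant_ids: {query: set of ids of all correct answers}
    best_id_by_query: {query: id of the highest-scoring answer}
    """
    n = len(predictions)
    # Relevance: the predicted answer is among the correct answers.
    relevance = sum(pid in relevant_ids[q] for q, pid in predictions.items()) / n
    # Helpfulness: the predicted answer is the single highest-scoring one.
    helpfulness = sum(pid == best_id_by_query[q] for q, pid in predictions.items()) / n
    return relevance, helpfulness

# Toy example: both predictions are relevant, but only one is top-scored.
preds = {"q1": 10, "q2": 21}
relevant = {"q1": {10, 11}, "q2": {20, 21}}
best = {"q1": 10, "q2": 20}
rel, helpful = evaluate(preds, relevant, best)
# rel == 1.0, helpful == 0.5
```

A model that scores high on relevance but low on helpfulness retrieves correct answers without ranking the best one first; the upvote signal in Step 3 targets exactly that gap.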

Leverage Unsupervised Text (Step 1/3)

We index the answers and titles into our system.

from thirdai import neural_db as ndb

db = ndb.NeuralDB()

# Index the unsupervised text: titles and answers as strong signals,
# with the answer column served back as the retrieval reference.
csv_file = ndb.CSV(
    path="train.csv",
    id_column="id",
    strong_columns=["title", "answer"],
    weak_columns=[],
    reference_columns=["answer"],
)
db.insert([csv_file])

Customization: Associate Questions with Answers (Step 2/3)

Next, we use question-answer pairs as supervised data to train the index: with this step we are able to achieve a relevance of 75.8% and helpfulness of 63.8%.

# Question-answer pairs serve as supervised training signal for the index.
sup_data = ndb.Sup(
    train_file,
    query_column="query",
    id_delimiter="",
    id_column="id",
    source_id=source_ids[0],
)
db.supervised_train([sup_data])

Customization: Use Upvote Information (Step 3/3)

Finally, we incorporate user behavior data, such as upvotes, to further refine the system:

Upvoting allows users to express a preference for certain answers over others, essentially signaling which responses they found most useful.

We now show that after upvoting a set of (query, answer_id) pairs, where answer_id corresponds to the highest-scoring answer for that query, we can achieve a significant boost in the helpfulness of the model.
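The (query, answer_id) pairs described above can be derived directly from the dataset's score column. A minimal sketch in plain Python, where the helper name and row format are hypothetical:

```python
def build_upvote_pairs(rows):
    """Derive (query, answer_id) feedback pairs by picking,
    for each query, the candidate answer with the highest score.

    rows: iterable of dicts with 'query', 'id', and 'score' keys,
    one dict per candidate answer.
    """
    best = {}  # query -> (score, answer_id)
    for row in rows:
        q, score, aid = row["query"], row["score"], row["id"]
        if q not in best or score > best[q][0]:
            best[q] = (score, aid)
    return [(q, aid) for q, (score, aid) in best.items()]

rows = [
    {"query": "sort a dict by value", "id": 1, "score": 12},
    {"query": "sort a dict by value", "id": 2, "score": 41},
    {"query": "read a file line by line", "id": 3, "score": 7},
]
upvote_data = build_upvote_pairs(rows)
# [('sort a dict by value', 2), ('read a file line by line', 3)]
```

In a live system, the same pairs would come from real user clicks or upvotes rather than precomputed scores.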

# Register each upvote as explicit (query -> answer id) feedback.
for query, answer_id in upvote_data:
    db.text_to_result(query, answer_id)

This further improved the helpfulness of our system from 63.8% to 90.4%, and relevance from 75.8% to 95.4%.

Explore the complete notebook, linked at the end of this post, to see the seamless integration and efficiency of NeuralDB in action.

Results of Using Interaction Data

We highlight the effectiveness of using all available data by comparing it with the popular alternative ChromaDB. The comparisons are summarized below.

As these results show, AI systems that incorporate user interaction data can achieve significant improvements in both relevance and helpfulness.

IMPORTANT LINK: https://github.com/ThirdAILabs/Demos/blob/stackoverflow_demo/neural_db/examples/stackoverflow_demo/neuraldb.ipynb
