Advancing AI: From Information Retrieval to Large Language Models and Personalized Question Answering
“AI will be the defining technology of our time, just as the steam engine and electricity were for the Industrial Revolution.” — Satya Nadella, CEO of Microsoft.
In the rapidly evolving landscape of technology, Artificial Intelligence (AI) has emerged as a transformative force, poised to redefine our world much like the steam engine and electricity did during the Industrial Revolution. This essay embarks on a journey through the annals of AI history, tracing its trajectory from the fundamental question posed by Alan Turing to the present-day marvels of Large Language Models and personalized question-answering. As we delve into this exploration, we’ll witness the profound impact of AI on information retrieval and how it’s shaping the way we interact with technology.
From Foundations to Flourishing: Evolution of AI
Artificial Intelligence (AI) has become entwined with every facet of our lives, captivating the imaginations of scientists and researchers. It might seem like a recent development but, in reality, the groundwork was laid back in the early 1900s, with decisive progress in the mid-20th century, when the mathematician and computer scientist Alan Turing posed the question, “Can machines think?” Since then, colossal advancements and breakthroughs have turned AI from a mere concept into a potent tool, transforming the way we live and work and reshaping how we perceive and interact with technology.
Information Retrieval (IR) is a key technology that deals with searching for, obtaining, and processing the information a user needs from a large knowledge base. The user expresses an information need in the form of a query; the representations of the documents are compared to the query using a similarity function, and the relevant documents are extracted. An IR system begins by collecting documents such as web pages and articles, a job carried out by a component called the crawler (or spider). An indexer then builds an index from the collected documents: it analyzes each document, extracts the relevant terms, and maps each term to the documents in which it appears. When the user submits a query, a query processor matches it against the indexed documents, a ranking algorithm scores the candidates by their relevance to the query, and the ranked documents are presented to the user.
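To make the pipeline concrete, here is a minimal sketch of the indexer, query processor, and ranking steps in Python; the documents and the simple term-overlap scoring are illustrative placeholders, not the algorithm of any particular engine.

```python
# A minimal sketch of the indexing/retrieval pipeline described above.
from collections import defaultdict

documents = {
    1: "machine learning systems retrieve relevant documents",
    2: "the crawler collects web pages and articles",
    3: "an indexer maps terms to the documents containing them",
}

# Indexer: build an inverted index mapping each term to the documents
# in which it appears.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

def search(query):
    """Query processor + ranking: score documents by the number of
    query terms they contain and return them in descending order."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in inverted_index.get(term, set()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(search("documents the indexer maps"))  # ranked (doc_id, score) pairs
```

Production systems replace the term-overlap score with weighting schemes such as TF-IDF or BM25, but the crawl-index-query-rank structure stays the same.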
IR involves many specialized techniques, algorithms, and models, and the further evolution of NLP, deep learning, and machine learning led to the emergence of recommendation systems and Large Language Models (LLMs), tools that help people be more productive and efficient. LLMs such as GPT-3 and BERT are trained with deep learning techniques on enormous amounts of data and can perform a wide range of NLP tasks, while recommendation systems suggest relevant items based on a user’s preferences. While they still have many limitations, these systems show promise in information retrieval, content generation, and enhancing the user experience.
Large Language Models (LLMs): Revolutionizing Language Understanding
Large Language Models, built on deep learning architectures, are trained on enormous amounts of text so that they learn to understand language and generate text relevant to a given input. LLMs are first pre-trained on a large, publicly available corpus, during which the model learns to predict and capture the patterns in the data. After pre-training, the LLMs are fine-tuned on task-specific datasets, which helps the model adapt to its target use and improves performance; the knowledge captured during pre-training is then used to generate results suited to the user’s needs. These models are iteratively refined to keep up with new language patterns and emerging information. While LLMs may look similar to search engines, they possess the capability to understand and interpret queries, which yields more precise and appropriate replies based on the user’s behaviour and the information provided.
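As a toy illustration of the pre-training objective, the following PyTorch sketch trains a stand-in model with next-token prediction; the architecture, vocabulary, and data are placeholders (real LLMs use deep transformer stacks and web-scale corpora), and fine-tuning repeats the same loop on task-specific data.

```python
# A toy sketch of the next-token-prediction objective behind LLM pre-training.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # stand-in for a transformer stack
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, 16))   # one toy "document"
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next token

for step in range(100):  # the "pre-training" loop
    logits = model(inputs)                         # (1, 15, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```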
OpenAI’s ChatGPT, built on the GPT-3 family of models and trained on a colossal amount of data from the internet, has an understanding of language and knowledge of a myriad of topics. Although these capabilities seem impressive, such models sometimes confidently generate incorrect answers that appear convincing, so it is necessary to verify these answers before accepting them as accurate. The work on Retrieving Supporting Evidence for LLMs Generated Answers considers whether LLM hallucinations can be detected with the help of an information retrieval system: supporting evidence is retrieved and then checked against the generated response by the LLM itself. The authors experiment to explore whether LLMs can self-detect hallucinations by confirming their answers against an external corpus, and find that, with the help of information retrieval, a model can identify its own hallucinations to an extent. The procedure works as follows. The question is given to the LLM, which is prompted for an answer. The generated answer is then combined with the original question to form a query, which is used to find the most appropriate passage in the corpus; this passage is called the retrieved answer. The question, the generated answer, and the retrieved answer are then sent to the LLM to determine whether the generated and retrieved answers are consistent. The output can be one of three types:
- “YES” — no hallucination is involved; the two answers are the same.
- “NO” — hallucination is involved; the two answers differ.
- “NOT RELATED” — neither answer is related to the question; for example, the LLM could reply that it needs more information to find the required solution.
Studies of this kind enhance the reliability of AI-generated solutions by detecting and addressing hallucinations, thereby improving the accuracy and trustworthiness of the generated answers.
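To make this verification loop concrete, here is a minimal sketch in Python; `llm` and `retrieve` are hypothetical stand-ins for an LLM API call and an IR system, not real library functions.

```python
# A minimal sketch of the self-verification loop described above.
# `llm` and `retrieve` are hypothetical callables supplied by the user:
# `llm(prompt) -> str` and `retrieve(query) -> str`.

def self_check(question, llm, retrieve):
    generated = llm(f"Answer the question: {question}")
    query = f"{question} {generated}"   # question + generated answer
    retrieved = retrieve(query)          # best passage from the corpus
    verdict = llm(
        "Do the two answers below agree? Reply YES, NO, or NOT RELATED.\n"
        f"Question: {question}\n"
        f"Generated answer: {generated}\n"
        f"Retrieved answer: {retrieved}"
    )
    return verdict.strip().upper()  # "YES" => consistent, "NO" => hallucination
```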
Case Study: Product Information Extraction with ChatGPT
While Retrieving Supporting Evidence for LLMs Generated Answers shows one way to make LLM output more trustworthy, Product Information Extraction using ChatGPT offers a case study in which ChatGPT (GPT-3) was used for product information extraction. This paper focused on the usage of ChatGPT to extract product metadata for e-commerce from the title and description of the product; in addition, it focused on:
- evaluating the performance in comparison to pre-trained models such as BERT;
- evaluating the performance of open and closed extraction;
- exploring different prompt designs; and
- evaluating in-context learning for ChatGPT.
This paper uses a subset of the MAVE dataset to test the performance of ChatGPT. The researchers run the experiment on this subset and compare the results to baseline models such as AVEQA and a NER (named entity recognition) model, which are fully and extensively trained models that need extensive computational resources. The paper combines demonstrations with prompts to improve product information extraction and evaluates the results using precision, recall, and F1 score (a sketch of these metrics follows).
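For reference, here is a hedged sketch of how precision, recall, and F1 might be computed for extracted attribute-value pairs; the gold and predicted sets below are invented for illustration, not drawn from the MAVE experiments.

```python
# Toy evaluation of an extraction run against gold attribute-value pairs.
gold = {("color", "black"), ("material", "leather"), ("size", "10")}
predicted = {("color", "black"), ("size", "10"), ("brand", "acme")}

true_positives = len(gold & predicted)
precision = true_positives / len(predicted)   # 2/3: how much output is correct
recall = true_positives / len(gold)           # 2/3: how much gold is recovered
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```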
ChatGPT is first given a task description, which provides an overview of the task, followed by demonstrations (sample inputs and outputs that help the model learn the task). The task description is then repeated, followed by the actual input, and the model gives the appropriate response based on what it has learned. A sketch of this prompt layout appears below.
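Here is that prompt layout sketched in Python; the task description, product titles, and attributes are invented for illustration, and the resulting string would be sent to the model of your choice.

```python
# A sketch of the in-context-learning prompt: task description,
# demonstrations (sample input/output pairs), then the actual input.
task_description = (
    "Extract product attributes as attribute: value pairs "
    "from the product title."
)
demonstrations = [
    ("Nike Air Zoom Running Shoes Size 10 Black",
     "brand: Nike; category: running shoes; size: 10; color: black"),
    ("Samsonite Carry-On Suitcase 20 inch Blue",
     "brand: Samsonite; category: suitcase; size: 20 inch; color: blue"),
]
actual_input = "Logitech Wireless Mouse M185 Grey"

prompt = task_description + "\n\n"
for title, attributes in demonstrations:
    prompt += f"Title: {title}\nAttributes: {attributes}\n\n"
prompt += f"Title: {actual_input}\nAttributes:"
print(prompt)  # send this string to the model
```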
Advancing Question Answering Systems
Question Answering (QA) has been a long-standing problem in the fields of Natural Language Processing and Artificial Intelligence, and solving it is considered a critical milestone. These systems enable users to express questions in natural language and get appropriate results immediately. A typical Question Answering system contains the following:
- a data source;
- an information retrieval (IR) system;
- a machine reading comprehension model (MR model); and
- one or more additional blocks, such as modules for text pre-processing, answer post-processing, checking, and stabilization.
When a question is put forward, the system first searches for the answer in its sources, which could be web pages, documents, Wikipedia, and so on. It then retrieves the most appropriate information from the data source, for which many algorithms can be employed, and the MR model reads the retrieved text to extract the answer. A minimal sketch of the reading step appears below, assuming a passage has already been retrieved.
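This sketch uses the Hugging Face transformers question-answering pipeline on a passage assumed to have been returned by the IR stage; the model named here is one public SQuAD-style reader, and any extractive QA model would work.

```python
# Reading step of a retrieve-then-read QA system: extract an answer span
# from a passage that the IR stage (not shown) has already returned.
from transformers import pipeline

reader = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")

passage = ("Alan Turing was a mathematician and computer scientist who "
           "proposed the question 'Can machines think?' in 1950.")
result = reader(question="Who asked whether machines can think?",
                context=passage)
print(result["answer"], result["score"])  # answer span + confidence
```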
The Stanford Question Answering Dataset (SQuAD) is a widely used dataset for training and evaluating models that comprehend passages of text and answer questions based on them. A paragraph from the articles used as data sources is called a passage, and passages can be of any length. SQuAD was built from a collection of Wikipedia pages covering a myriad of domains; the dataset pairs each passage with a set of questions whose answers can be found in that passage. One noteworthy fact about the dataset is that the answers are not limited to single-word responses or named entities, allowing more diverse and complex answers.
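For a quick look at the data, SQuAD can be loaded with the Hugging Face datasets library (assuming it is installed); each record pairs a passage with a question and the answer span found inside that passage.

```python
# Inspect one SQuAD record: passage, question, and in-passage answer span.
from datasets import load_dataset

squad = load_dataset("squad", split="train")
example = squad[0]
print(example["context"][:100])   # the passage
print(example["question"])        # a question about the passage
print(example["answers"])         # answer text(s) and character offsets
```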
QA systems face numerous challenges, such as understanding complex questions and their context while providing accurate and reliable answers, and various techniques and methods have been employed to address these challenges. The goal of these systems is to improve the user experience by providing relevant and accurate answers to the questions users put forth, saving them time. Continued improvements in NLP, machine learning, and large-scale datasets like SQuAD have contributed to the development of QA systems, making them more robust and efficient.
Traditional QA systems rely on only one source of knowledge, which often limits the system’s ability to fairly evaluate candidate answers. To overcome this limitation, [4] proposes the use of heterogeneous sources, which improves coverage and confidence by harnessing answers from multiple sources such as databases, websites, and documents when answering factual questions. While numerous studies have been conducted on several existing datasets, those datasets were primarily designed with the assumption that a specific underlying source contains nearly all the answers, which limits the usefulness of the resulting QA systems.
In one such study, to analyze and assess the performance of heterogeneous QA, the authors designed a new benchmark called CompMix, which consists of 9,410 questions covering five different domains, split into a train set (4,966), a development set (1,680), and a test set (2,764). By introducing CompMix, the authors aim to provide a more suitable testbed for evaluating heterogeneous QA systems in realistic scenarios where multiple sources are needed to obtain accurate answers. CompMix has several distinguishing properties:
- it is crowdsourced;
- it includes the full KB as one of the knowledge sources;
- it spans four sources for accurate answers;
- it covers five domains; and
- it contains self-contained, complete questions.
Answers are drawn from Wikidata, which enables consistent evaluation.
Personalized Community Question Answering: SE-PQA
In the realm of information retrieval, the case study of SE-PQA (StackExchange — Personalized Question Answering) presents an illuminating exploration of personalized community question-answering systems. While preceding research has largely focused on fact-centric question-answering systems, SE-PQA ventures into the realm of community-driven personalized question-answering. This case study exemplifies the innovation that can emerge at the intersection of AI, community engagement, and deep learning.
Introduction and Dataset Creation: The case study commences by introducing the SE-PQA dataset, a novel collection crafted to enable personalized question answering within the StackExchange platform. This dataset capitalizes on the benefits of multiple community data sources for enhanced personalization. In a world where individual users demand results tailored precisely to their interests and behaviors, SE-PQA takes a pioneering step forward.
Deep Learning for Personalized Models: The study underscores the deployment of Deep Neural Networks (DNNs) to construct personalized models. Recognizing the absence of publicly available, high-quality datasets in this domain, the researchers embark on the journey of training models that cater to users’ unique preferences. The dataset, sourced from the bustling cQA platform StackExchange, consists of an impressive repository of 1 million questions and 2 million answers, including valuable insights like social interactions, votes, and topic descriptions.
Extracting Relevance: SE-PQA hinges on the concept of answer relevance, gauging it through community member upvotes. This mechanism enables the system to understand the weight of answers based on community consensus. By discarding questions without answers or answers with negative scores, the system ensures the quality and relevance of responses.
Personalization Unleashed: Two versions of the dataset, the base and personalized versions, serve as the foundation for personalization. In the base version, all positively-scored answers are deemed relevant, while the personalized version introduces a more precise criterion: only the answer labeled as best by the user who posed the question is considered relevant, aligning responses more closely with individual needs. A sketch of the two labeling rules follows.
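Here is a hedged sketch of the two labeling rules; the field names (`score`, `is_best_answer`) are hypothetical stand-ins, not SE-PQA’s actual schema.

```python
# Base vs. personalized relevance labels, as described above.

def base_relevance(answer):
    # Base version: every positively-scored answer counts as relevant.
    return 1 if answer["score"] > 0 else 0

def personalized_relevance(answer):
    # Personalized version: only the answer the asker marked as best.
    return 1 if answer["is_best_answer"] else 0

answers = [
    {"score": 5, "is_best_answer": False},
    {"score": 12, "is_best_answer": True},
    {"score": -2, "is_best_answer": False},  # discarded: negative score
]
kept = [a for a in answers if a["score"] >= 0]
print([(base_relevance(a), personalized_relevance(a)) for a in kept])
```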
Tailoring the User Experience: The personalization phase takes center stage, as SE-PQA tailors questions and answers to suit user preferences. User-generated data, ranging from past interactions to biographic information, fuels this phase. Through leveraging tags, earned badges, voting history, and more, the system gains an intricate understanding of user patterns, interests, and expertise areas. This customization enhances accuracy and user satisfaction.
A Two-Stage Ranking System: The case study integrates a two-stage ranking system, a dynamic blend of efficiency and effectiveness: a fast first stage cheaply retrieves a pool of candidate answers, and a more expensive second stage re-ranks those candidates with greater precision. This design balances the goal of locating the most relevant documents against the cost of query processing, as the sketch below illustrates.
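Here is a sketch of such a two-stage ranker, assuming the rank_bm25 and sentence-transformers packages are installed: a cheap lexical BM25 stage selects candidates, and a neural cross-encoder re-ranks them. The model named is a public MS MARCO re-ranker; SE-PQA’s exact components may differ.

```python
# Two-stage ranking: fast lexical retrieval, then neural re-ranking.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "How do I reset a forgotten password?",
    "Best way to back up a PostgreSQL database",
    "Resetting your password via the account settings page",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "forgot my password, how to reset it"
scores = bm25.get_scores(query.lower().split())
# Stage 1: keep the top-2 candidates by lexical score.
candidates = sorted(range(len(corpus)),
                    key=lambda i: scores[i], reverse=True)[:2]

# Stage 2: re-rank the survivors with a cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, corpus[i]) for i in candidates]
reranked = sorted(zip(candidates, reranker.predict(pairs)),
                  key=lambda x: x[1], reverse=True)
print([corpus[i] for i, _ in reranked])  # final ranked answers
```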
The SE-PQA case study therefore exemplifies the fusion of AI, community dynamics, and deep learning to forge personalized community question-answering systems. By leveraging deep neural networks, embracing data insights from StackExchange, and deploying multi-faceted relevance extraction, SE-PQA attains a new echelon of personalization. It amplifies user experiences, harnesses AI-driven insights, and propels the field of information retrieval into a future defined by tailor-made solutions.
Conclusion: Advancing Information Retrieval and Beyond
In conclusion, the evolution of information retrieval is relentless. Researchers are charting paths to precision, ethical integrity, and user-centricity. The future beckons with context-driven AI, neural breakthroughs, and fairness triumphs. Diverse domains and data fusion await exploration. As NLP thrives, challenges from biases to multilingualism pave the way for an AI age that empowers, adapts, and learns continuously. The journey is ongoing, promising a future where information retrieval thrives in harmony with humanity’s digital aspirations.
REFERENCES
- “What Is Artificial Intelligence (AI)?”, n.d., https://cloud.google.com/learn/what-is-artificial-intelligence
- “Revolutionizing Information Retrieval: The Role of Large Language Models in a Post-Search Engine Era”, 2023, https://medium.com/@daniele.nanni/revolutionizing-information-retrieval-the-role-of-large-language-models-in-a-post-search-engine-7dd370bdb62
- “Retrieving Supporting Evidence for LLMs Generated Answers”, 2023, https://arxiv.org/pdf/2306.13781.pdf
- “Product Information Extraction using ChatGPT”, 2023, https://arxiv.org/abs/2306.14921
- “End-to-End Question-Answering System Using NLP and SQuAD Dataset”, 2022, https://www.analyticsvidhya.com/blog/2021/11/end-to-end-question-answering-system-using-nlp-and-squad-dataset/
- “Automatic Question Answering”, 2018, https://towardsdatascience.com/automatic-question-answering-ac7593432842
- “SE-PQA: Personalized Community Question Answering”, 2023, https://arxiv.org/pdf/2306.16261