Navigating the Present: Exploring Practical Horizons of Retrieval-Augmented Generation (RAG)

Amir Aryani
Jan 25, 2024


Authors: Hui Yin, Amir Aryani

RAG is an AI framework enhancing pre-trained language models with up-to-date, reliable external knowledge.

As we discussed in our previous article “A Brief Introduction to Retrieval Augmented Generation (RAG)”, RAG is an artificial intelligence framework that incorporates up-to-date, reliable external knowledge to improve the quality of responses generated by pre-trained language models (PLMs). It was initially designed to improve the performance of knowledge-intensive NLP tasks (Lewis et al., 2020). As the advantages of RAG have become more widely recognised, it has been applied to a broader range of scenarios. We conducted a rapid review of RAG-related publications and found six main categories of use cases: Medical, Knowledge Graph, Code Comment, Chatbot, Review, and Image Processing.

[Figure: the main categories of RAG use cases. Created by Dr Hui Yin, Jan 2024. CC-BY licence for reuse.]
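
To make the retrieve-then-generate loop concrete, here is a minimal Python sketch of a RAG pipeline. The toy corpus, the word-overlap scorer, and the generate() stub are illustrative stand-ins of ours for a real document store, retriever, and LLM call; they are not the implementation of any paper cited here.

```python
# A minimal sketch of the retrieve-then-generate loop behind RAG.
# The toy corpus, the word-overlap scorer, and generate() are
# illustrative stand-ins, not any cited paper's implementation.

CORPUS = [
    "RAG combines a retriever with a pre-trained language model.",
    "Knowledge graphs store entities and relations in graph form.",
    "Diffusion models can synthesise images from random noise.",
]

def tokenize(text: str) -> set:
    return set(text.lower().split())

def retrieve(query: str, k: int = 2) -> list:
    """Rank documents by word overlap with the query (a stand-in
    for BM25 or dense retrieval) and return the top k."""
    q = tokenize(query)
    ranked = sorted(CORPUS, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Placeholder for a call to an actual language model."""
    return f"[LLM response grounded in a {len(prompt)}-char prompt]"

def rag_answer(question: str) -> str:
    # Retrieved passages are prepended to the question as context.
    context = "\n".join(retrieve(question))
    return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

print(rag_answer("How does RAG use a language model?"))
```

In a production pipeline the scorer would be BM25 or a dense embedding retriever, but the control flow stays the same: retrieve evidence first, then condition generation on it.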

Chatbot

Chatbots have gained widespread recognition and use by the public. A chatbot acts as an intermediary, taking user queries or requests written in natural language and communicating with the underlying LLM to generate responses.

Researchers have incorporated RAG into the following tasks to improve chatbot performance, allowing chatbots to provide more accurate responses.

Question answering (Mao et al., 2020; Ju et al., 2022; Siriwardhana et al., 2023) requires access to external knowledge bases, documents, or structured data to provide accurate and informative responses.

Text summarization (Cai et al., 2022; Hofstätter et al., 2023) requires understanding the content and context of a longer text, with the assistance of external knowledge, to generate a concise and informative summary.

In certain scenarios, dialogue systems can also benefit from RAG technology, allowing them to handle specific tasks more efficiently (Shuster et al., 2021; Thulke et al., 2021).
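
As a rough illustration of how a RAG chatbot can keep its retrieval on topic across turns, the sketch below folds recent dialogue history into the retrieval query. The window size and simple concatenation are simplifying assumptions of ours, not a method taken from the cited papers.

```python
# Illustrative sketch: folding recent dialogue turns into the retrieval
# query so the retrieved evidence tracks the conversation. The window
# size and plain concatenation are simplifying assumptions.

def build_retrieval_query(history: list, user_turn: str, window: int = 2) -> str:
    """Concatenate the last few turns with the new user message,
    a common heuristic for retrieval in dialogue systems."""
    return " ".join(history[-window:] + [user_turn])

history = [
    "User: Who proposed the original RAG model?",
    "Bot: Lewis et al. introduced RAG in 2020.",
]
# Without the history, "it" in the follow-up question would be
# unresolvable for the retriever.
print(build_retrieval_query(history, "User: What tasks was it designed for?"))
```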

Code Comment and Summarization

Code comments are crucial because they explain the purpose, functionality, and usage of code; clear and informative comments improve code comprehension.

Yu et al. (2022) proposed BashExplainer, which automatically generates code comments for Bash scripts to enhance code comprehension, especially for developers who may struggle to understand Bash code due to its unique characteristics.

Liu et al. (2021) introduced a novel retrieval-augmented mechanism for code summarization, aiming to bridge the language gap between source code and natural language summaries.
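
The retrieval step in such systems can be sketched in a few lines: find the most similar already-commented snippet and hand its comment to the generator as extra context. The token-overlap similarity below is a deliberate simplification of the fine-tuned CodeBERT retriever used in BashExplainer, and the exemplar pairs are toy data.

```python
# Sketch of the retrieval step in retrieval-augmented comment generation:
# find the most similar commented snippet and pass its comment to the
# generator as extra context. Token overlap is a simplification of the
# fine-tuned CodeBERT retriever in BashExplainer; the pairs are toy data.

COMMENTED_SNIPPETS = [
    ('for f in *.log; do rm "$f"; done', "Delete every .log file in the directory."),
    ("tar -czf backup.tar.gz ./data", "Compress the data directory into a gzip archive."),
]

def most_similar(code: str) -> tuple:
    """Return the (snippet, comment) pair sharing the most tokens with code."""
    tokens = set(code.split())
    return max(COMMENTED_SNIPPETS, key=lambda s: len(tokens & set(s[0].split())))

new_code = "tar -czf logs.tar.gz ./logs"
snippet, comment = most_similar(new_code)
# The retrieved pair would be concatenated with new_code in the
# prompt given to the comment generator.
print(f"Retrieved exemplar: {snippet!r} -> {comment!r}")
```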

Image Processing

Image processing covers three topics: text-to-image, image-to-text, and image synthesis.

Text-to-image is the generation of visual content (images) from textual descriptions.

Chen et al. (2022) designed Re-Imagen to address challenges in generating images of uncommon entities. By leveraging a multi-modal knowledge base and a retrieval step, the model significantly improves fidelity, particularly for rare entities.

Image-to-text is the reverse of text-to-image: it converts visual information (images) into textual descriptions, such as a caption or a longer description.

Sarto et al. (2022) proposed a retrieval-augmented Transformer for image captioning, which combines a knowledge retriever, a differentiable encoder, and a kNN-augmented attention layer, enhancing caption quality through explicit external memory.
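
The kNN retrieval behind such captioning models can be sketched as follows: given an image embedding, fetch the nearest cached captions to act as external memory. The toy vectors below stand in for real visual features (e.g. from a CLIP-style encoder), and the attention-fusion step is omitted; none of this is Sarto et al.'s actual code.

```python
# Sketch of the kNN retrieval used in retrieval-augmented captioning:
# given an image embedding, fetch the k nearest cached captions to
# serve as external memory. The vectors are toy values standing in
# for real visual features; the attention fusion step is omitted.
import math

MEMORY = [  # (embedding, caption) pairs standing in for external memory
    ([0.9, 0.1, 0.0], "A dog running on the beach."),
    ([0.1, 0.9, 0.0], "A plate of pasta on a table."),
    ([0.8, 0.2, 0.1], "A puppy playing in the sand."),
]

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def knn_captions(image_vec: list, k: int = 2) -> list:
    """Return the captions of the k memory entries nearest to image_vec."""
    ranked = sorted(MEMORY, key=lambda m: cosine(image_vec, m[0]), reverse=True)
    return [caption for _, caption in ranked[:k]]

# Captions retrieved for a "dog-like" query vector would feed the
# kNN-augmented attention layer alongside the visual features.
print(knn_captions([0.85, 0.15, 0.05]))
```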

Image synthesis is quite different from the two tasks above. It creates new images or modifies existing ones, such as generating realistic images of non-existent objects or scenes, often using generative models such as generative adversarial networks (GANs). You may have noticed that some Medium posts feature stunning images at the top of the page labelled “Generated by AI”. So far, we know of only one work that uses RAG for image synthesis: Blattmann et al. (2022) proposed a retrieval-augmented diffusion model as a semi-parametric approach to generative image synthesis.

Review

RAG enhances review generation by incorporating external knowledge, such as product specifications and customer feedback. The retrieval mechanism ensures that generated reviews are more informative, contextually relevant, and aligned with the characteristics of the product or service, resulting in more nuanced and helpful content than traditional generative models produce. Several works address this topic, such as Kim et al. (2020).
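
A minimal sketch of the prompt-assembly side of retrieval-augmented review generation might look like the following; the record layout and prompt template are illustrative assumptions of ours, not the approach of Kim et al.

```python
# Sketch: assembling a review-generation prompt from retrieved product
# specifications and customer feedback. The record layout and prompt
# template are illustrative assumptions.

product = {
    "name": "Acme Noise-Cancelling Headphones",
    "specs": ["30-hour battery", "Bluetooth 5.3", "active noise cancelling"],
}
retrieved_feedback = [  # snippets a retriever might surface
    "Battery easily lasts a full work week.",
    "Noise cancelling struggles on windy streets.",
]

prompt = (
    f"Write a balanced review of {product['name']}.\n"
    f"Specifications: {', '.join(product['specs'])}\n"
    "Customer feedback:\n- " + "\n- ".join(retrieved_feedback)
)
print(prompt)  # this prompt would then be passed to the generator
```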

Medical

RAG technology has been widely used in the medical domain, for example in the summarisation of electronic health records (EHRs) (Saba et al., 2024; Thompson et al., 2023) and in evidence-based medicine (EBM) (Vaid et al., 2024). The data processed by the models in these works is medical text, so the underlying problem is still an NLP task in nature; what distinguishes it is that the external data retrieved belongs specifically to the medical field.

Knowledge Graph

The integration of knowledge graphs into RAG is mostly done to improve the accuracy and performance of RAG pipelines. It is primarily a methodology rather than a use case; however, given the substantial community support, it has become an important characteristic of RAG pipelines.

A knowledge graph is a structured representation of knowledge that captures entities, their attributes, and their relationships in a graph format. Because the data is structured, it is easy to retrieve. In RAG, leveraging information from knowledge graphs like Wikipedia can help generate more contextually relevant and accurate natural language text, especially for open-domain tasks. In addition, RAG systems use other external resources depending on the specific task, such as web scraping, database retrieval, existing text corpora, and external APIs. Several survey papers discuss external sources and knowledge graphs in RAG in depth: Wei et al., 2021; Agrawal et al., 2023; Pan et al., 2023.
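
To show why structured data is easy to retrieve, here is a toy sketch that pulls triples about an entity from an in-memory knowledge graph and linearises them into text for the generator's context. The triple store and helper functions are illustrative, not the interface of any cited system.

```python
# Sketch: retrieving triples from a tiny in-memory knowledge graph and
# linearising them into plain text for the generator's context window.
# The triple store and helpers are illustrative, not a real system.

TRIPLES = [
    ("RAG", "introduced_by", "Lewis et al. (2020)"),
    ("RAG", "published_in", "NeurIPS 2020"),
    ("Knowledge graph", "represents", "entities and relations"),
]

def retrieve_triples(entity: str) -> list:
    """Return every triple whose subject or object matches the entity."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def linearise(triples: list) -> str:
    """Turn (subject, predicate, object) triples into plain sentences."""
    return " ".join(f"{s} {p.replace('_', ' ')} {o}." for s, p, o in triples)

context = linearise(retrieve_triples("RAG"))
# -> "RAG introduced by Lewis et al. (2020). RAG published in NeurIPS 2020."
print(context)
```

Because the graph structure makes the relevant facts addressable by entity, the retrieval step needs no fuzzy text matching at all, which is exactly the appeal of knowledge graphs as a RAG source.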

Acknowledgement

This research was supported by the Australian Government through the Australian Research Council’s Industrial Transformation Training Centre for Information Resilience (CIRES) project number IC200100022.

References

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33 (NeurIPS 2020). https://arxiv.org/abs/2005.11401

Mao, Y., He, P., Liu, X., Shen, Y., Gao, J., Han, J., & Chen, W. (2020). Generation-Augmented Retrieval for Open-domain Question Answering. CoRR, abs/2009.08553. https://arxiv.org/abs/2009.08553

Ju, M., Yu, W., Zhao, T., Zhang, C., & Ye, Y. (2022). Grape: Knowledge Graph Enhanced Passage Reader for Open-domain Question Answering. https://doi.org/10.48550/arXiv.2210.02933

Siriwardhana, S., Weerasekera, R., Wen, E., Kaluarachchi, T., Rana, R., & Nanayakkara, S. (2023). Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering. Transactions of the Association for Computational Linguistics, 11, 1–17. https://doi.org/10.1162/tacl_a_00530

Cai, D., Wang, Y., Liu, L., & Shi, S. (2022). Recent advances in retrieval-augmented text generation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 3417–3419). https://doi.org/10.1145/3477495.3532682

Hofstätter, S., Chen, J., Raman, K., & Zamani, H. (2023). FiD-Light: Efficient and effective retrieval-augmented text generation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1437–1447). https://doi.org/10.1145/3539618.3591687

Shuster, K., Poff, S., Chen, M., Kiela, D., & Weston, J. (2021). Retrieval augmentation reduces hallucination in conversation. arXiv preprint arXiv:2104.07567. https://doi.org/10.48550/arXiv.2104.07567

Thulke, D., Daheim, N., Dugast, C., & Ney, H. (2021). Efficient Retrieval Augmented Generation from Unstructured Knowledge for Task-Oriented Dialog. arXiv preprint arXiv:2102.04643. https://doi.org/10.48550/arXiv.2102.04643

Yu, C., Yang, G., Chen, X., Liu, K., & Zhou, Y. (2022). BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT. https://doi.org/10.48550/arXiv.2206.13325

Liu, S., Chen, Y., Xie, X., Siow, J., & Liu, Y. (2021). Retrieval-Augmented Generation for Code Summarization via Hybrid GNN. https://doi.org/10.48550/arXiv.2006.05405

Chen, W., Hu, H., Saharia, C., & Cohen, W. W. (2022). Re-Imagen: Retrieval-Augmented Text-to-Image Generator. https://doi.org/10.48550/arXiv.2209.14491

Sarto, S., Cornia, M., Baraldi, L., & Cucchiara, R. (2022). Retrieval-Augmented Transformer for Image Captioning. https://doi.org/10.48550/arXiv.2207.13162

Blattmann, A., Rombach, R., Oktay, K., Müller, J., & Ommer, B. (2022). Retrieval-augmented diffusion models. Advances in Neural Information Processing Systems, 35, 15309–15324.

Kim, J., Choi, S., Amplayo, R. K., & Hwang, S. (2020). Retrieval-augmented controllable review generation. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 2284–2295). https://aclanthology.org/2020.coling-main.207

Saba, W., Wendelken, S., & Shanahan, J. (2024). Question-Answering Based Summarization of Electronic Health Records using Retrieval Augmented Generation. https://doi.org/10.48550/arXiv.2401.01469

Vaid, A., Lampert, J., Lee, J., Sawant, A., Apakama, D., Sakhuja, A., … Nadkarni, G. (2024). Generative Large Language Models are autonomous practitioners of evidence-based medicine. https://doi.org/10.48550/arXiv.2401.02851

Thompson, W. E., Vidmar, D. M., De Freitas, J. K., Pfeifer, J. M., Fornwalt, B. K., Chen, R., … Miotto, R. (2023). Large Language Models with Retrieval-Augmented Generation for Zero-Shot Disease Phenotyping. https://doi.org/10.48550/arXiv.2312.06457

Wei, X., Wang, S., Zhang, D., Bhatia, P., & Arnold, A. (2021). Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey. https://doi.org/10.48550/arXiv.2110.08455

Agrawal, G., Kumarage, T., Alghamdi, Z., & Liu, H. (2023). Can Knowledge Graphs Reduce Hallucinations in LLMs?: A Survey. https://doi.org/10.48550/arXiv.2311.07914

Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., & Wu, X. (2023). Unifying Large Language Models and Knowledge Graphs: A Roadmap. https://doi.org/10.48550/arXiv.2306.08302
