author    Aditya <bluenerd@protonmail.com>  2025-02-18 16:23:32 +0530
committer Aditya <bluenerd@protonmail.com>  2025-02-18 16:23:32 +0530
commit    6df8815ee30d9edd4e7e3c54fa00633ca3d4963a (patch)
tree      81be2fbebd5f6794e96667063665ee67bfc1dd26
parent    cd77837ecf6f69e3c674b80898b7970195748834 (diff)

update literature review

 literature-review.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
diff --git a/literature-review.md b/literature-review.md
index 37ab8b1..30f0b62 100644
--- a/literature-review.md
+++ b/literature-review.md
@@ -28,4 +28,29 @@ This literature review examines the advancements in Information Retrieval (IR) t
# Evaluating Retrieval Quality in Retrieval-Augmented Generation
The paper introduces eRAG, a novel evaluation method that uses the large language model (LLM) inside a Retrieval-Augmented Generation (RAG) system to generate document-level relevance labels from downstream task performance. This markedly improves the correlation between retrieval quality and downstream performance, with Kendall's tau improvements ranging from 0.168 to 0.494 (Salemi & Zamani, 2024). eRAG significantly outperforms traditional evaluation methods such as human judgment and KILT Provenance, which often correlate poorly with actual RAG performance and are limited by cost and practicality (Zamani & Bendersky, 2022; Petroni et al., 2021). It is also computationally efficient, consuming up to 50 times less memory and running on average 2.468 times faster than end-to-end evaluation, which enables quicker iteration during model development (Lewis et al., 2020). The study highlights the limitations of conventional evaluation approaches, which often lack transparency and fail to give a comprehensive picture of retrieval quality, complicating the optimization of retrieval models (Agrawal et al., 2023; Shuster et al., 2021).
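The evaluation loop eRAG describes can be sketched in a few lines. The reader, metric, and corpus below are hypothetical toy stand-ins (a real system would use the RAG pipeline's own LLM and its downstream task metric), assuming exact match as the downstream measure:

```python
def toy_reader(query: str, document: str) -> str:
    # Stand-in for the RAG system's own LLM reading a single document:
    # it can only answer when all query terms appear in the document.
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().replace(".", "").split())
    return document.split()[0] if q_terms <= d_terms else "unknown"

def exact_match(prediction: str, gold: str) -> int:
    return int(prediction.strip().lower() == gold.strip().lower())

def erag_labels(query: str, retrieved: list, gold: str) -> list:
    # eRAG's core move: score each retrieved document by feeding it
    # *alone* to the downstream LLM and measuring task performance,
    # yielding a document-level relevance label.
    return [exact_match(toy_reader(query, d), gold) for d in retrieved]

retrieved = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
]
labels = erag_labels("capital of France", retrieved, gold="Paris")
precision = sum(labels) / len(labels)  # any set-retrieval metric can aggregate
print(labels, precision)
```

The per-document labels can then feed any standard retrieval metric (precision here for simplicity), which is what gives eRAG its correlation with end-to-end performance.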
+# Fine Tuning vs Retrieval Augmented Generation for Less Popular Knowledge
+Fine-tuning (FT) adjusts model weights to improve recall of information relevant to a domain, which is particularly useful when domain-specific data is scarce. FT has been shown to improve the performance of language models (LMs), especially smaller ones, but it requires significant computational resources and training data. Parameter-Efficient Fine-Tuning (PEFT) techniques such as QLoRA help preserve reasoning capabilities while integrating new knowledge, with the quality of synthetic training data playing a crucial role in effectiveness. In contrast, Retrieval Augmented Generation (RAG) combines retrieval mechanisms with generative models, letting LMs dynamically access external knowledge bases, and it has consistently outperformed FT, particularly on less popular knowledge. RAG's success depends on the effectiveness of the retrieval models used, with advanced retrieval techniques further improving performance. Comparative studies indicate that RAG achieves higher accuracy on low-frequency entities, while FT can still help in certain contexts. Combining FT and RAG yields the best results for smaller models, whereas larger models may degrade because fine-tuning can erode their reasoning capability.
+# How Much Knowledge Can You Pack Into the Parameters of a Language Model?
+Roberts et al. (2020) utilized a fine-tuning methodology on three open-domain question answering datasets—Natural Questions, WebQuestions, and TriviaQA—contrasting closed-book question answering, which relies solely on internalized knowledge, with traditional open-book systems that access external knowledge sources. They introduced salient span masking (SSM) as a pre-training objective, positing that it would improve the model's information retrieval capabilities. The experimental results demonstrated that larger models consistently outperformed smaller ones across all datasets, with the integration of SSM during pre-training leading to significant performance enhancements, highlighting the critical role of task-specific pre-training objectives in optimizing knowledge retrieval.
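The salient span masking objective can be illustrated with a toy masking function. The regex below is a crude stand-in for the named-entity and date taggers the paper uses to select spans:

```python
import re

def salient_span_mask(text: str) -> str:
    # Mask entity-like spans (capitalized words, 4-digit years) instead
    # of random tokens, so the model must recall world knowledge to fill
    # the blank. Real SSM uses a trained named-entity tagger, not a regex.
    return re.sub(r"\b(?:[A-Z][a-z]+|\d{4})\b", "[MASK]", text)

print(salient_span_mask("Einstein was born in Ulm in 1879."))
```

Masking only knowledge-bearing spans is what makes the objective a better fit for closed-book QA than uniform random masking.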
+
+# Learning Transferable Visual Models From Natural Language Supervision
+The paper by Radford et al. (2021) introduces CLIP (Contrastive Language-Image Pre-training), a model designed to associate images with their corresponding textual descriptions, trained on a dataset of 400 million (image, text) pairs, enabling it to learn a diverse range of visual concepts. Utilizing a contrastive learning framework, CLIP maximizes the cosine similarity between embeddings of paired images and texts while minimizing it for non-paired examples, employing a joint architecture of an image encoder (ResNet or Vision Transformer) and a text encoder (Transformer). The results demonstrate that CLIP achieves zero-shot performance that is competitive with fully supervised models across various benchmarks, including ImageNet, and shows robustness to natural distribution shifts, indicating its potential for real-world applications. The authors provide a thorough analysis of CLIP's performance, emphasizing its strengths and identifying areas for further improvement (Radford et al., 2021; Brown et al., 2020; Deng et al., 2009).
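The contrastive objective can be sketched in plain Python. The embeddings, batch, and fixed temperature below are illustrative only; CLIP learns its temperature and trains on far larger batches:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def clip_loss(image_embs, text_embs, temperature=0.07):
    # Symmetric InfoNCE: the i-th image should be most similar to the
    # i-th text (and vice versa) among all pairs in the batch.
    logits = [[cosine(i, t) / temperature for t in text_embs] for i in image_embs]

    def cross_entropy(rows):
        loss = 0.0
        for k, row in enumerate(rows):
            log_z = math.log(sum(math.exp(x) for x in row))
            loss += log_z - row[k]  # -log softmax at the matching index
        return loss / len(rows)

    image_to_text = cross_entropy(logits)
    text_to_image = cross_entropy([list(col) for col in zip(*logits)])
    return (image_to_text + text_to_image) / 2

aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)  # aligned pairs give a much lower loss
```

Averaging the image-to-text and text-to-image cross-entropies is what makes the objective symmetric, so both encoders are pushed toward a shared embedding space.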
+
+# Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
+Generative models like T5 and BART have shown competitive performance in Open Domain Question Answering (ODQA) by generating answers from input questions and retrieved passages, with Roberts et al. (2020) introducing a model that operates without external knowledge, paving the way for further research. Passage retrieval is essential in ODQA, involving the extraction of relevant text passages from knowledge bases such as Wikipedia, utilizing traditional sparse representations like TF-IDF and more recent dense representations through Dense Passage Retrieval (DPR), which enhance retrieval accuracy. The Fusion-in-Decoder approach by Izacard and Grave (2021) effectively combines generative models with passage retrieval by independently processing multiple passages in the encoder and aggregating evidence in the decoder, thus improving the model's answer generation capabilities. The method has achieved state-of-the-art results on benchmarks like Natural Questions and TriviaQA, with performance metrics such as Exact Match (EM) scores demonstrating significant improvements as the number of retrieved passages increases, providing a solid framework for evaluating model accuracy.
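The encoder/decoder split that lets Fusion-in-Decoder scale with passage count can be sketched structurally. The token lists below are stand-ins for T5's actual encoder states:

```python
def encode(question: str, passage: str) -> list:
    # Stand-in for the T5 encoder: each passage is processed
    # *independently*, prefixed with the question, so encoding cost
    # grows only linearly with the number of passages.
    return f"question: {question} context: {passage}".split()

def fuse(question: str, passages: list) -> list:
    # The decoder then attends over the *concatenation* of all encoder
    # outputs, aggregating evidence across passages jointly.
    encoded = [encode(question, p) for p in passages]
    return [token for states in encoded for token in states]

passages = ["Paris is the capital of France.", "France is in Europe."]
fused = fuse("What is the capital of France?", passages)
print(len(fused))
```

Keeping cross-passage interaction out of the encoder is the design choice that allows the method to scale to large numbers of retrieved passages, which is why EM scores keep improving as more passages are added.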
+
+# Precise Zero-Shot Dense Retrieval without Relevance Labels
+Gao et al. (2023) introduce HyDE, a two-step retrieval process that employs instruction-following language models like InstructGPT to generate hypothetical documents based on user queries, which encapsulate relevance patterns for retrieving actual documents from a corpus. The methodology consists of generating these hypothetical documents in the first step and encoding them using unsupervised contrastive learning methods, such as Contriever, in the second step to filter out irrelevant content and facilitate the retrieval of real documents. This innovative approach enables effective retrieval without the need for relevance labels, making it suitable for various tasks, including web search, question answering, and fact verification. Experimental results demonstrate that HyDE significantly outperforms existing unsupervised dense retrieval models and shows competitive performance against fine-tuned models across multiple tasks and languages, underscoring its potential as a robust solution for zero-shot retrieval scenarios.
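The two HyDE steps can be sketched with toy components. The bag-of-words encoder stands in for an unsupervised dense encoder such as Contriever, and the hard-coded generator stands in for an instruction-following LLM; both are assumptions for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words encoder; HyDE uses a dense encoder here.
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda c: math.sqrt(sum(n * n for n in c.values()))
    return dot / (norm(u) * norm(v))

def hypothetical_document(query: str) -> str:
    # Stand-in for the instruction-following LLM (e.g. InstructGPT):
    # it drafts a plausible answer document, which may contain errors;
    # only its *relevance pattern* matters for retrieval.
    return "the great pyramid was built around 2560 bc by egyptian workers"

def hyde_retrieve(query: str, corpus: list) -> str:
    hypothetical = hypothetical_document(query)  # step 1: generate
    # step 2: encode the hypothetical document and search the real corpus
    return max(corpus, key=lambda d: cosine(embed(hypothetical), embed(d)))

corpus = [
    "the great pyramid of giza was built around 2560 bc",
    "bananas are a good source of potassium",
]
print(hyde_retrieve("when was the great pyramid built", corpus))
```

Searching with the generated document instead of the raw query is what removes the need for relevance labels: document-to-document similarity is something an unsupervised encoder already captures.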
+
+# Re2G Retrieve, Rerank, Generate
+Recent advancements in retrieval-augmented models, such as RAG (Retrieval-Augmented Generation) and REALM (Retrieval-Augmented Language Model), have highlighted the effectiveness of integrating retrieval mechanisms into generative frameworks, significantly enhancing the knowledge accessible to these models through the use of indexed corpora (Lewis et al., 2020; Guu et al., 2020). Building on this foundation, Re2G introduces key innovations, including a reranking mechanism that integrates retrieval results from various sources, such as BM25 and neural retrieval methods, thereby improving the selection of relevant passages for generation. Additionally, Re2G employs a novel variation of knowledge distillation for end-to-end training of its initial retrieval, reranker, and generation components, utilizing only the ground truth of the target sequence output, which facilitates enhanced performance across diverse tasks.
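The merge-then-rerank step can be sketched as follows. The term-overlap scorer is a hypothetical stand-in for Re2G's trained reranker:

```python
def rerank(query: str, bm25_hits: list, neural_hits: list, k: int = 2) -> list:
    # Pool candidates from both retrievers; raw BM25 and dense scores are
    # not comparable, so Re2G defers the final ordering to one reranker
    # applied to the union of candidates.
    pooled = list(dict.fromkeys(bm25_hits + neural_hits))  # dedupe, keep order
    q_terms = set(query.lower().split())

    def reranker_score(passage: str) -> int:
        # Term-overlap stand-in for the trained sequence-pair reranker.
        return len(q_terms & set(passage.lower().split()))

    return sorted(pooled, key=reranker_score, reverse=True)[:k]

bm25_hits = ["tokyo is the capital of japan", "mount fuji is in japan"]
neural_hits = ["japan's capital is tokyo", "sushi originated in japan"]
top = rerank("capital of japan", bm25_hits, neural_hits)
print(top)
```

Because only the top-k reranked passages reach the generator, the reranker is the single point where evidence from heterogeneous retrievers is made comparable.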
+
+# REALM Retrieval-Augmented Language Model Pre-Training
+REALM employs a two-step methodology of retrieval and prediction: it first retrieves relevant documents from a knowledge corpus based on the input query and then conditions its predictions on them. The retriever is trained without relevance supervision; the masked language modeling objective backpropagates through the retrieval step, so retrievals that improve prediction accuracy are reinforced. Experimental evaluations on Open-QA benchmarks such as Natural Questions and WebQuestions demonstrate that REALM significantly outperforms previous state-of-the-art models, achieving 4-16% improvements in absolute accuracy while also offering qualitative benefits such as interpretability and modularity (Guu et al., 2020). Compared with other retrieval-based and generation-based systems, REALM shows superior performance, even surpassing larger models like T5, underscoring the importance of its retrieval mechanism in supplying relevant context for answering questions (Devlin et al., 2018; Raffel et al., 2019).
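The retrieve-then-predict factorization can be sketched numerically. The word-overlap retrieval scores and the stub reader below are assumptions standing in for REALM's learned dense inner products and its masked-language-model reader:

```python
import math

def softmax(scores: list) -> list:
    exps = [math.exp(s - max(scores)) for s in scores]
    return [e / sum(exps) for e in exps]

def realm_predict(query: str, corpus: list, read) -> str:
    # Retrieve-then-predict: p(y|x) = sum_z p(z|x) * p(y|x,z).
    # Because retrieval enters the marginal likelihood, backpropagating
    # the MLM loss also trains the retriever.
    q_terms = set(query.split())
    p_doc = softmax([float(len(q_terms & set(d.split()))) for d in corpus])
    marginal = {}
    for doc, p_z in zip(corpus, p_doc):
        for answer, p_y in read(query, doc).items():
            marginal[answer] = marginal.get(answer, 0.0) + p_z * p_y
    return max(marginal, key=marginal.get)

corpus = ["paris is the capital of france", "berlin is the capital of germany"]
read = lambda q, d: {d.split()[0]: 1.0}  # stub reader: answer with the subject
print(realm_predict("capital of france", corpus, read))
```

Marginalizing over documents rather than committing to a single retrieval is what keeps the objective differentiable end to end.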
+
+# REST Retrieval-Based Speculative Decoding
+Speculative decoding traditionally relies on a smaller language model to generate draft tokens, which are then verified by a larger model; however, obtaining a high-quality draft model that balances size and predictive power often requires custom training (Miao et al., 2023; Chen et al., 2023). The Retrieval-Based Speculative Decoding (REST) framework addresses these challenges by utilizing a non-parametric retrieval datastore to construct draft tokens, allowing for seamless integration with various large language models (LLMs) without additional training (He et al., 2024). Unlike LLMA, which retrieves from limited contexts, REST draws from a comprehensive datastore, enabling a broader range of information during generation. Extensive experiments show that REST achieves significant speedups in token generation, with improvements ranging from 1.62x to 2.36x compared to standard autoregressive and speculative decoding methods, demonstrating its effectiveness across diverse datasets such as HumanEval and MT-Bench (He et al., 2024).
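The draft-then-verify loop can be sketched with a toy n-gram datastore. This is a simplified stand-in for REST's suffix-indexed datastore and trie-based draft construction, assuming a stubbed target model for verification:

```python
def build_datastore(tokens: list, ctx: int = 2) -> dict:
    # Map each length-`ctx` context to the continuations observed after
    # it in the corpus (toy version of REST's retrieval datastore).
    store = {}
    for i in range(len(tokens) - ctx):
        store.setdefault(tuple(tokens[i:i + ctx]), []).append(tokens[i + ctx])
    return store

def draft(store: dict, generated: list, ctx: int = 2, max_draft: int = 3) -> list:
    # Drafting without a draft model: match the current suffix against
    # the datastore and extend greedily with retrieved continuations.
    out, suffix = [], list(generated[-ctx:])
    for _ in range(max_draft):
        nexts = store.get(tuple(suffix))
        if not nexts:
            break
        out.append(nexts[0])
        suffix = (suffix + [nexts[0]])[-ctx:]
    return out

def verify(target_next, generated: list, drafted: list) -> list:
    # The large LLM checks the draft and keeps the longest prefix it
    # agrees with; `target_next` is a stub for the target model.
    accepted = []
    for token in drafted:
        if target_next(generated + accepted) != token:
            break
        accepted.append(token)
    return accepted

store = build_datastore("the cat sat on the mat".split())
drafted = draft(store, ["the", "cat"])
print(drafted)
```

Since the datastore lookup replaces a full draft-model forward pass, every accepted draft token is a decoding step the target model did not have to run autoregressively, which is the source of the reported speedups.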
+
+# Retrieval Augmentation Reduces Hallucination in Conversation
+State-of-the-art dialogue models often generate responses that lack factual accuracy, resulting in hallucination, a problem exacerbated by their reliance on internal knowledge that may not cover all relevant information (Roller et al., 2021; Maynez et al., 2020). Retrieval-Augmented Generation (RAG) addresses this issue by integrating neural retrieval mechanisms with generative models, allowing for the retrieval of relevant documents from a large corpus to enhance the factual accuracy of responses (Lewis et al., 2020b). Studies have shown that models employing retrieval mechanisms achieve state-of-the-art performance on knowledge-grounded conversational tasks, significantly reducing hallucination rates (Shuster et al., 2021). Human evaluations further reveal that retrieval-augmented models demonstrate higher knowledgeability and lower hallucination rates compared to standard models, while also exhibiting improved generalization to unseen topics, thereby outperforming models that rely solely on internal knowledge (Dinan et al., 2019b; Zhou et al., 2021).