Advanced RAG Techniques — The Corrective RAG strategy
Welcome to the third post in our series! If you haven’t already, you might want to check out the first two posts to get the most out of this one. In the first post we introduced Self-RAG, the technique from the paper by Asai et al. (2023). In the second post we introduced the Adaptive-RAG strategy. Both posts discussed recent advances in Generative AI, more specifically in systems built on the so-called RAG architecture (that is, generative tasks where the output for a given input, such as a user query, depends on a retrieval mechanism over documents that are external to the knowledge encoded in the Large Language Model used for content generation).
In this post, we describe the third approach in this series: the Corrective RAG strategy (a.k.a. the CRAG approach). The main motivation for the work, following Yan et al. (2024), is the following:
In classical RAG approaches, the sequence of steps is the following: given an input user query (such as a question), the query is first vectorized (using an embedding model), and the documents most similar to the query are retrieved and used as supporting content for answer generation. Other types of RAG systems instead generate SQL or Cypher statements to query relational databases or Neo4j graph databases, for instance; the mechanism is different, but the outcome is the same: additional context for answering the user’s question.
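To make this concrete, here is a minimal sketch of that classical flow in Python. The functions embed(), vector_search() and llm() are hypothetical placeholders standing in for your embedding model, vector store and generator LLM; they are not part of any specific library.

```python
def embed(text: str) -> list[float]:
    """Turn text into a dense vector, e.g. with an embedding model."""
    raise NotImplementedError

def vector_search(query_vector: list[float], top_k: int = 4) -> list[str]:
    """Return the top-k documents most similar to the query vector."""
    raise NotImplementedError

def llm(prompt: str) -> str:
    """Call the generator LLM and return its answer."""
    raise NotImplementedError

def classical_rag(query: str) -> str:
    # 1. Vectorize the user query.
    query_vector = embed(query)
    # 2. Retrieve the most similar documents as supporting content.
    documents = vector_search(query_vector, top_k=4)
    # 3. Generate the answer conditioned on the query and the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(documents) + "\n\n"
        "Question: " + query
    )
    return llm(prompt)
```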
One problem with classical RAG approaches is that they do not assess, during the execution of the LLM chain / agent, the quality of the retrieved documents, or even whether they are actually useful for answering the question. For complex questions, or questions whose semantics are close to existing supporting documents, this can lead to wrong answers. One example presented in the original CRAG paper is the question:
Who was the screenwriter for Death of a Batman?
The retriever may return, as the most similar document, the phrase “Batman was filmed in 1989. When Hamm’s script was rewritten,…”, and the LLM may use “Hamm” as the inferred answer, which would be wrong in this case.
The goal of CRAG is to address exactly this kind of problem: the cases where the retriever returns inaccurate results. CRAG self-corrects the results of the retriever and improves how the retrieved documents are used to augment generation. The approach also aims to improve the generated answer by refining / rewriting the correctly retrieved documents, which sometimes contain more information than is needed for a correct answer. Thus, the approach can also help in generating more objective answers. Let’s discuss the architecture of the approach in the next section.
The CRAG Architecture and Strategy
As we did in a previous post, let us first recall how RAG systems can be described in simple mathematical terms. Given an input / user query q, and a collection of documents C = {d1, d2, d3, …, dN}, we would like to generate the output / answer a. The retriever R retrieves the top-K documents D (where D is a subset of C). Based on q and D, the generator G produces the output / answer. Representing this as one simple statement (see Yan et al., 2024), we have:
P(a | q) = P(D | q) P(a | D, q)
Note that, no matter how well the generator was trained, it cannot generate correct answers if D does not contain information relevant to the query.
Now let us discuss the components of CRAG that make it unique and an improvement over classical RAG techniques. The contents presented here are based on Section 4 of the original paper by Yan et al. Figure 1, used below, also comes from the paper.
The sequence of steps is the following:
- Given a user query, perform an initial retrieval and obtain the documents d1 and d2.
- Use a lightweight retrieval evaluator to estimate the relevance score of retrieved documents to the input query (the authors used a T5-large model for this purpose).
- The relevance score is quantified into one of three confidence degrees, which then triggers the corresponding action: Correct, Incorrect or Ambiguous.
- Depending on which of the three actions is triggered, the flow is directed to the corresponding procedure in Figure 1.
- If the action is Correct, the knowledge is refined by cutting the retrieved documents down to their most relevant parts.
- If the action is Ambiguous, an additional web search is performed, and the partially relevant documents from the initial retrieval are combined with the newly retrieved documents.
- If the action is Incorrect, the initially retrieved documents are discarded, and only the documents returned by the web search are used as context for the final answer.
It is important to mention that, in the sequence of actions above, the CRAG implementation relies on a back-up strategy for knowledge retrieval based on web search tools. The web search is only triggered IF the output of the relevance scoring step is Incorrect or Ambiguous. Correct scenarios do not need this step, since in that case we assume the answer can be generated entirely from the initially retrieved documents d1 and d2. It is worth mentioning that other tools, not only web search, could have been employed; we are just presenting the original architecture from Yan et al. (2024).
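To tie the steps above together, here is a minimal sketch of the CRAG control flow in Python. Every helper function in it (retrieve, evaluate_relevance, decide_action, refine, web_search, llm) is a hypothetical placeholder that simply mirrors a step of Figure 1; this is not the authors’ implementation or any library’s API.

```python
def retrieve(query: str) -> list[str]:
    """Initial retrieval, e.g. the vector search from the classical sketch."""
    raise NotImplementedError

def evaluate_relevance(query: str, document: str) -> float:
    """Lightweight retrieval evaluator (the authors fine-tune T5-large)."""
    raise NotImplementedError

def decide_action(scores: list[float]) -> str:
    """Map the scores to 'correct', 'incorrect' or 'ambiguous'
    (a threshold-based sketch of this function appears further below)."""
    raise NotImplementedError

def refine(query: str, documents: list[str]) -> str:
    """Decompose-then-recompose: keep only the most relevant parts."""
    raise NotImplementedError

def web_search(query: str) -> list[str]:
    """Back-up knowledge source, triggered only on Incorrect / Ambiguous."""
    raise NotImplementedError

def llm(prompt: str) -> str:
    """Generator LLM."""
    raise NotImplementedError

def corrective_rag(query: str) -> str:
    documents = retrieve(query)
    scores = [evaluate_relevance(query, d) for d in documents]
    action = decide_action(scores)

    if action == "correct":
        # Internal knowledge only, refined down to its most relevant parts.
        context = refine(query, documents)
    elif action == "incorrect":
        # Discard the initial retrieval and rely on web search alone.
        context = refine(query, web_search(query))
    else:  # "ambiguous"
        # Combine the partially relevant documents with web-search results.
        context = refine(query, documents + web_search(query))

    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)
```

Note how the web search branch is only reached on the Incorrect and Ambiguous actions, exactly as described above.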
Additional Considerations
The architecture of CRAG makes it an awesome feature to implement in classical RAG systems. An interesting aspect is that the solution is plug-and-play, and can thus be quickly added to an existing RAG system by introducing 1) the retrieval evaluator, 2) a strategy that acts as the source of truth in case the primary source of documents yields incorrect results (in this post, that strategy was the web search tool), and 3) a decompose-then-recompose algorithm that selectively focuses on key information and filters out irrelevant information in the retrieved documents.
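As an illustration of point 3), here is a rough sketch of what a decompose-then-recompose step could look like. The sentence-level splitting and the strip_threshold value are assumptions made for this example, not the exact segmentation and scoring used by Yan et al.

```python
def refine(query: str, documents: list[str], evaluate_relevance,
           strip_threshold: float = 0.5) -> str:
    # Decompose: naive sentence-level "knowledge strips" (illustrative only).
    strips = []
    for document in documents:
        strips.extend(s.strip() for s in document.split(".") if s.strip())

    # Filter: keep only the strips the evaluator considers relevant to the query.
    relevant = [s for s in strips if evaluate_relevance(query, s) >= strip_threshold]

    # Recompose: join the surviving strips into a compact context string.
    return " ".join(relevant)
```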
One final remark concerns how Yan et al. (2024) chose to define what counts as an Ambiguous or Incorrect result. Incorrect scenarios occur when all retrieved documents score below the lower threshold of the retrieval evaluator. Ambiguous scenarios occur when no retrieved document has a confidence score above the upper threshold, but, at the same time, not all of them fall below the lower threshold. Finally, Correct scenarios occur when at least one retrieved document scores above the upper threshold. For more discussion, see Section 4.3 of the original paper.
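Assuming the retrieval evaluator returns one score per document, these rules translate almost directly into code. The threshold values below are placeholders chosen for illustration; in the paper they are tuned empirically per dataset and per evaluator.

```python
UPPER_THRESHOLD = 0.6   # hypothetical value
LOWER_THRESHOLD = -0.9  # hypothetical value

def decide_action(scores: list[float]) -> str:
    if any(score > UPPER_THRESHOLD for score in scores):
        # At least one document is confidently relevant.
        return "correct"
    if all(score < LOWER_THRESHOLD for score in scores):
        # Every retrieved document looks irrelevant.
        return "incorrect"
    # Neither clearly relevant nor clearly irrelevant.
    return "ambiguous"
```

With this function in place, the decide_action placeholder in the earlier flow sketch can be filled in.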
References
Yan et al. (2024). Corrective Retrieval Augmented Generation. arXiv:2401.15884 [link here]