Empowering RegReview’s Chatbot Experience: Time-Driven and Source-Associated RAG Improvements

Heka.ai
Jun 26, 2024 · 8 min read


Introducing RegAssist: Your Chatbot Companion for Regulatory Vigilance

Given the continuous influx of updates, the task of navigating through regulatory information can be daunting. Users of RegReview must adeptly manage large volumes of documents and stay updated on regulatory changes. In this context, RegAssist, the chatbot component of the RegReview product, becomes an intuitive and essential tool for regulatory vigilance. By enabling users to ask specific questions amidst the vast document pool, RegAssist simplifies document analysis and enhances the identification of pertinent information to facilitate regulatory watch.

To achieve this, RegAssist is composed of a retriever and a generative part. The purpose of a retriever is to efficiently locate and retrieve relevant information from a large database or corpus in response to user queries. The generative part uses the retrieved documents to formulate a relevant answer for the user thanks to generative AI. Retrievers are therefore a significant area of improvement for chatbots, as they play a crucial role in enhancing the accuracy and efficiency of information retrieval processes.

In this article, we will explain how we built the chatbot to effectively assist with regulatory compliance and document management. We will first discuss the challenges of specializing RAG for regulatory vigilance. Then, we will explore solutions such as time sensitivity, source association, and query reformulation. We will also outline the general architecture of RAG and conclude with our achievements using the specialized RAG.

Specializing RAG for Regulatory Vigilance: challenges

Our first retriever used a simple, conventional approach, focusing primarily on similarity calculations between user queries and documents (similarity search), with documents divided into chunks (paragraphs). The initial process extracted the k most similar chunks from the Pinecone vector store, where all documents are stored, as illustrated in Figure 1. These k most pertinent document fragments are then sent to the generative part, where the LLMs (Large Language Models) formulate an answer to the user's query based on these chunks.

Figure 1 — First retriever diagram
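
To make this concrete, here is a minimal sketch of such a similarity-based retriever, assuming a Pinecone index holding the document chunks, an embedding function embed, and an LLM callable llm. The index name, field names, and prompt are illustrative assumptions, not the actual RegAssist implementation.

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")        # hypothetical credentials
index = pc.Index("regreview-chunks")         # hypothetical index name

def retrieve_chunks(question: str, embed, k: int = 10) -> list[str]:
    """Embed the user question and return the k most similar chunks."""
    result = index.query(vector=embed(question), top_k=k, include_metadata=True)
    return [match.metadata["text"] for match in result.matches]

def answer(question: str, embed, llm, k: int = 10) -> str:
    """Let the generative part answer from the retrieved chunks only."""
    context = "\n\n".join(retrieve_chunks(question, embed, k))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                        # any chat/completion call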

However, it's essential to point out one limitation. When users ask about actions within specific date ranges and the documents themselves don't mention publication dates, neither the retriever nor the LLM can determine those dates without accessing the database metadata. The previous retriever did not take document metadata, such as source, title, or date, into account, as it was not accessible.

Figure 2 shows an example that highlights this challenge on the RegReview platform. We asked RegAssist, over a large set of documents: “What decisions were made about data protection during the last three months?”

Figure 2 — RegAssist performances with the classic RAG on a specific period

As you can see, with the classic retriever, all elements of the formulated response were topically relevant; however, the temporal aspect of the query was not taken into account. Indeed, the cited documents were published in September 2023 and January 2024, both of which fall outside the period specified by the user.

Below, we describe the process and techniques implemented to make the retriever time-sensitive and to tailor it to our needs.

Solutions: Time Sensitivity, Source Association, and Query Reformulation

The main idea is to combine semantic similarity search, using Pinecone's similarity calculation, with the ability to adjust this ranking according to the user's query, based on document metadata stored in Elasticsearch: the publication date and the source of documents. We have therefore developed two complementary approaches to tailor our retriever specifically to regulatory monitoring:

Making the Retriever Time-Aware

In our quest to enhance the temporal awareness of our Retrieval-Augmented Generation (RAG) system, we implemented a two-step approach. First, through careful prompt engineering, we ensured that our LLM could accurately infer temporal parameters within user queries, format them appropriately, and comprehend their contextual time limit. The output of the LLM is thus composed of the time element, the corresponding formatted date, and the ‘context’. The latter is the operator that will be used in the subsequent Elasticsearch query and takes one of the values ‘lt’ (less than), ‘lte’ (less than or equal to), ‘gt’ (greater than), ‘gte’ (greater than or equal to), or ‘eq’ (equal to). If none of these elements is found in the query, the LLM returns ‘None’ or ‘False’ for each of them, as illustrated in Figure 3. By fine-tuning the prompt, we mitigated the risk of hallucinations and encouraged the LLM to focus on discerning temporal nuances effectively.

Figure 3 — Time formatting chain diagram
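
As a rough illustration of this time-formatting chain, the sketch below prompts an LLM (again, any completion callable llm) to return the three elements described above as JSON. The prompt wording and field names are assumptions, not the production prompt.

import json
from datetime import date

TIME_PROMPT = """Today is {today}.
Extract any temporal constraint from the user question and return a JSON object
with exactly these keys:
- "time_element": the temporal expression as written, or "None"
- "date": the corresponding date formatted as YYYY-MM-DD, or "None"
- "context": one of "lt", "lte", "gt", "gte", "eq", or "False"

Question: {question}"""

def extract_time_filter(question: str, llm) -> dict:
    """Ask the LLM for the time element, formatted date and comparison operator."""
    raw = llm(TIME_PROMPT.format(today=date.today().isoformat(), question=question))
    return json.loads(raw)

# e.g. "What decisions were made ... during the last three months?"
# -> {"time_element": "last three months", "date": "2024-03-26", "context": "gte"}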

Subsequently, leveraging the outputs of the LLM, we dynamically adjust our ElasticSearch queries, incorporating time-based boosting techniques. If a precise date or period is identified from the query, the corresponding documents are boosted by 15. Otherwise, the most recent ones are boosted with the boosting score indicated in the diagram (since we usually need the most recent articles when performing regulatory watch). By prioritizing search results based on the temporal markers identified by the LLM, our system ensures that retrieved information aligns with the relevant time frames indicated in user queries, thus significantly enhancing the temporal awareness of our retrieval process.
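
Concretely, the extracted operator and date can be folded into an Elasticsearch bool query whose range clause carries the boost. The sketch below assumes the helper from the previous snippet, a publication_date field, a content field, and an index named regreview-articles; the fallback recency boost value is illustrative, since the exact scores come from the diagram.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # hypothetical cluster

def build_time_boosted_query(text_query: str, time_filter: dict) -> dict:
    """Boost documents matching the extracted time constraint (boost 15);
    otherwise gently favour the most recent publications."""
    should = []
    operator, day = time_filter.get("context"), time_filter.get("date")
    if operator not in (None, "False") and day not in (None, "None"):
        bounds = {"gte": day, "lte": day} if operator == "eq" else {operator: day}
        should.append({"range": {"publication_date": {**bounds, "boost": 15}}})
    else:
        # No explicit date in the query: prefer recent articles (illustrative boost value).
        should.append({"range": {"publication_date": {"gte": "now-3M/d", "boost": 5}}})
    return {"bool": {"must": [{"match": {"content": text_query}}], "should": should}}

# hits = es.search(index="regreview-articles", query=build_time_boosted_query(q, tf))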

The Need for Query Reformulation

However, we encountered a significant challenge concerning the inclusion of temporal parameters within user queries. Specifically, we found that when temporal references were absent or ambiguous in the retrieved text, the response-formulating LLMs exhibited overly strict behavior, resulting either in an inability to find relevant answers or, in some cases, in inaccurate responses that invent article IDs. We can illustrate this with the following example for the query “What decisions were made by the European Data Protection Committee during the last three months?”.

{
  "question_id": "0e936931-474f-46bc-acbc-5a331a1ae847",
  "response": [
    {
      "elastic_ids": [
        "e8d9b2c4b3b3b3c4b3b3b3b"   <-- HALLUCINATION
      ],
      "text": "The European Data Protection Committee made decisions related to data protection within the last three months."
    },
    {
      "elastic_ids": [
        "e8d9b2c4b3b3b3c4b3b3b3b"   <-- HALLUCINATION
      ],
      "text": "The decisions made by the European Data Protection Committee were not specified in the documents."
    }
  ]
}

We observe that the LLM invented the Elasticsearch IDs. Since the LLM cannot verify the requested time range from the text content alone, it considers that it cannot provide an accurate response and instead generates fictitious article IDs, leading to unreliable and misleading results.

Recognizing the limitations imposed by this strictness, we identified the need for query reformulation, particularly when temporal parameters imposed a strict time range. By reformulating queries to remove temporal constraints, which are already handled on the Elasticsearch side, we aimed to mitigate the risk of hallucinations or erroneous responses generated by the LLMs.

This approach not only improved the accuracy of our retrieval system but also enhanced the overall user experience by ensuring that relevant information was effectively surfaced, regardless of temporal specificity.

Figure 4 — Query transformation chain diagram
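
A minimal version of this query-transformation step could look like the sketch below, where llm is any completion callable and the prompt wording is an assumption. The point is simply to strip the temporal constraint, since the date filter is already enforced by the Elasticsearch query.

REFORMULATE_PROMPT = """Rewrite the question below without any temporal constraint,
keeping the rest of its meaning unchanged. Return only the rewritten question.

Question: {question}"""

def reformulate_query(question: str, llm) -> str:
    """Remove temporal constraints so the answering LLM is not tempted to refuse
    or hallucinate when dates are absent from the retrieved chunks."""
    return llm(REFORMULATE_PROMPT.format(question=question)).strip()

# "What decisions were made by the European Data Protection Committee during the
#  last three months?" -> "What decisions were made by the European Data Protection Committee?"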

Prioritizing Relevant Sources

Since we also want our retrieval process to be specialized for regulatory monitoring, we have implemented a pragmatic approach centered on leveraging a Large Language Model (LLM) to discern the most pertinent sources available on our platform.

Equipped with a comprehensive list of sources accessible within our platform, the LLM analyzes each source to determine its relevance to user queries. These insights are then utilized to prioritize the presentation of sources within our search results, ensuring that users are directed towards the most relevant and reliable information sources. Through this data-driven methodology, we aim to enhance the user experience by facilitating seamless access to pertinent content tailored to their needs.

Figure 5 — Source association chain diagram

The source association chain returns the list of the most relevant sources on the platform based on the user's query, as shown in Figure 5.
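
A simplified version of this source-association chain is sketched below; the source list is a hypothetical placeholder for the platform's catalogue, and the prompt wording is an assumption.

import json

PLATFORM_SOURCES = ["EDPB", "CNIL", "EUR-Lex", "EBA", "ESMA"]   # hypothetical list

SOURCE_PROMPT = """Given the user question and the list of available sources,
return a JSON array containing only the sources relevant to the question.

Sources: {sources}
Question: {question}"""

def associate_sources(question: str, llm) -> list[str]:
    """Ask the LLM which platform sources are most relevant to the question."""
    raw = llm(SOURCE_PROMPT.format(sources=", ".join(PLATFORM_SOURCES), question=question))
    # Keep only sources that actually exist on the platform, to guard against hallucination.
    return [s for s in json.loads(raw) if s in PLATFORM_SOURCES]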

RAG General Architecture

We can now present the general architecture of the retriever, as illustrated in Figure 6. When a user asks the RegAssist chatbot a question, it is handled simultaneously by two distinct processes.

Firstly, semantic search is performed in the Pinecone vector store. A similarity calculation between document chunks and the embedded question is conducted to extract the k most similar chunks. The value of k is deliberately set to a high number to cover a wide variety of relevant documents.

Secondly, information is extracted from the question to construct an Elasticsearch query via the “general retrieval chain”. This chain is composed of three different LLM calls running in parallel, each serving a different purpose: time formatting, source association, and query reformulation, as explained above. Notably, elements extracted during question reformulation are boosted in the Elasticsearch search.
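
Putting the pieces together, a rough sketch of this general retrieval chain could run the three steps sketched earlier in parallel and fold their outputs into a single Elasticsearch query; the source boost value below is illustrative.

from concurrent.futures import ThreadPoolExecutor

def general_retrieval_chain(question: str, llm) -> tuple[str, dict]:
    """Run time formatting, source association and query reformulation in parallel,
    then assemble the boosted Elasticsearch query."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        time_future = pool.submit(extract_time_filter, question, llm)
        source_future = pool.submit(associate_sources, question, llm)
        rewrite_future = pool.submit(reformulate_query, question, llm)
        time_filter = time_future.result()
        sources = source_future.result()
        rewritten = rewrite_future.result()
    query = build_time_boosted_query(rewritten, time_filter)
    if sources:
        # Boost (rather than hard-filter) documents from the associated sources.
        query["bool"]["should"].append({"terms": {"source": sources, "boost": 10}})
    return rewritten, query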

The ranking process involves refining the initial selection of articles based on their similarity to the user’s query, utilizing both Pinecone and Elasticsearch. Pinecone provides a measure of semantic similarity between the user’s question and the articles in the database, while Elasticsearch offers additional context and relevance based on various factors such as keyword matching and metadata.

To establish the final ranking, a weighted combination of the Pinecone similarity score and the Elasticsearch score is calculated (hybrid search) according to the following formula:

final ranking = α × Pinecone ranking + β × Elasticsearch boosting

where α, the Pinecone weight, equals 0.6 and β, the Elasticsearch weight, equals 0.4.

Balancing semantic relevance and timeliness is crucial, and the weighting reflects both. Since Pinecone emphasizes semantic understanding, it receives the higher weight in the final score, ensuring that articles aligned with the user's query are prioritized, while Elasticsearch's contribution accounts for timeliness, delivering results that favor both relevance and recency.
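
As an illustration, the hybrid combination could be computed as in the sketch below. Normalising both score sets before weighting is our assumption here, so that the 0.6/0.4 weights operate on comparable ranges.

PINECONE_WEIGHT, ELASTIC_WEIGHT = 0.6, 0.4

def hybrid_rank(pinecone_scores: dict[str, float], elastic_scores: dict[str, float]) -> list[str]:
    """Combine per-article scores from both systems and return article ids,
    most relevant first."""
    def normalise(scores: dict[str, float]) -> dict[str, float]:
        top = max(scores.values(), default=1.0) or 1.0
        return {doc_id: score / top for doc_id, score in scores.items()}
    p, e = normalise(pinecone_scores), normalise(elastic_scores)
    combined = {
        doc_id: PINECONE_WEIGHT * p.get(doc_id, 0.0) + ELASTIC_WEIGHT * e.get(doc_id, 0.0)
        for doc_id in set(p) | set(e)
    }
    return sorted(combined, key=combined.get, reverse=True)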

Finally, the reformulated question and the relevant chunks are sent to the LLMs to generate a response.

Figure 6 — General retriever architecture diagram

Achievements with our specialized RAG

Now, with the introduction of the specialized RAG, depicted in Figures 7–8, a notable transformation is evident compared to the former performance shown in Figure 2. All the articles cited in the answer were indeed published within the last three months, some even within the last month. This stark contrast highlights the significant improvement achieved by incorporating the enhanced RAG into our system.

Figure 7–8 — RegAssist performances with the brand-new RAG on a specific period

Consequently, the user has access to the most recent response elements, enabling more reliable and effective monitoring. Making the retriever time-sensitive and tailoring it to our specific requirements required the careful, multi-step process described above.

Conclusion

Through the integration of our specialized RAG system, we have effectively addressed critical challenges, particularly regarding temporal sensitivity and source relevance. By leveraging these advancements, RegAssist empowers users to seamlessly navigate intricate regulatory landscapes, extracting pertinent insights with ease. The system's ability to deliver up-to-date information swiftly and accurately is a cornerstone for informed decision-making and proactive compliance efforts.

@Océane Wauquier @Maxime Charpentier
