Evaluating RAG (Retrieval-Augmented Generation) pipelines is crucial, but creating QA (question-and-answer) samples manually is time-consuming. Synthetic data generation can reduce developer time by 90%. In this blog, I'll guide you through generating synthetic data using the RAGAs framework with Azure OpenAI, streamlining your RAG pipeline evaluation process.
Building and deploying a RAG pipeline is just the start. The real challenge is checking its performance accurately and efficiently. Manually creating QA samples for evaluation is not only time-consuming but also inconsistent. How can we ensure our RAG pipeline works well in a real-world setting without spending endless hours on manual data labeling?
Imagine if you could reduce the time spent on data labeling by 90%. What if there was a way to generate synthetic data that provides a robust and comprehensive evaluation of your RAG pipeline? This is where the RAGAs (Retrieval-Augmented Generation Assessment) framework comes into play, changing the way we approach RAG evaluation.
Creating QA samples manually is slow and inefficient. It involves repetitive tasks that waste resources and delay your RAG system’s deployment. Moreover, manual sampling often misses the variety and complexity needed to thoroughly test the pipeline, leading to potential gaps in evaluation.
Plus, if you're working with old documents, finding experts to create accurate QA pairs can be tough. This lack of expertise can make it even harder to ensure your RAG pipeline is reliable and high-quality.
Step 1: Make sure you have the necessary libraries imported:
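The exact imports depend on your RAGAs and LangChain versions; the sketch below assumes a RAGAs 0.1.x-style TestsetGenerator together with the langchain-openai and langchain-community packages:

```python
# Document loading (LangChain community loaders)
from langchain_community.document_loaders import WebBaseLoader

# Azure OpenAI chat and embedding wrappers
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# RAGAs synthetic test set generation and question-evolution types
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional
```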
Step 2: Configure your environment variables for Azure services:
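Here is a minimal sketch using the environment variable names that the langchain-openai Azure classes read by default; the key, endpoint, and API version are placeholders for your own Azure OpenAI resource:

```python
import os

# Placeholders: substitute the values from your Azure OpenAI resource.
os.environ["AZURE_OPENAI_API_KEY"] = "<your-azure-openai-key>"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2024-02-01"  # use the API version enabled on your resource
```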
Step 3: Next, load your documents. Here's an example of loading the Advancing Analytics website's content:
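The sketch below uses LangChain's WebBaseLoader with an illustrative URL; any LangChain document loader would work the same way. Some RAGAs versions also expect a filename entry in each document's metadata, so it is copied from the loader's source field here:

```python
# Load pages from the website (the URL is illustrative).
loader = WebBaseLoader("https://www.advancinganalytics.co.uk/")
documents = loader.load()

# Some RAGAs versions expect a 'filename' key in the document metadata,
# so copy it across from the loader's 'source' field.
for doc in documents:
    doc.metadata["filename"] = doc.metadata.get("source", "advancing-analytics")
```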
Step 4: We will need three models: a generator model for generating QA pairs, an embedding model for embedding document chunks and retrieving related context, and a critic model for validating the generated pairs.
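Here is a sketch of wiring the three models to Azure OpenAI deployments; the deployment names are placeholders, and the endpoint, key, and API version are picked up from the environment variables set in Step 2:

```python
# Generator model: drafts the synthetic questions and answers.
generator_llm = AzureChatOpenAI(
    azure_deployment="gpt-4",  # placeholder deployment name
    temperature=0,
)

# Critic model: reviews and filters the generated QA pairs.
critic_llm = AzureChatOpenAI(
    azure_deployment="gpt-4",  # can be a separate, stronger deployment
    temperature=0,
)

# Embedding model: embeds document chunks so related contexts can be retrieved.
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-ada-002",  # placeholder deployment name
)
```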
Step 5: Use the Azure OpenAI models to create a synthetic test set from the loaded documents.
The RAGAs test set generator incorporates all three models and allows you to specify the distribution of the generated dataset.
For the purpose of this blog, I have set the test size to 5, but this can be configured as needed.
You can set how many examples should be simple Q&A, how many should involve reasoning, how many should be multi-context, and how many should be conditional. I have set the following split: simple 50%, reasoning 20%, multi_context 20%, and conditional 10% (see the code sketch after the question-type definitions below).
Reasoning: Rewrite the question in a way that enhances the need for reasoning to answer it effectively.
Conditional: Modify the question to introduce a conditional element, which adds complexity to the question.
Multi-Context: Rephrase the question in a manner that necessitates information from multiple related sections or chunks to formulate an answer.
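Putting this together, here is a minimal sketch (again assuming a RAGAs 0.1.x-style API) of the test set generator with a test size of 5 and the distribution described above:

```python
# Build the generator from the three Azure OpenAI models defined in Step 4.
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

# Generate the synthetic test set from the loaded documents.
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=5,  # number of QA pairs to generate
    distributions={
        simple: 0.5,
        reasoning: 0.2,
        multi_context: 0.2,
        conditional: 0.1,
    },
)
```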
Step 6: Once you have created a test set, load it into a DataFrame for analysis:
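With the API assumed above, the generated test set converts straight into a pandas DataFrame:

```python
# Convert the generated test set into a pandas DataFrame for inspection.
test_df = testset.to_pandas()
print(test_df.head())
```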
This DataFrame contains questions and answers about Advancing Analytics, along with additional columns describing how each pair was produced. In the RAGAs version assumed here, these typically include the question, the retrieved contexts, the ground-truth answer, the evolution type used to create the question, and source metadata.
One of the questions that I find particularly interesting comes from the multi-context approach. Here's the question and answer:
Question: What services does Advancing Analytics offer in Data Governance and Security, and how do they align with partnerships like Microsoft and Databricks?
Answer: The context does not provide specific information on the services Advancing Analytics offers in Data Governance and Security, or how these services align with their partnerships with Microsoft and Databricks.
This is indeed true, as we are still in the process of gathering comprehensive information for our new website!
By using the RAGAs framework, you can save time and resources, reducing the manual effort required for data labeling by 90%.
RAGAs ensures a thorough and precise evaluation by generating varied and complex questions. This evolutionary approach, inspired by Evol-Instruct, leads to more robust assessments.
RAGAs creates a diverse set of QA pairs, incorporating simple, reasoning, multi-context, and conditional questions. This ensures that your RAG pipeline is tested across a wide range of scenarios, identifying potential weaknesses and areas for improvement.
By automating the generation of synthetic data, you can save time, improve accuracy, and provide comprehensive evaluations that guide your development process. For more details on the RAGAs framework, check out the official documentation.
While time is still required to validate the generated questions and answers, this approach significantly reduces the effort compared with starting from scratch. This streamlined process allows us to quickly iterate and refine our models, ensuring they meet our high standards of quality and performance.
By following the steps outlined above, you can efficiently evaluate your RAG pipeline using synthetic data, saving time and ensuring a thorough assessment. Happy evaluating!