RAGBuilder: Creating Optimal, Production-Ready RAG Setups

Written by Toyosi Babayeju | Dec 10, 2024 4:23:12 PM

Background

In recent years, Retrieval-Augmented Generation (RAG) has emerged as a transformative force in natural language processing (NLP), especially in applications like Q&A and summarisation. By merging the capabilities of information retrieval with the generative power of large language models (LLMs), RAG significantly improves the quality and relevance of generated responses. Yet, despite its potential, setting up and optimising a RAG pipeline can be a complex, resource-heavy endeavour, as it involves managing a variety of parameters and components manually.

Enter RAGBuilder, a tool that automates the complex configuration of Retrieval-Augmented Generation (RAG) pipelines, from key setup parameters to evaluation metrics, and even logs configuration changes for easy comparison. RAGBuilder enables you to create more efficient, scalable, and effective RAG systems without the need for intensive manual setup.

In this post, we’ll explore how RAGBuilder transforms the development experience, allowing you to "Build Production-Grade RAG in Minutes." Much like how AutoML simplified the creation of machine learning models, RAGBuilder is revolutionising the process for RAG pipelines, making high-performance setups accessible to all.

What is RAG?

RAG stands for Retrieval-Augmented Generation, a technique within natural language processing (NLP) that combines large language models (LLMs) with information retrieval systems to improve the accuracy and relevance of a generated response, particularly when the answer depends on your own data that the LLM was not trained on.

There are essentially two components to RAG: a retriever and a generator. The retriever searches a pre-existing knowledge base or ingested document corpus for relevant content, while the generator uses the retrieved information to produce coherent, contextually relevant text.

Understanding the RAG Pipeline

A RAG pipeline is a sequence of stages that combines document retrieval with a Large Language Model (LLM) to generate highly relevant, context-aware responses. Let’s go through each stage of the process and see how a sample user query is handled.

1. Retrieval

When a user submits a query, the system first retrieves relevant documents or data from a database. This database is usually represented as a vector store, meaning each document is encoded into a numerical format (vector) based on its content. The retrieval process leverages similarity matching to find vectors (documents) that are most relevant to the query vector.

Example Part 1:

User Query: "What is the current return policy?"

Retrieval Output: The system retrieves the latest return policy document from the company’s knowledge base using vector similarity search.
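To make the similarity-matching idea concrete, here is a minimal sketch of the retrieval step. It is not how RAGBuilder or any particular vector store implements it: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the three documents are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents whose vectors are most similar to the query vector."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]


# Hypothetical knowledge base, for illustration only.
documents = [
    "Return policy: items can be returned within 30 days of purchase.",
    "Shipping: orders are dispatched within two working days.",
    "Careers: we are hiring engineers in London.",
]

top_docs = retrieve("What is the current return policy?", documents)
print(top_docs[0])
```

In a production setup the word counts would be replaced by dense vectors from an embedding model and the linear scan by a vector database index, but the ranking principle is the same.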

2. Augmentation

In the RAG pipeline, the retrieved document (or documents) is combined with the user’s input query to "augment" the information supplied to the LLM. This augmented input provides the model with up-to-date and specific context that it wouldn't have access to on its own, allowing it to generate more precise answers based on both its trained knowledge and the latest retrieved data.

Example Part 2:

User Query + Retrieved Document: The return policy document is now paired with the user's question about return policies, providing the LLM with focused context to work with.
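As a rough sketch of what "augmenting" looks like in practice, the snippet below stitches the retrieved text and the user's question into a single prompt. The template wording and the function name `build_augmented_prompt` are assumptions made for this example; real frameworks ship their own prompt templates.

```python
def build_augmented_prompt(query: str, retrieved_docs: list[str]) -> str:
    """Combine retrieved documents and the user query into one prompt string."""
    context = "\n\n".join(
        f"[Document {i}]\n{doc}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Illustrative retrieved text, mirroring the return-policy example.
prompt = build_augmented_prompt(
    "What is the current return policy?",
    ["Returns are accepted within 30 days of purchase, provided the item is "
     "unused and in its original packaging."],
)
print(prompt)
```

The resulting string is what gets sent to the LLM in the generation step, so the prompt template is itself one of the parameters worth tuning.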

3. Generation

With both the query and relevant context, the LLM generates a response. This response is informed by both the model’s pre-existing knowledge and the recent, document-specific information provided by the retrieval step. The LLM, therefore, can produce a tailored answer that addresses the user's query accurately and contextually.

Example Part 3:

Generated Response: "Our return policy allows returns within 30 days of purchase, provided the item is unused and in its original packaging."

Difficulties of creating the RAG pipeline in Production

The difficulty with RAG is that there are several moving parts: data ingestion, retrieval, re-ranking, generation and more. Each part comes with numerous options, which makes the pipeline hard to manage and makes it hard to determine the best combination.

Let's look at a simple example of the possible combinations:

Chunking Methods	Chunk Size	Embedding Models	Retrievers	Re-rankers	Prompts	LLMs
Character Splitter	128	Static Embeddings	Sparse Retrievers	Traditional Re-Rankers	Prompt 1	GPT-4o
Recursive Character Splitter	256	Contextual Embeddings	Dense Retrievers	Neural Re-Rankers	Prompt 2	Claude
Sentence Splitter	512	GPT-Based Embeddings	Hybrid Retrievers	Cross-Encoder	Prompt 3	LLaMA

Based on the examples above alone, seven components with three options each already give 3^7 = 2,187 possible configurations. Even if we evaluated each one in just 10 minutes, that would still take about 15 days of continuous trial and error. Every additional option multiplies the search space further, so finding your optimal RAG setup manually quickly becomes impractical.
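The arithmetic is easy to check with a few lines of Python. The sketch below enumerates every combination from the table and converts the total evaluation time into days; the dictionary keys are just labels chosen for this example.

```python
import itertools

# Options taken from the table above (three per component).
options = {
    "chunking_method": ["Character Splitter", "Recursive Character Splitter", "Sentence Splitter"],
    "chunk_size": [128, 256, 512],
    "embedding_model": ["Static Embeddings", "Contextual Embeddings", "GPT-Based Embeddings"],
    "retriever": ["Sparse Retrievers", "Dense Retrievers", "Hybrid Retrievers"],
    "reranker": ["Traditional Re-Rankers", "Neural Re-Rankers", "Cross-Encoder"],
    "prompt": ["Prompt 1", "Prompt 2", "Prompt 3"],
    "llm": ["GPT-4o", "Claude", "LLaMA"],
}

# Every configuration is one pick from each column.
total_configs = sum(1 for _ in itertools.product(*options.values()))

minutes_per_config = 10
days = total_configs * minutes_per_config / 60 / 24

print(f"{total_configs} configurations, about {days:.1f} days at 10 minutes each")
```

Adding a fourth option to just one column would raise the total from 2,187 to 2,916, which is why exhaustive manual testing does not scale.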

Enter RAGBuilder

RAGBuilder is a tool designed to automatically create an optimal, production-ready RAG setup for your data. It performs hyperparameter tuning across RAG parameters such as chunk sizes, chunking strategies, embedding models and more, evaluating each configuration against a test dataset to identify the best-performing setup for your data. You can run multiple configurations automatically using either a Bayesian optimiser or a "run all combinations" mode, the latter of which can be resource intensive.
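To illustrate the trade-off between the two modes, here is a small sketch that compares exhaustive search with a search limited to a fixed number of trials. It does not use RAGBuilder's API: the search space, the `evaluate` scoring function and its numbers are all hypothetical, and plain random sampling is used as a simple stand-in for a budgeted optimiser (a Bayesian optimiser would instead choose each new trial based on earlier results).

```python
import itertools
import random

# Hypothetical search space, much smaller than a real one.
SEARCH_SPACE = {
    "chunk_size": [128, 256, 512],
    "retriever": ["sparse", "dense", "hybrid"],
    "reranker": ["none", "cross_encoder"],
}


def evaluate(config: dict) -> float:
    """Made-up score. In practice this would run the pipeline on a test dataset."""
    score = 0.5
    score += {128: 0.0, 256: 0.1, 512: 0.05}[config["chunk_size"]]
    score += {"sparse": 0.0, "dense": 0.1, "hybrid": 0.15}[config["retriever"]]
    score += {"none": 0.0, "cross_encoder": 0.1}[config["reranker"]]
    return score


def all_configs(space: dict):
    """Yield every combination in the search space as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def run_all_combinations(space: dict) -> dict:
    """Exhaustive search: evaluates every configuration."""
    return max(all_configs(space), key=evaluate)


def budgeted_search(space: dict, n_trials: int, seed: int = 0) -> dict:
    """Evaluates only n_trials randomly sampled configurations."""
    rng = random.Random(seed)
    candidates = list(all_configs(space))
    sampled = rng.sample(candidates, k=min(n_trials, len(candidates)))
    return max(sampled, key=evaluate)


best_exhaustive = run_all_combinations(SEARCH_SPACE)
best_budgeted = budgeted_search(SEARCH_SPACE, n_trials=6)

print("Exhaustive:", best_exhaustive, evaluate(best_exhaustive))
print("Budgeted:  ", best_budgeted, evaluate(best_budgeted))
```

The exhaustive run is guaranteed to find the best configuration in the space but pays for every evaluation, while the budgeted run costs a third as much here and may settle for a slightly weaker setup.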

Below is an illustration of how the RAGBuilder is used to optimise our RAG automatically without the need to manually try and test each configuration.

Getting Started: RAGBuilder Configuration and Performance Dashboard

RAGBuilder offers a user-friendly interface that lets you set up and configure Retrieval-Augmented Generation (RAG) pipelines with just a few clicks. The setup process is intuitive, with a step-by-step guide to walk you through each stage of the configuration. To get started, go to the RAGBuilder repo, which provides step-by-step installation instructions for your system.

Source Data: This could be a URL, a local directory or a local file path. For the sake of our demo, let's use a URL: https://lilianweng.github.io/posts/2023-06-23-agent. Below is a review of the selections made during the configuration process.

The RAGBuilder Performance Dashboard serves as your command centre for monitoring and analysing the performance of different RAG configurations. This powerful interface provides comprehensive metrics that help you identify the most effective setup for your document AI system.

RAGBuilder stores all documents, results, and analyses locally, ensuring quick and easy access. Historical runs are conveniently available in RAGBuilder’s project dashboard, allowing you to compare previous configurations on the same dataset effortlessly. This feature makes it easy to evaluate different setups and track performance improvements over time, ensuring you’re always optimising for the best results.

Key Components of the Dashboard

Note that the scores below are taken from the example run.

1. Performance metrics (RAGAS)

Answer Correctness: Consistently showing scores of 1.0, indicating high accuracy
Context Precision: Ranging from 0.88 to 0.91, showing strong relevance
Context Recall: Varying between 0.71 and 0.80, measuring how completely the relevant information is retrieved
Token Usage: Averaging around 1840-1850 tokens per query
Cost Efficiency: Costs per 1K queries hovering around $9.70-$9.90 (the costs are similar because the same Azure OpenAI LLM is used for all calls)
Latency: Response times varying from 1.12s to 8.18s

2. RAG Config

Framework settings (using LangChain)
Chunking parameters
VectorDB configurations
Embedding models
Retriever specifications

3. Interactive Features

View Details button for in-depth analysis
View Code Snippet for examining the exact implementation
Chat option for real-time discussion of results
Search and filter capabilities for easy navigation

4. Temporal Analysis

The timestamp column allows for tracking performance improvements over time, with runs documented from October 25, 2024, showing the evolution of configurations and their respective performance metrics.

The Performance Dashboard exemplifies RAGBuilder's commitment to transparency and measurable performance, making it an invaluable tool for teams looking to deploy and optimise their RAG systems in production environments. After identifying the optimal configuration, RAGBuilder provides ready-to-implement code snippets that can be easily integrated into your development environment. Additionally, RAGBuilder supports GraphRAG implementations, enabling direct performance comparisons between traditional RAG and graph-based approaches which are becoming more prominent.

Limitations

While RAGBuilder offers powerful features for optimising Retrieval-Augmented Generation (RAG) pipelines, there are some potential drawbacks and limitations to be aware of:

Complexity of Customisations: Bespoke use cases may still require manual configuration adjustments beyond RAGBuilder's default options. Currently, RAGBuilder is designed to work specifically with LangChain's framework, which may limit integration options for teams using different RAG frameworks.

Limited to Predefined Pipelines: RAGBuilder can be difficult to use when you need to integrate new techniques or non-standard data sources, such as cloud-hosted data.

Performance on Large Scale data: For extremely large datasets, performance optimisation might hit bottlenecks that are not easily resolved by the automated configuration suggestions. The tool may not scale well for use cases requiring real-time performance under heavy loads without manual intervention.

Cost and Resource Overheads: While RAGBuilder may streamline certain processes, the computational costs associated with testing and validating multiple configurations can be resource-intensive, especially in environments where processing power is limited.

Conclusion

RAGBuilder revolutionises the development and optimisation of Retrieval-Augmented Generation pipelines by automating the complex, manual processes of configuration, testing, and evaluation.

Its ability to run multiple configurations and select the most effective setup for a given dataset significantly reduces the time, effort, and expertise required to build production-ready RAG systems. However, there are limitations, such as handling large-scale data, bespoke customisations and specific performance bottlenecks. Despite these challenges, RAGBuilder remains a valuable tool for teams looking to accelerate the deployment of efficient, high-performance RAG pipelines tailored to each unique use case.

Want to optimise your RAG implementation? Chat with our experts to explore how RAGBuilder can fast-track your pipeline and tailor solutions for your needs.
