RAG Cookbook: Hybrid RAG

Advanced methods for optimal RAG usage in chatbot applications

Introduction

In our previous articles, we covered basic RAG, retrieve-and-rerank, and validation techniques. This article introduces Hybrid RAG, which combines vector search with keyword-based search to improve retrieval accuracy and robustness.

To follow along, you should be familiar with the concepts covered in previous articles. The code changes focus on enhancing the retrieval pipeline by implementing hybrid search while maintaining the same validation framework.

All relevant code changes will be contained in a single commit in the GitHub repository.

Series Overview

This is part of the RAG Cookbook series:

  1. Introduction to RAG
  2. Retrieve and rerank RAG
  3. RAG validation (RAGProbe)
  4. Hybrid RAG (This Article)
  5. Graph RAG
  6. Multi-modal RAG
  7. Agentic RAG (Router)
  8. Agentic RAG (Multi-agent)


Hybrid RAG Explained

Hybrid RAG combines dense retrieval (vector search) with sparse retrieval (keyword/BM25 search) to leverage the strengths of both approaches. This combination provides:

  1. Semantic Understanding: Vector search captures conceptual relationships, such as synonyms and paraphrases
  2. Exact Matching: Keyword search catches specific terms and phrases, such as product names, error codes, and identifiers
  3. Improved Robustness: A relevant chunk missed by one method can still be retrieved by the other
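Dense retrieval ranks chunks by how close each chunk embedding is to the query embedding. The brute-force sketch below assumes cosine similarity over in-memory vectors; a vector database performs the same ranking with an approximate index. The `Chunk` type and the `cosineSimilarity` and `denseSearch` functions are hypothetical names for illustration, not part of the article's code.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors, whose cosine similarity is undefined.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory chunk with a precomputed embedding.
interface Chunk {
  id: string;
  embedding: number[];
}

// Dense retrieval: score every chunk against the query and keep the topK best.
function denseSearch(queryEmbedding: number[], chunks: Chunk[], topK: number) {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

With toy 3-dimensional embeddings, `denseSearch([1, 0, 0], chunks, 2)` returns the chunk pointing in the same direction first, followed by the nearly parallel one. Because cosine similarity compares directions rather than exact tokens, two chunks can score high without sharing a single word, which is both the strength and the blind spot described here.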

Single-method retrieval faces several limitations:

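  1. Vector Search Limitations:

    • May miss exact keyword matches, such as rare names, codes, or identifiers
    • Can drift toward chunks that are conceptually related but off-topic
    • Computationally intensive, since every query must be embedded and compared against high-dimensional document vectors
  2. Keyword Search Limitations:

    • Misses semantic relationships
    • Sensitive to vocabulary mismatch (a query for "car" will not match a chunk that only says "automobile")
    • Limited understanding of context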

Hybrid search addresses these issues by combining both approaches.


Search Components

The hybrid approach consists of three main components:

  1. Dense Retrieval:

    • Uses vector embeddings
    • Captures semantic relationships
    • Handles conceptual queries
  2. Sparse Retrieval:

    • Implements keyword matching
    • Catches exact matches
    • Handles specific terms
  3. Result Fusion:

    • Combines both result sets
    • Deduplicates matches
    • Reranks final results
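Sparse retrieval is commonly scored with BM25, the keyword method named in the overview. The sketch below is a minimal in-memory Okapi BM25 scorer, assuming a naive ASCII lowercase-and-split tokenizer and the widely used defaults k1 = 1.2 and b = 0.75. `tokenize`, `Doc`, and `bm25Search` are hypothetical names; a production system would typically rely on a search engine or a sparse-vector index instead.

```typescript
// Lowercase and split on non-alphanumeric characters (deliberately simple, ASCII-only).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Hypothetical document shape: an ID plus its raw chunk text.
interface Doc {
  id: string;
  text: string;
}

// Okapi BM25 over an in-memory corpus; returns only documents with a positive score.
function bm25Search(query: string, docs: Doc[], topK: number, k1 = 1.2, b = 0.75) {
  const N = docs.length;
  if (N === 0) return [];
  const tokenized = docs.map((d) => tokenize(d.text));
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / N;

  // Document frequency: how many documents contain each term at least once.
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of Array.from(new Set(tokens))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  const scored = docs.map((d, i) => {
    const tokens = tokenized[i];
    // Term frequencies within this document.
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);

    let score = 0;
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue;
      const n = df.get(term) ?? 0;
      // Non-negative IDF variant, so very common terms never subtract from the score.
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Saturating term frequency with document-length normalization.
      score += (idf * (f * (k1 + 1))) / (f + k1 * (1 - b + (b * tokens.length) / avgLen));
    }
    return { id: d.id, score };
  });

  return scored
    .filter((s) => s.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

For a query such as "What does error E1234 mean?", only chunks that literally contain "error" or "e1234" receive a score, which is exactly the exact-match behavior an embedding model may not reproduce for rare codes.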

Implementation

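Our implementation extends the existing RAG system with hybrid search by issuing two queries against the same index: a dense query using the query embedding, and a keyword query that filters for chunks whose stored text contains the query string: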

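// Dense and keyword retrieval are independent, so run them concurrently.
// `index`, `query`, and `queryEmbedding` come from the existing pipeline.
const [vectorResults, keywordResults] = await Promise.all([
  // Dense retrieval: nearest neighbours of the query embedding
  index.query({
    vector: queryEmbedding,
    topK: 15,
    includeMetadata: true,
  }),
  // Keyword retrieval: a placeholder all-zero vector (sized to match the
  // embedding, so its scores carry no ranking signal) plus a substring filter
  // on the chunk text stored in metadata. This assumes the vector store
  // supports `$contains` on `text` and that the stored text is lowercased.
  index.query({
    vector: new Array(queryEmbedding.length).fill(0),
    topK: 15,
    includeMetadata: true,
    filter: {
      text: { $contains: query.toLowerCase() },
    },
  }),
]);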
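The keyword query above passes the entire lowercased question to a single `$contains` filter, so it only matches chunks containing that exact phrase. One common refinement, sketched here as an assumption rather than part of the original pipeline, is to reduce the question to its significant terms first. The `extractKeywords` helper and its short stopword list are hypothetical and illustrative only.

```typescript
// A small, illustrative English stopword list; real systems use a fuller list.
const STOPWORDS = new Set([
  "a", "an", "and", "are", "do", "does", "for", "how", "in", "is", "it",
  "of", "on", "or", "the", "to", "what", "when", "where", "which", "who", "why", "with",
]);

// Turn a natural-language question into distinct keyword terms for sparse matching.
function extractKeywords(query: string, minLength = 2): string[] {
  const terms = query
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    // Drop empty fragments, very short tokens, and stopwords.
    .filter((t) => t.length >= minLength && !STOPWORDS.has(t));
  // Deduplicate while preserving first-seen order.
  return Array.from(new Set(terms));
}
```

For example, `extractKeywords("What does error code E1234 mean?")` yields `["error", "code", "e1234", "mean"]`. Each term could then drive its own keyword lookup or feed a BM25 scorer, with the per-term results merged like any other result set.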

Result Combination

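// Merge both result sets, keeping the first occurrence of each ID so that a
// vector match (which has a real similarity score) wins over its keyword duplicate.
const allMatches = [...vectorResults.matches, ...keywordResults.matches];
const uniqueMatches = allMatches.filter(
  (match, i) => allMatches.findIndex((m) => m.id === match.id) === i,
);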
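Concatenating and deduplicating ignores where each result ranked in its own list, and the raw scores are not comparable because the keyword query uses a placeholder vector. Reciprocal Rank Fusion (RRF) is one widely used alternative that needs only ranks. The sketch below is a substitute technique, not the article's method; `reciprocalRankFusion` is a hypothetical name, and `k = 60` is a common default rather than a tuned value.

```typescript
// Minimal shape needed for fusion: any result with an ID.
interface Match {
  id: string;
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank starting at 1.
function reciprocalRankFusion<T extends Match>(lists: T[][], k = 60): { match: T; score: number }[] {
  const fused = new Map<string, { match: T; score: number }>();
  for (const list of lists) {
    list.forEach((match, index) => {
      const contribution = 1 / (k + index + 1);
      const entry = fused.get(match.id);
      // Documents found by several retrievers accumulate score from each list.
      if (entry) entry.score += contribution;
      else fused.set(match.id, { match, score: contribution });
    });
  }
  // Highest fused score first; ties keep first-seen order (stable sort).
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}
```

Called as `reciprocalRankFusion([vectorResults.matches, keywordResults.matches])`, a chunk that appears in both lists rises above chunks that only one retriever found, which both deduplicates and reranks without comparing incompatible scores.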

System Architecture

The implementation follows these key principles:

  1. Parallel Processing

    • Concurrent vector and keyword searches
    • Efficient resource utilization
    • Latency close to the slower of the two queries rather than their sum
  2. Result Fusion

    • Deduplication by match ID
    • Score normalization
    • Weighted combination
  3. Quality Control

    • Relevance scoring
    • Result diversity
    • Context optimization
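The Result Fusion principles above list score normalization and weighted combination, which the deduplication step does not itself perform. Below is a minimal sketch of one common way to do both, assuming per-list min-max normalization and a single weight `alpha` for the dense side. The function names and the `alpha` value are hypothetical, not taken from the article's repository.

```typescript
// A scored result from either retriever.
interface ScoredMatch {
  id: string;
  score: number;
}

// Min-max normalize scores into [0, 1] so lists from different retrievers are comparable.
function minMaxNormalize(matches: ScoredMatch[]): Map<string, number> {
  const scores = matches.map((m) => m.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const normalized = new Map<string, number>();
  for (const m of matches) {
    // If every score is equal, treat all matches as equally relevant.
    normalized.set(m.id, max === min ? 1 : (m.score - min) / (max - min));
  }
  return normalized;
}

// Weighted combination: alpha weights the dense score, (1 - alpha) the sparse score.
// A document missing from one list contributes 0 for that list.
function weightedFusion(dense: ScoredMatch[], sparse: ScoredMatch[], alpha = 0.5): ScoredMatch[] {
  const d = minMaxNormalize(dense);
  const s = minMaxNormalize(sparse);
  const ids = new Set<string>(Array.from(d.keys()).concat(Array.from(s.keys())));
  return Array.from(ids)
    .map((id) => ({ id, score: alpha * (d.get(id) ?? 0) + (1 - alpha) * (s.get(id) ?? 0) }))
    .sort((x, y) => y.score - x.score);
}
```

Normalizing first matters because cosine similarities sit roughly in [-1, 1] while BM25 scores are unbounded. Note that with a zero-vector keyword query the sparse scores carry no ranking signal, so this weighting only becomes meaningful once the keyword side produces a real score such as BM25.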

Conclusions

Hybrid RAG improves retrieval quality by combining the strengths of vector and keyword search: semantic matching from embeddings and exact term matching from keywords. While this adds some complexity, the gains in robustness and accuracy make it a valuable enhancement for production systems.

The next article will explore Graph RAG, which adds relationship-aware retrieval to our system.

