I Built a RAG System in Laravel in a Weekend — Here’s Exactly How It Works

Vector embeddings, pgvector in PostgreSQL, chunking strategies, semantic search, context injection, and the Laravel AI SDK holding it all together — a complete walkthrough of building a document Q&A system that actually gives useful answers.

The moment Laravel 13 shipped in March 2026 with a first-party AI SDK, RAG (Retrieval-Augmented Generation) stopped being a Python problem. Native pgvector support in the query builder. A unified provider-agnostic API for embeddings and text generation. SimilaritySearch as a built-in agent tool. The infrastructure that used to require Pinecone, a Python microservice, and a LangChain integration is now Laravel-native.

I built a complete document Q&A system in a weekend. You upload PDFs or text documents, ask questions in plain English, and get answers with citations — backed by your own documents, not the LLM’s training data.

This post is the complete walkthrough. Every piece of the pipeline, every design decision, every trade-off.

What RAG Actually Is

Before the code, the concept — because “RAG” gets used to mean everything from a simple semantic search to a full agent pipeline.

The core problem RAG solves: LLMs have a knowledge cutoff and don’t know about your specific documents, your internal wikis, your product manuals, your codebase. Fine-tuning a model on your data is expensive and slow. RAG is the practical alternative.

User question: "What's the refund policy for enterprise customers?"

WITHOUT RAG:
User → LLM → "I don't have access to your specific policies."
              (or worse: a confident hallucination)

WITH RAG:
User → Embed the question → Vector search for similar chunks
     → Find the relevant policy sections in your documents
     → Inject those sections into the LLM prompt as context
     → LLM answers based on YOUR actual documents
     → Answer with source citations
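
To make that flow concrete before any Laravel code appears, here is a toy version in plain PHP. It is only a sketch: the three-dimensional vectors are hand-written stand-ins for real embeddings, the document snippets are invented for illustration, and every function name is hypothetical.

```php
<?php
// Toy illustration of the RAG flow. The 3-dimensional "embeddings" are
// hand-written stand-ins for real model output; all names are hypothetical.

function cosine(array $a, array $b): float
{
    $dot = 0.0; $na = 0.0; $nb = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $na  += $x * $x;
        $nb  += $b[$i] * $b[$i];
    }
    return ($na > 0 && $nb > 0) ? $dot / (sqrt($na) * sqrt($nb)) : 0.0;
}

/** Return the $k chunks whose vectors are closest to the question vector. */
function topChunks(array $questionVector, array $chunks, int $k): array
{
    usort($chunks, fn ($x, $y) =>
        cosine($questionVector, $y['vector']) <=> cosine($questionVector, $x['vector'])
    );
    return array_slice($chunks, 0, $k);
}

/** Inject the retrieved chunks into the prompt as numbered sources. */
function buildPrompt(string $question, array $retrieved): string
{
    $context = '';
    foreach ($retrieved as $i => $chunk) {
        $context .= sprintf("[%d] %s: %s\n", $i + 1, $chunk['source'], $chunk['text']);
    }
    return "Answer using only these sources:\n{$context}\nQuestion: {$question}";
}

$chunks = [
    ['source' => 'refunds.md',  'text' => 'Enterprise customers may request a refund within 60 days.', 'vector' => [0.9, 0.1, 0.0]],
    ['source' => 'shipping.md', 'text' => 'Orders ship within two business days.',                     'vector' => [0.0, 0.2, 0.9]],
    ['source' => 'billing.md',  'text' => 'Invoices are issued on the first of each month.',           'vector' => [0.5, 0.8, 0.1]],
];

$questionVector = [0.8, 0.2, 0.1]; // pretend embedding of the refund question
$retrieved      = topChunks($questionVector, $chunks, 2);
$prompt         = buildPrompt("What's the refund policy for enterprise customers?", $retrieved);
```

The refund chunk ranks first because its vector points in nearly the same direction as the question vector. Everything else in this post is about doing the same thing with real embeddings, a real index, and a real model.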

The quality of a RAG system depends on three things:

Chunking — how you split documents into retrievable pieces
Retrieval — how you find the right chunks for a given question
Generation — how you use the retrieved chunks to answer the question

The Stack

Laravel 13 with the laravel/ai SDK
PostgreSQL with the pgvector extension
OpenAI text-embedding-3-small for embeddings (1,536 dimensions)
OpenAI gpt-4o for generation (or any provider — the SDK is agnostic)
Parsel (for PDF text extraction, from our earlier post)
Laravel Horizon for processing documents in the background

Installation and Setup

composer require laravel/ai
php artisan vendor:publish --provider="Laravel\Ai\AiServiceProvider"
php artisan migrate

# .env
AI_DEFAULT_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here

# PostgreSQL with pgvector is required
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=your_database
DB_USERNAME=your_user
DB_PASSWORD=your_password

# Install pgvector extension on PostgreSQL
# Ubuntu/Debian:
apt install postgresql-16-pgvector

# macOS:
brew install pgvector

# Enable in your database:
psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS vector;"

The Database Schema

Two tables: one for source documents, one for the chunks derived from them.

// database/migrations/create_rag_tables.php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // Ensure pgvector extension exists
        Schema::ensureVectorExtensionExists();

        // Source documents
        Schema::create('documents', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            $table->string('filename');
            $table->string('mime_type');
            $table->unsignedBigInteger('file_size');
            $table->enum('status', ['pending', 'processing', 'ready', 'failed'])
                  ->default('pending');
            $table->unsignedInteger('chunk_count')->default(0);
            $table->text('error_message')->nullable();
            $table->foreignId('user_id')->constrained()->cascadeOnDelete();
            $table->timestamps();
        });

        // Document chunks with vector embeddings
        Schema::create('document_chunks', function (Blueprint $table) {
            $table->id();
            $table->foreignId('document_id')->constrained()->cascadeOnDelete();

            // The text content of this chunk
            $table->text('content');

            // Metadata for filtering and citation
            $table->unsignedInteger('chunk_index');       // position within document
            $table->unsignedInteger('token_count');        // approximate token count
            $table->string('section_title')->nullable();   // heading above this chunk
            $table->unsignedInteger('page_number')->nullable(); // for PDFs

            // The vector embedding — 1536 dimensions for text-embedding-3-small
            $table->vector('embedding', dimensions: 1536)->index();
            // ->index() creates an HNSW index with cosine distance automatically

            $table->timestamps();
        });
    }
};
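
Two details of that vector column are worth seeing in isolation. pgvector accepts vectors as text literals such as [0.1,0.2,0.3], and its <=> operator returns cosine distance, which is 1 minus cosine similarity. The helpers below are a minimal sketch with hypothetical names, not part of Laravel or pgvector, and the SQL string only illustrates roughly what a similarity query looks like when written by hand.

```php
<?php
// Sketch only: helper names are hypothetical, not part of Laravel or pgvector.

/** Format a PHP float array as a pgvector text literal, e.g. "[0.1,0.2,0.3]". */
function toVectorLiteral(array $vector): string
{
    return '[' . implode(',', array_map(fn ($v) => (string) (float) $v, $vector)) . ']';
}

/** pgvector's <=> operator returns cosine distance, which is 1 - cosine similarity. */
function distanceToSimilarity(float $cosineDistance): float
{
    return 1.0 - $cosineDistance;
}

/** Reject vectors whose length does not match the column definition. */
function assertDimensions(array $vector, int $expected = 1536): void
{
    if (count($vector) !== $expected) {
        throw new InvalidArgumentException(
            sprintf('Expected %d dimensions, got %d.', $expected, count($vector))
        );
    }
}

$literal = toVectorLiteral([0.1, 0.25, -0.5]);

// Hand-written SQL for a similarity search against the table above
// (both placeholders are bound to the same vector literal):
$sql = 'SELECT id, content, 1 - (embedding <=> ?::vector) AS similarity '
     . 'FROM document_chunks ORDER BY embedding <=> ?::vector LIMIT 5';
```

A dimension check like this is cheap insurance when you switch embedding models, because a column declared with 1536 dimensions will reject vectors of any other length.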

The Eloquent Models

// app/Models/Document.php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Document extends Model
{
    protected $fillable = [
        'title', 'filename', 'mime_type', 'file_size',
        'status', 'chunk_count', 'error_message', 'user_id',
    ];

    public function chunks(): HasMany
    {
        return $this->hasMany(DocumentChunk::class);
    }

    public function markReady(): void
    {
        $this->update([
            'status'      => 'ready',
            'chunk_count' => $this->chunks()->count(),
        ]);
    }

    public function markFailed(string $error): void
    {
        $this->update([
            'status'        => 'failed',
            'error_message' => $error,
        ]);
    }
}

// app/Models/DocumentChunk.php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;

class DocumentChunk extends Model
{
    protected $fillable = [
        'document_id', 'content', 'chunk_index',
        'token_count', 'section_title', 'page_number', 'embedding',
    ];

    protected $casts = [
        'embedding' => 'array',  // cast vector column to array
    ];

    public function document(): BelongsTo
    {
        return $this->belongsTo(Document::class);
    }
}

Step 1: Document Upload

// app/Http/Controllers/DocumentController.php
<?php

namespace App\Http\Controllers;

use App\Jobs\ProcessDocument;
use App\Models\Document;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Storage;

class DocumentController extends Controller
{
    public function store(Request $request): JsonResponse
    {
        $request->validate([
            'file'  => ['required', 'file', 'mimes:pdf,txt,md', 'max:20480'], // 20MB
            'title' => ['nullable', 'string', 'max:255'],
        ]);

        $file = $request->file('file');

        // Store the original file
        $path = Storage::put('documents', $file);

        // Create the document record
        $document = Document::create([
            // Fall back to the original filename when the title is missing or empty
            'title'     => $request->input('title') ?: $file->getClientOriginalName(),
            'filename'  => $path,
            'mime_type' => $file->getMimeType(),
            'file_size' => $file->getSize(),
            'user_id'   => $request->user()->id,
            'status'    => 'pending',
        ]);

        // Queue the processing job
        ProcessDocument::dispatch($document)->onQueue('documents');

        return response()->json([
            'document' => $document,
            'message'  => 'Document uploaded. Processing in background.',
        ], 201);
    }
}

Step 2: The Chunking Strategy

Chunking is where most RAG systems get it wrong. Chunk too small and you lose context. Chunk too large and you retrieve irrelevant text that dilutes the answer.

The approach that works best for most documents:

// app/Services/DocumentChunker.php
<?php

namespace App\Services;

class DocumentChunker
{
    // Target ~400 tokens per chunk with 50-token overlap
    // At ~4 chars/token, that's ~1600 chars with ~200 char overlap
    private const TARGET_CHUNK_SIZE = 1600;
    private const OVERLAP_SIZE      = 200;
    private const MIN_CHUNK_SIZE    = 200;  // ignore tiny chunks

    /**
     * Split text into overlapping chunks with metadata.
     *
     * @return array<int, array{content: string, chunk_index: int, token_count: int, section_title: string|null}>
     */
    public function chunk(string $text): array
    {
        // First: split by section headings (Markdown or detected headings)
        $sections = $this->splitBySections($text);

        $chunks      = [];
        $chunkIndex  = 0;

        foreach ($sections as $section) {
            $sectionTitle = $section['title'];
            $sectionText  = $section['content'];

            // If the section fits in one chunk, keep it together
            if (strlen($sectionText) <= self::TARGET_CHUNK_SIZE) {
                if (strlen(trim($sectionText)) >= self::MIN_CHUNK_SIZE) {
                    $chunks[] = [
                        'content'       => trim($sectionText),
                        'chunk_index'   => $chunkIndex++,
                        'token_count'   => $this->estimateTokens($sectionText),
                        'section_title' => $sectionTitle,
                    ];
                }
                continue;
            }

            // Otherwise: split by paragraphs, respecting chunk size
            $paragraphs = preg_split('/\n\s*\n/', $sectionText, -1, PREG_SPLIT_NO_EMPTY);
            $current    = '';

            foreach ($paragraphs as $paragraph) {
                $paragraph = trim($paragraph);
                if (empty($paragraph)) continue;

                // If adding this paragraph exceeds the limit, flush the current chunk
                if (strlen($current) + strlen($paragraph) > self::TARGET_CHUNK_SIZE && $current !== '') {
                    if (strlen(trim($current)) >= self::MIN_CHUNK_SIZE) {
                        $chunks[] = [
                            'content'       => trim($current),
                            'chunk_index'   => $chunkIndex++,
                            'token_count'   => $this->estimateTokens($current),
                            'section_title' => $sectionTitle,
                        ];
                    }

                    // Start new chunk with overlap from the end of the previous
                    $current = $this->extractOverlap($current) . "\n\n" . $paragraph;
                } else {
                    $current .= ($current ? "\n\n" : '') . $paragraph;
                }
            }

            // Flush the last chunk of this section
            if (strlen(trim($current)) >= self::MIN_CHUNK_SIZE) {
                $chunks[] = [
                    'content'       => trim($current),
                    'chunk_index'   => $chunkIndex++,
                    'token_count'   => $this->estimateTokens($current),
                    'section_title' => $sectionTitle,
                ];
            }
        }

        return $chunks;
    }

    private function splitBySections(string $text): array
    {
        // Split on Markdown headings (#, ##, ###)
        $lines    = explode("\n", $text);
        $sections = [];
        $current  = ['title' => null, 'content' => ''];

        foreach ($lines as $line) {
            // Detect a Markdown heading (#, ##, or ###); other heading styles are not detected
            if (preg_match('/^#{1,3}\s+(.+)/', $line, $match)) {
                if (trim($current['content'])) {
                    $sections[] = $current;
                }
                $current = ['title' => trim($match[1]), 'content' => ''];
            } else {
                $current['content'] .= $line . "\n";
            }
        }

        if (trim($current['content'])) {
            $sections[] = $current;
        }

        return $sections ?: [['title' => null, 'content' => $text]];
    }

    private function extractOverlap(string $text): string
    {
        // Take the last N characters of the previous chunk for context continuity.
        // The mb_* functions avoid cutting a multi-byte UTF-8 character in half.
        $overlap = mb_substr($text, -self::OVERLAP_SIZE);
        // Don't start mid-word
        $firstSpace = mb_strpos($overlap, ' ');
        return $firstSpace !== false ? mb_substr($overlap, $firstSpace + 1) : $overlap;
    }

    private function estimateTokens(string $text): int
    {
        // Rough estimate: 1 token ≈ 4 characters for English text
        return (int) ceil(strlen($text) / 4);
    }
}
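
One gap in the chunker above: a single paragraph longer than the target size is never split, so it becomes one oversized chunk. A possible fallback is to break such paragraphs on sentence boundaries. The function below is a rough sketch under that assumption; splitLongParagraph is a hypothetical helper, not something the class above calls.

```php
<?php
// Hypothetical helper, not part of the DocumentChunker above: split one long
// paragraph into pieces of at most $maxChars, breaking on sentence boundaries.

function splitLongParagraph(string $paragraph, int $maxChars = 1600): array
{
    // Split after ., ! or ? followed by whitespace. Abbreviations such as
    // "e.g. " will also split here, which is acceptable for a rough sketch.
    $sentences = preg_split('/(?<=[.!?])\s+/u', trim($paragraph), -1, PREG_SPLIT_NO_EMPTY);

    $pieces  = [];
    $current = '';

    foreach ($sentences as $sentence) {
        $candidate = $current === '' ? $sentence : $current . ' ' . $sentence;

        if (mb_strlen($candidate) <= $maxChars) {
            $current = $candidate;
            continue;
        }

        if ($current !== '') {
            $pieces[] = $current;
        }

        // A single sentence longer than the limit is hard-wrapped by characters.
        while (mb_strlen($sentence) > $maxChars) {
            $pieces[] = mb_substr($sentence, 0, $maxChars);
            $sentence = mb_substr($sentence, $maxChars);
        }
        $current = $sentence;
    }

    if ($current !== '') {
        $pieces[] = $current;
    }

    return $pieces;
}

$pieces = splitLongParagraph('One two three. Four five six. Seven eight nine.', 30);
```

With a 30-character limit the first two sentences fit together and the third starts a new piece. In the chunker you would apply something like this to any paragraph that exceeds TARGET_CHUNK_SIZE before it enters the accumulation loop.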

Step 3: Embedding Generation and Storage

// app/Jobs/ProcessDocument.php
<?php

namespace App\Jobs;

use App\Models\Document;
use App\Models\DocumentChunk;
use App\Services\DocumentChunker;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\AI;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Storage;
use Shipfastlabs\Parsel;

class ProcessDocument implements ShouldQueue
{
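    // SerializesModels stores only the model identifier on the queue and reloads it in the worker
    use Dispatchable, InteractsWithQueue, Queueable, \Illuminate\Queue\SerializesModels;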

    public int $timeout  = 300;
    public int $tries    = 2;
    public int $backoff  = 60;

    public function __construct(
        private Document $document
    ) {}

    public function handle(DocumentChunker $chunker): void
    {
        $this->document->update(['status' => 'processing']);

        try {
            // Step 1: Extract text from the file
            $text = $this->extractText();

            // Step 2: Split into chunks
            $chunks = $chunker->chunk($text);

            if (empty($chunks)) {
                throw new \RuntimeException('No content could be extracted from the document.');
            }

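            // Step 3: Generate embeddings and store chunks.
            // Clear chunks left by an earlier failed attempt so a retry does not duplicate them.
            $this->document->chunks()->delete();
            $this->embedAndStore($chunks);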

            // Step 4: Mark document as ready
            $this->document->markReady();

            Log::info('Document processed successfully', [
                'document_id' => $this->document->id,
                'chunk_count' => count($chunks),
            ]);

        } catch (\Throwable $e) {
            $this->document->markFailed($e->getMessage());
            Log::error('Document processing failed', [
                'document_id' => $this->document->id,
                'error'       => $e->getMessage(),
            ]);
            throw $e;
        }
    }

    private function extractText(): string
    {
        $path = Storage::path($this->document->filename);
        $mime = $this->document->mime_type;

        // Use Parsel for PDF extraction, plain read for text/markdown
        if ($mime === 'application/pdf') {
            return Parsel::file($path)->text();
        }

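        $contents = file_get_contents($path);

        // file_get_contents() returns false on failure, which would violate the string return type
        if ($contents === false) {
            throw new \RuntimeException("Could not read stored file: {$path}");
        }

        return $contents;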
    }

    private function embedAndStore(array $chunks): void
    {
        // Process in batches to respect API rate limits
        // OpenAI text-embedding-3-small supports up to 2048 inputs per request
        $batches = array_chunk($chunks, 100);

        foreach ($batches as $batch) {
            // Extract just the content strings for embedding
            $contents = array_column($batch, 'content');

            // Generate embeddings for all chunks in this batch — one API call
            $embeddings = AI::embed($contents)->embeddings;

            // Store each chunk with its embedding
            $records = [];
            foreach ($batch as $index => $chunk) {
                $records[] = [
                    'document_id'   => $this->document->id,
                    'content'       => $chunk['content'],
                    'chunk_index'   => $chunk['chunk_index'],
                    'token_count'   => $chunk['token_count'],
                    'section_title' => $chunk['section_title'],
                    'embedding'     => json_encode($embeddings[$index]->embedding),
                    'created_at'    => now(),
                    'updated_at'    => now(),
                ];
            }

            // Bulk insert for efficiency
            DocumentChunk::insert($records);

            // Respect rate limits — small delay between batches
            if (count($batches) > 1) {
                usleep(100_000);  // 100ms
            }
        }
    }
}
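
The fixed batch size of 100 in embedAndStore() ignores how large each chunk is. If your provider also caps the total tokens per request, you may want to batch on both dimensions. The sketch below assumes each chunk carries the token_count produced by the chunker; batchByTokenBudget is a hypothetical helper, and the default limits are placeholders rather than documented provider values.

```php
<?php
// Hypothetical helper: group chunks into batches bounded by both item count
// and estimated tokens. The default limits are placeholders, not provider
// guarantees; check your provider's documentation for the real values.

function batchByTokenBudget(array $chunks, int $maxItems = 100, int $maxTokens = 8000): array
{
    $batches = [];
    $current = [];
    $tokens  = 0;

    foreach ($chunks as $chunk) {
        // Start a new batch when the current one is full by count, or when
        // adding this chunk would exceed the token budget.
        $wouldOverflow = count($current) >= $maxItems
            || ($current !== [] && $tokens + $chunk['token_count'] > $maxTokens);

        if ($wouldOverflow) {
            $batches[] = $current;
            $current   = [];
            $tokens    = 0;
        }

        $current[] = $chunk;
        $tokens   += $chunk['token_count'];
    }

    if ($current !== []) {
        $batches[] = $current;
    }

    return $batches;
}

$batches = batchByTokenBudget([
    ['content' => 'a', 'token_count' => 400],
    ['content' => 'b', 'token_count' => 400],
    ['content' => 'c', 'token_count' => 300],
], maxItems: 100, maxTokens: 800);
```

Each batch is a plain list re-indexed from zero, so the positional match between chunks and returned embeddings that the job relies on still holds.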

Step 4: Semantic Search (Retrieval)

// app/Services/DocumentRetriever.php
<?php

namespace App\Services;

use App\Models\DocumentChunk;
use Illuminate\Support\Collection;
use Illuminate\Support\Facades\AI;

class DocumentRetriever
{
    private const DEFAULT_RESULTS = 5;
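    // Cosine similarity threshold. Typical scores differ between embedding models,
    // so treat 0.7 as a starting point and tune it against your own data.
    private const MIN_SIMILARITY  = 0.7;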

    /**
     * Find the most semantically similar chunks for a given query.
     *
     * @return Collection<DocumentChunk>
     */
    public function retrieve(
        string  $query,
        int     $limit     = self::DEFAULT_RESULTS,
        ?array  $documentIds = null,
    ): Collection {
        // Embed the user's question
        $queryEmbedding = AI::embed($query)->embeddings[0]->embedding;

        // Vector similarity search using Laravel 13's native whereVectorSimilarTo
        $results = DocumentChunk::query()
            ->with('document:id,title,filename')
            ->when($documentIds, fn($q) => $q->whereIn('document_id', $documentIds))
            ->whereHas('document', fn($q) => $q->where('status', 'ready'))
            ->whereVectorSimilarTo('embedding', $queryEmbedding, limit: $limit * 2)
            ->get();

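        // Filter by minimum similarity score and deduplicate by document section.
        // Chunks without a section title are keyed by their own id, so documents
        // with no headings are not collapsed into a single chunk.
        return $results
            ->filter(fn($chunk) => $chunk->similarity >= self::MIN_SIMILARITY)
            ->unique(fn($chunk) => $chunk->document_id . '_' . ($chunk->section_title ?? 'chunk-' . $chunk->id))
            ->take($limit)
            ->values();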
    }

    /**
     * Retrieve with MMR (Maximal Marginal Relevance) for diversity.
     * Prevents returning 5 very similar chunks from the same section.
     */
    public function retrieveWithDiversity(
        string $query,
        int    $limit = self::DEFAULT_RESULTS
    ): Collection {
        // Get more candidates than needed
        $candidates = $this->retrieve($query, limit: $limit * 3);

        if ($candidates->count() <= $limit) {
            return $candidates;
        }

        $selected   = collect([$candidates->first()]);
        $remaining  = $candidates->skip(1);

        // Iteratively select chunks that are relevant but diverse
        while ($selected->count() < $limit && $remaining->isNotEmpty()) {
            $best      = null;
            $bestScore = -1;

            foreach ($remaining as $candidate) {
                // Score = relevance - max similarity to already selected chunks
                $maxSimilarityToSelected = $selected->max(fn($s) =>
                    $this->cosineSimilarity($s->embedding, $candidate->embedding)
                );

                $mmrScore = $candidate->similarity - (0.5 * $maxSimilarityToSelected);

                if ($mmrScore > $bestScore) {
                    $best      = $candidate;
                    $bestScore = $mmrScore;
                }
            }

            if ($best) {
                $selected->push($best);
                $remaining = $remaining->where('id', '!=', $best->id);
            } else {
                break;
            }
        }

        return $selected;
    }

    private function cosineSimilarity(array $a, array $b): float
    {
        $dot    = array_sum(array_map(fn($x, $y) => $x * $y, $a, $b));
        $normA  = sqrt(array_sum(array_map(fn($x) => $x * $x, $a)));
        $normB  = sqrt(array_sum(array_map(fn($x) => $x * $x, $b)));
        return $normA && $normB ? $dot / ($normA * $normB) : 0.0;
    }
}
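
MMR is easier to trust once you have watched it reject a near-duplicate. The sketch below runs the same scoring rule as retrieveWithDiversity() (relevance minus 0.5 times the highest similarity to anything already selected) on toy two-dimensional vectors. The array keys and function names are hypothetical, and nothing here touches the database.

```php
<?php
// Standalone MMR sketch on toy 2-D vectors. Candidates are plain arrays with
// hypothetical keys: id, similarity (to the query), and vector.

function cosineSim(array $a, array $b): float
{
    $dot = 0.0; $na = 0.0; $nb = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $na  += $x * $x;
        $nb  += $b[$i] * $b[$i];
    }
    return ($na > 0 && $nb > 0) ? $dot / (sqrt($na) * sqrt($nb)) : 0.0;
}

/** Greedy MMR: score = relevance - $penalty * (max similarity to selected items). */
function mmrSelect(array $candidates, int $limit, float $penalty = 0.5): array
{
    if ($candidates === [] || $limit < 1) {
        return [];
    }

    // Most relevant candidate first; it is always selected.
    usort($candidates, fn ($x, $y) => $y['similarity'] <=> $x['similarity']);
    $selected = [array_shift($candidates)];

    while (count($selected) < $limit && $candidates !== []) {
        $bestKey   = null;
        $bestScore = -INF;

        foreach ($candidates as $key => $candidate) {
            $maxSim = max(array_map(
                fn ($s) => cosineSim($s['vector'], $candidate['vector']),
                $selected
            ));
            $score = $candidate['similarity'] - $penalty * $maxSim;

            if ($score > $bestScore) {
                $bestScore = $score;
                $bestKey   = $key;
            }
        }

        $selected[] = $candidates[$bestKey];
        unset($candidates[$bestKey]);
    }

    return $selected;
}

$picked = mmrSelect([
    ['id' => 'A', 'similarity' => 0.95, 'vector' => [1.0, 0.0]],
    ['id' => 'B', 'similarity' => 0.93, 'vector' => [0.99, 0.14]], // near-duplicate of A
    ['id' => 'C', 'similarity' => 0.80, 'vector' => [0.0, 1.0]],   // different topic
], 2);
$ids = array_column($picked, 'id');
```

B is more relevant than C on paper, but it is almost identical to A, so its penalty is large and C takes the second slot. That is the behaviour you want when the context window is limited.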

Step 5: Context Assembly and Prompt Engineering

The quality of your answer depends heavily on how you present the retrieved context to the LLM:

// app/Services/ContextAssembler.php
<?php

namespace App\Services;

use Illuminate\Support\Collection;

class ContextAssembler
{
    /**
     * Format retrieved chunks into a context block for the prompt.
     */
    public function assemble(Collection $chunks): string
    {
        if ($chunks->isEmpty()) {
            return '';
        }

        $contextParts = [];

        foreach ($chunks->values() as $index => $chunk) {
            $number  = $index + 1; // number sources from 1 so citations read naturally
            $source  = $chunk->document->title;
            $section = $chunk->section_title ? " > {$chunk->section_title}" : '';
            $page    = $chunk->page_number ? " (page {$chunk->page_number})" : '';

            $contextParts[] = implode("\n", [
                "--- Source {$number}: {$source}{$section}{$page} ---",
                $chunk->content,
            ]);
        }

        return implode("\n\n", $contextParts);
    }

    /**
     * Build the full system prompt for RAG Q&A.
     */
    public function buildSystemPrompt(string $context): string
    {
        return <<<PROMPT
        You are a helpful assistant that answers questions based strictly on the provided document excerpts.

        RULES:
        1. Answer ONLY based on the context provided below. Do not use outside knowledge.
        2. If the context doesn't contain enough information to answer, say so clearly.
        3. Always cite which source(s) you used in your answer (e.g., "According to [Document Title]...").
        4. If information appears in multiple sources, synthesise them and cite all relevant sources.
        5. Be concise but complete. Use bullet points for lists.
        6. Never make up information not present in the context.

        CONTEXT:
        {$context}

        Answer the user's question based on the above context.
        PROMPT;
    }
}
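
The assembler above includes every retrieved chunk, however long. If you want a hard ceiling on context size, one option is to trim by the stored token estimates before assembling. This is a sketch with plain arrays standing in for the Eloquent models; fitToBudget is a hypothetical helper and the budget of 600 is an arbitrary example value.

```php
<?php
// Hypothetical helper: keep the highest-ranked chunks that fit a token budget.
// Chunks are plain arrays here; 'token_count' mirrors the column defined earlier.

function fitToBudget(array $rankedChunks, int $maxContextTokens): array
{
    $kept = [];
    $used = 0;

    foreach ($rankedChunks as $chunk) {
        if ($used + $chunk['token_count'] > $maxContextTokens) {
            continue; // skip this chunk but keep looking for smaller ones that still fit
        }
        $kept[] = $chunk;
        $used  += $chunk['token_count'];
    }

    return $kept;
}

$kept = fitToBudget([
    ['id' => 1, 'token_count' => 400],
    ['id' => 2, 'token_count' => 450],
    ['id' => 3, 'token_count' => 150],
], 600);
```

The second chunk is skipped because it would push the total past the budget, while the smaller third chunk still fits. Whether to skip or stop at the first overflow is a judgment call; skipping packs the context more tightly at the cost of breaking strict rank order.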

Step 6: The Q&A Endpoint — Putting It All Together

// app/Http/Controllers/QuestionController.php
<?php

namespace App\Http\Controllers;

use App\Services\ContextAssembler;
use App\Services\DocumentRetriever;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\AI;

class QuestionController extends Controller
{
    public function __construct(
        private DocumentRetriever $retriever,
        private ContextAssembler  $assembler,
    ) {}

    public function ask(Request $request): JsonResponse
    {
        $request->validate([
            'question'     => ['required', 'string', 'max:500'],
            'document_ids' => ['nullable', 'array'],
            'document_ids.*' => ['integer', 'exists:documents,id'],
        ]);

        $question    = $request->input('question');
        $documentIds = $request->input('document_ids');

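        // Step 1: Retrieve relevant chunks.
        // retrieveWithDiversity() has no document filter, so use retrieve()
        // when the caller scopes the question to specific documents.
        $chunks = $documentIds
            ? $this->retriever->retrieve(query: $question, limit: 5, documentIds: $documentIds)
            : $this->retriever->retrieveWithDiversity(query: $question, limit: 5);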

        // If no relevant chunks found — don't hallucinate
        if ($chunks->isEmpty()) {
            return response()->json([
                'answer'  => 'I could not find relevant information in the uploaded documents to answer your question.',
                'sources' => [],
            ]);
        }

        // Step 2: Assemble context
        $context      = $this->assembler->assemble($chunks);
        $systemPrompt = $this->assembler->buildSystemPrompt($context);

        // Step 3: Generate answer
        $response = AI::text(
            prompt:  $question,
            system:  $systemPrompt,
            model:   'gpt-4o',
        );

        // Step 4: Build source citations
        $sources = $chunks->map(fn($chunk) => [
            'document_id'   => $chunk->document_id,
            'document_title'=> $chunk->document->title,
            'section'       => $chunk->section_title,
            'page'          => $chunk->page_number,
            'similarity'    => round($chunk->similarity, 3),
            // mb_substr avoids cutting a multi-byte character, which would break JSON encoding
            'excerpt'       => mb_substr($chunk->content, 0, 200) . '...',
        ])->values()->all();

        return response()->json([
            'answer'   => $response->text,
            'sources'  => $sources,
            'chunks_used' => $chunks->count(),
        ]);
    }
}

Step 7: Streaming Responses

For long answers, streaming delivers a better UX — the user sees the answer appear token by token rather than waiting for the full response:

// Streaming version of the ask endpoint
public function stream(Request $request): \Symfony\Component\HttpFoundation\StreamedResponse
{
    $request->validate([
        'question' => ['required', 'string', 'max:500'],
    ]);

    $question = $request->input('question');
    $chunks   = $this->retriever->retrieveWithDiversity($question, 5);

    $context      = $this->assembler->assemble($chunks);
    $systemPrompt = $this->assembler->buildSystemPrompt($context);

    return response()->stream(function () use ($question, $systemPrompt, $chunks) {
        $stream = AI::text(
            prompt: $question,
            system: $systemPrompt,
            model:  'gpt-4o',
            stream: true,
        );

        foreach ($stream as $chunk) {
            echo "data: " . json_encode(['token' => $chunk->text]) . "\n\n";
            // ob_flush() raises a notice when no output buffer is active
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
        }

        // Send sources after the stream completes
        $sources = $chunks->map(fn($c) => [
            'document_title' => $c->document->title,
            'section'        => $c->section_title,
        ])->values()->all();

        echo "data: " . json_encode(['sources' => $sources, 'done' => true]) . "\n\n";
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();

    }, 200, [
        'Content-Type'  => 'text/event-stream',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no',
    ]);
}
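
The wire format in that endpoint is plain server-sent events: each message is a line starting with "data: " followed by a blank line. The sketch below shows the framing and a matching parser so you can test the format without a browser. Both function names are hypothetical.

```php
<?php
// Sketch of the server-sent events framing used above. Function names are hypothetical.

/** Encode one payload as an SSE "data:" frame terminated by a blank line. */
function sseFrame(array $payload): string
{
    // json_encode escapes newlines inside strings, so a frame never contains a raw line break.
    return 'data: ' . json_encode($payload, JSON_UNESCAPED_UNICODE) . "\n\n";
}

/** Decode a buffer of concatenated frames back into payload arrays. */
function parseSseFrames(string $buffer): array
{
    $payloads = [];
    foreach (explode("\n\n", $buffer) as $frame) {
        if (str_starts_with($frame, 'data: ')) {
            $payloads[] = json_decode(substr($frame, 6), true);
        }
    }
    return $payloads;
}

$wire = sseFrame(['token' => 'Hello'])
      . sseFrame(['token' => ' world'])
      . sseFrame(['sources' => [], 'done' => true]);

$events = parseSseFrames($wire);
$answer = implode('', array_column($events, 'token'));
```

A client does the same thing as parseSseFrames(): append each token as it arrives, then render the sources once it sees the frame with done set to true.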

Using the Built-in SimilaritySearch Agent Tool

For more complex conversational use cases where the AI decides when to search, the Laravel AI SDK’s SimilaritySearch tool is the cleaner approach:

// app/Agents/DocumentQAAgent.php
<?php

namespace App\Agents;

use App\Models\DocumentChunk;
use Illuminate\Support\Facades\AI;
use Laravel\Ai\Tools\SimilaritySearch;

class DocumentQAAgent
{
    public function answer(string $question, array $documentIds = []): string
    {
        $agent = AI::agent()
            ->model('gpt-4o')
            ->system('You are a helpful assistant. Use the search tool to find relevant information before answering. Always cite your sources.')
            ->tools([
                // Built-in SimilaritySearch tool — the agent decides when to use it
                SimilaritySearch::usingModel(DocumentChunk::class, 'embedding')
                    ->returning(['content', 'section_title', 'document_id'])
                    ->limit(5),
            ]);

        $response = $agent->ask($question);

        return $response->text;
    }
}

The agent approach is powerful for multi-turn conversations where the AI decides how many searches to perform, what to search for, and how to combine results — without you hardcoding the retrieval logic.

Chunking Strategy Comparison

The right chunking strategy depends on your document type:

Document Type	Recommended Strategy	Chunk Size	Overlap
Technical docs / manuals	Section-aware (by heading)	400-500 tokens	50 tokens
Legal documents	Paragraph-based	300-400 tokens	100 tokens
FAQ documents	Per Q&A pair	Varies	None
Code documentation	Per function/class	Varies	None
Books / long-form content	Sliding window	500 tokens	100 tokens
Chat transcripts	Per message or turn	Varies	None
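
For the sliding-window row, the mechanics are simple enough to show in a few lines. This sketch counts words rather than model tokens to stay dependency-free, so the sizes are only a rough proxy for the token figures in the table. slidingWindowChunks is a hypothetical helper.

```php
<?php
// Hypothetical sliding-window chunker working on words rather than model tokens.

function slidingWindowChunks(string $text, int $windowWords, int $overlapWords): array
{
    if ($overlapWords >= $windowWords) {
        throw new InvalidArgumentException('Overlap must be smaller than the window.');
    }

    $words  = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $total  = count($words);
    $step   = $windowWords - $overlapWords; // how far the window advances each time
    $chunks = [];

    for ($start = 0; $start < $total; $start += $step) {
        $chunks[] = implode(' ', array_slice($words, $start, $windowWords));

        if ($start + $windowWords >= $total) {
            break; // the window already reached the end of the text
        }
    }

    return $chunks;
}

$windows = slidingWindowChunks('a b c d e f g h', 4, 1);
```

Each window repeats the last word of the previous one, which is the overlap doing its job: a sentence that straddles a boundary is still fully present in at least one chunk when the overlap is large enough.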

Production Considerations

Embedding Caching

Embedding API calls cost money. Cache embeddings for identical inputs:

// The Laravel AI SDK supports embedding caching natively
$embedding = AI::embed($text, cache: true)->embeddings[0]->embedding;
// Identical text → cached result, no API call
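
If you want to see what such a cache amounts to, or need one outside the SDK, the core idea is a lookup keyed by the model name plus a hash of the text. The class below is an in-memory sketch only: EmbeddingCache is a hypothetical name, and the $fake closure stands in for whatever call actually produces an embedding.

```php
<?php
// Hypothetical in-memory cache keyed by model name plus a hash of the normalised text.
// $embedFn stands in for whatever call actually produces the embedding.

final class EmbeddingCache
{
    /** @var array<string, array<int, float>> */
    private array $store = [];

    public int $misses = 0;

    public function __construct(private string $model) {}

    public function key(string $text): string
    {
        // Collapse whitespace so trivially different inputs share one entry.
        $normalised = trim((string) preg_replace('/\s+/u', ' ', $text));

        return $this->model . ':' . hash('sha256', $normalised);
    }

    public function remember(string $text, callable $embedFn): array
    {
        $key = $this->key($text);

        if (!isset($this->store[$key])) {
            $this->misses++;
            $this->store[$key] = $embedFn($text);
        }

        return $this->store[$key];
    }
}

$cache = new EmbeddingCache('text-embedding-3-small');
$fake  = fn (string $text) => [strlen($text) / 100, 0.5]; // stand-in for a real API call

$first  = $cache->remember("refund  policy\n", $fake);
$second = $cache->remember('refund policy', $fake);
```

Including the model name in the key matters: embeddings from different models are not interchangeable, so a cache hit across models would silently corrupt your search results.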

Reprocessing and Updates

When a document is updated, delete the old chunks and reprocess:

public function reprocess(Document $document): void
{
    // Delete existing chunks
    $document->chunks()->delete();

    // Reset status
    $document->update(['status' => 'pending', 'chunk_count' => 0]);

    // Requeue
    ProcessDocument::dispatch($document)->onQueue('documents');
}

Filtering by Document Scope

Not all questions should search all documents. Always allow filtering by document IDs:

// User can ask questions scoped to specific documents
$chunks = $this->retriever->retrieve(
    query:       $question,
    documentIds: [$projectManualId, $technicalSpecId],
);

The Complete Route Setup

// routes/api.php
use App\Http\Controllers\DocumentController;
use App\Http\Controllers\QuestionController;
use Illuminate\Support\Facades\Route;

Route::middleware(['auth:sanctum'])->group(function () {
    // Document management (the index, show, and destroy actions are not shown in this post)
    Route::post('/documents',            [DocumentController::class, 'store']);
    Route::get('/documents',             [DocumentController::class, 'index']);
    Route::get('/documents/{document}',  [DocumentController::class, 'show']);
    Route::delete('/documents/{document}', [DocumentController::class, 'destroy']);

    // Q&A
    Route::post('/ask',      [QuestionController::class, 'ask']);       // standard JSON
    Route::post('/ask/stream', [QuestionController::class, 'stream']); // SSE streaming
});

Why RAG Answers Are Better Than Raw LLM Answers

The difference is grounding. An LLM answering from training data:

May hallucinate specifics
Cannot cite your documents
Knows nothing about events after its training cutoff
Has no access to your internal documentation

A RAG system answering from your documents:

Is instructed to answer only from what’s in your documents
Cites the exact source and section
Works with documents created today
Scales with your knowledge base — add a document, instantly searchable

The failure mode of RAG is also more honest: “I couldn’t find relevant information in the documents” is a far better failure than a confident hallucination.

Final Thoughts

Building this took a weekend because Laravel 13’s AI SDK removed all the infrastructure friction. pgvector in the query builder means no Pinecone account. AI::embed() means no manual OpenAI client. SimilaritySearch as an agent tool means the RAG pattern is a first-class citizen of the framework.

The chunking strategy and the prompt are where you spend most of your tuning time — the infrastructure is table stakes. Start with the section-aware chunker in this post, measure your retrieval quality, and adjust chunk size based on what your documents look like.
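
Measuring retrieval quality does not need tooling to get started. One simple approach is to hand-label a few dozen questions with the chunk ids that should be retrieved, then track how often at least one of them appears in the top k results. The sketch below assumes that setup; hitRateAtK and the sample ids are hypothetical.

```php
<?php
// Hypothetical evaluation helper: hit rate@k over a small hand-labelled set.
// Each case lists the chunk ids retrieved (in rank order) and the ids a human marked relevant.

function hitRateAtK(array $cases, int $k): float
{
    if ($cases === []) {
        return 0.0;
    }

    $hits = 0;

    foreach ($cases as $case) {
        $topK = array_slice($case['retrieved'], 0, $k);

        // A case counts as a hit when any relevant chunk appears in the top k.
        if (array_intersect($topK, $case['relevant']) !== []) {
            $hits++;
        }
    }

    return $hits / count($cases);
}

$cases = [
    ['retrieved' => [12, 7, 3],   'relevant' => [7]],
    ['retrieved' => [5, 9, 1],    'relevant' => [2]],
    ['retrieved' => [4, 8, 15],   'relevant' => [15, 16]],
    ['retrieved' => [20, 21, 22], 'relevant' => [20]],
];

$at1 = hitRateAtK($cases, 1);
$at3 = hitRateAtK($cases, 3);
```

Rerun the same labelled set after every change to chunk size, overlap, or similarity threshold. A gap between the rate at 1 and the rate at 3 suggests the right chunks are being found but ranked too low.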

The system in this post (upload documents, ask questions, get cited answers) is a solid base for a production deployment once you add per-user document scoping and the remaining controller actions. It handles PDFs and text files, stores embeddings efficiently, retrieves with diversity to avoid redundant context, and streams responses for good UX. That’s a functional document intelligence system in one codebase, with the framework you already know.