Technology Apr 17, 2026 · 4 min read

Build a RAG Chatbot Without Pinecone: pgvector + Next.js in Under 100 Lines

DEV Community
by Atlas Whoff

Everyone reaches for Pinecone or Weaviate when they need a vector database. For most production workloads at under a million documents, you already have everything you need: Postgres + pgvector.

Here's the full RAG setup in Next.js with zero new infrastructure.

What pgvector Gives You

pgvector adds vector storage and similarity search to Postgres. After a one-line extension install, you can:

  • Store embeddings as vector columns
  • Run approximate nearest-neighbor search with <-> (L2), <#> (inner product), or <=> (cosine) operators
  • Speed up search at scale with pgvector's IVFFlat and HNSW index types

No new service, no new bill, no data leaving your existing infrastructure.
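To make the cosine operator concrete: `<=>` returns cosine *distance*, i.e. 1 minus cosine similarity, which is why the retrieval query later computes `1 - (embedding <=> query)` to get a similarity score. A small illustrative helper (not part of pgvector, just the same math in TypeScript):

```typescript
// Computes the same value pgvector's <=> operator returns:
// cosine distance = 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Identical vectors get distance 0; orthogonal vectors get distance 1. Ordering by `<=>` ascending therefore ranks most-similar first.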

Setup

Enable pgvector on Supabase (or any Postgres):

create extension if not exists vector;

Create the documents table:

create table documents (
  id uuid primary key default gen_random_uuid(),
  content text not null,
  embedding vector(1536),
  metadata jsonb default '{}',
  created_at timestamptz default now()
);

-- HNSW index for fast approximate search
create index on documents using hnsw (embedding vector_cosine_ops)
  with (m = 16, ef_construction = 64);
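If recall matters more than raw latency, pgvector also lets you widen the HNSW search beam per session (a tuning sketch; the default `hnsw.ef_search` is 40, and higher values trade speed for recall):

```sql
set hnsw.ef_search = 100;
```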

Ingestion Pipeline

// lib/rag/ingest.ts
import { db } from '@/db'
import { documents } from '@/db/schema'

export async function embed(text: string): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ input: text, model: 'voyage-3' }),
  })
  if (!response.ok) {
    throw new Error(`Voyage embeddings request failed: ${response.status}`)
  }
  const data = await response.json()
  return data.data[0].embedding
}

export async function ingestDocument(content: string, metadata = {}) {
  const embedding = await embed(content)
  await db.insert(documents).values({ content, embedding, metadata })
}

export async function ingestBatch(docs: { content: string; metadata?: object }[]) {
  for (const doc of docs) {
    await ingestDocument(doc.content, doc.metadata)
    await new Promise(r => setTimeout(r, 100)) // crude pacing to stay under embedding-API rate limits
  }
}
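The pipeline above embeds each document whole; long documents usually need chunking first so each embedding covers a focused span of text. A minimal sketch (the chunk size, overlap, and `chunkText` name are illustrative, not part of the article's module):

```typescript
// Hypothetical chunker: splits text into overlapping fixed-size windows
// before embedding. Overlap keeps context that straddles a boundary.
export function chunkText(text: string, size = 800, overlap = 100): string[] {
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size))
    if (start + size >= text.length) break // last window reached the end
  }
  return chunks
}
```

Each chunk would then go through `ingestDocument` individually, with shared metadata tying the chunks back to their source document.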

Retrieval

// lib/rag/retrieve.ts
import { sql } from 'drizzle-orm'
import { db } from '@/db'
import { embed } from './ingest'

export async function retrieveContext(
  query: string,
  limit = 5,
  threshold = 0.7
): Promise<string[]> {
  const queryEmbedding = await embed(query)
  const embeddingStr = `[${queryEmbedding.join(',')}]`

  const results = await db.execute(sql`
    select content, 1 - (embedding <=> ${embeddingStr}::vector) as similarity
    from documents
    where 1 - (embedding <=> ${embeddingStr}::vector) > ${threshold}
    order by embedding <=> ${embeddingStr}::vector
    limit ${limit}
  `)

  return results.rows.map((r: any) => r.content)
}

The RAG API Route

// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk'
import { retrieveContext } from '@/lib/rag/retrieve'

const client = new Anthropic()

export async function POST(req: Request) {
  const { messages } = await req.json()
  const lastMessage = messages[messages.length - 1]

  const context = await retrieveContext(lastMessage.content)

  const systemPrompt = context.length > 0
    ? `You are a helpful assistant. Use the following context to answer questions accurately.\n\n<context>\n${context.join('\n\n---\n\n')}\n</context>\n\nIf the context doesn't contain the answer, say so clearly.`
    : 'You are a helpful assistant.'

  const stream = client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: systemPrompt,
    messages,
  })

  const encoder = new TextEncoder()
  return new Response(
    new ReadableStream({
      async start(controller) {
        for await (const chunk of stream) {
          if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
            controller.enqueue(encoder.encode(chunk.delta.text))
          }
        }
        controller.close()
      },
    }),
    { headers: { 'Content-Type': 'text/plain; charset=utf-8' } }
  )
}
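On the client, the plain-text stream can be consumed incrementally with a stream reader. A minimal sketch (the `streamChat` helper and `onToken` callback are assumptions, not part of the kit; it assumes the route above is mounted at `/api/chat`):

```typescript
// Hypothetical client helper: POSTs the conversation and reads the
// plain-text response stream chunk by chunk, invoking onToken per chunk.
export async function streamChat(
  messages: { role: string; content: string }[],
  onToken: (text: string) => void
): Promise<string> {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
  })
  if (!res.ok || !res.body) {
    throw new Error(`chat request failed: ${res.status}`)
  }
  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  let full = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    const text = decoder.decode(value, { stream: true })
    full += text
    onToken(text) // e.g. append to React state for a typing effect
  }
  return full
}
```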

Performance at Scale

pgvector's HNSW index keeps retrieval in the single-digit to low-teens millisecond range up to roughly 1M vectors on standard Postgres hardware.

| Metric                   | pgvector (HNSW)   | Pinecone             |
|--------------------------|-------------------|----------------------|
| Query latency (10K docs) | 2-5ms             | 20-50ms              |
| Query latency (1M docs)  | 5-15ms            | 15-40ms              |
| Setup complexity         | Low (extension)   | Medium (new service) |
| Cost (10K docs/day)      | ~$0 (existing DB) | ~$70/mo              |
| Exact search available   | Yes               | No (ANN only)        |
| SQL joins on results     | Yes               | No                   |
The SQL joins point is underrated. You can filter by metadata, join to users, restrict by permissions — all in one query. Pinecone requires a round-trip to Postgres after retrieval to do any of that.
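A sketch of that one-query pattern (the users table, the metadata keys, and the $1 query-embedding parameter are illustrative, not from the article's schema):

```sql
select d.content, u.email,
       1 - (d.embedding <=> $1::vector) as similarity
from documents d
join users u on u.id = (d.metadata->>'owner_id')::uuid
where d.metadata->>'source' = 'handbook'
  and u.org_id = $2
order by d.embedding <=> $1::vector
limit 5;
```

Permission filtering, metadata filtering, and similarity ranking happen in a single round-trip to the database.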

When to Move to a Dedicated Vector DB

Switch when:

  • You're above 5M vectors and seeing latency >50ms
  • You need multi-modal embeddings (image + text in same index)
  • Your embedding dimensions exceed 2,000 (the limit for pgvector's HNSW and IVFFlat indexes)
  • You need real-time index updates at >1K writes/second

For 95% of SaaS products, you won't hit any of these before you're well-funded enough to care.

What I'm Building This Into

The AI SaaS Starter Kit includes a pre-built RAG module: pgvector schema, Drizzle queries, ingest pipeline, and the streaming chat route above.

If you want the full production-ready stack without starting from scratch:

  • AI SaaS Starter Kit — $99 one-time — includes RAG module, Claude API integration, auth, Stripe billing, and deployment config
  • Workflow Automator MCP — $15/mo — gives Claude access to your document store directly from the IDE

whoffagents.com
