Behind the curtain

Six steps, one cited answer.

A live look at the pipeline you just used — the embeddings, the vector store, the cron that quietly throws everything out after a day. What you see here is what's actually running in production.

Live counters: Active documents · Embedded chunks · Avg / document · Next cleanup

Pipeline

How a question becomes a cited answer

01

Upload

POST /api/upload

Client guards the 5 MB cap; server re-verifies size, MIME, and the %PDF magic bytes. Upstash holds a per-IP fixed window — 3 uploads per day, no exceptions.

≤ 5 MB | ≤ 50 pages | 3 / IP / day

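The server-side checks and the per-IP window described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: `checkUpload`, `allowUpload`, and the error strings are hypothetical names, and an in-memory `Map` stands in for Upstash Redis.

```typescript
const MAX_BYTES = 5 * 1024 * 1024; // the 5 MB cap
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46]; // "%PDF" in ASCII
const DAY_MS = 24 * 60 * 60 * 1000;

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Re-verify size, MIME type, and magic bytes on the server (hypothetical helper).
function checkUpload(bytes: Uint8Array, mimeType: string): UploadCheck {
  if (bytes.length === 0) return { ok: false, reason: "empty file" };
  if (bytes.length > MAX_BYTES) return { ok: false, reason: "file exceeds 5 MB" };
  if (mimeType !== "application/pdf") return { ok: false, reason: "not a PDF MIME type" };
  const hasMagic = PDF_MAGIC.every((b, i) => bytes[i] === b);
  if (!hasMagic) return { ok: false, reason: "missing %PDF header" };
  return { ok: true };
}

// Fixed window: one counter per IP per calendar-day bucket.
// Assumption: a Map replaces the Redis counter used in production.
function allowUpload(
  counts: Map<string, number>,
  ip: string,
  nowMs: number,
  limit = 3,
): boolean {
  const key = `upload:${ip}:${Math.floor(nowMs / DAY_MS)}`;
  const used = counts.get(key) ?? 0;
  if (used >= limit) return false;
  counts.set(key, used + 1);
  return true;
}
```

A fixed window is the simplest limiter to reason about: the counter resets when the day bucket changes, at the cost of allowing a burst across the boundary.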
02

Chunk

pdf-parse · per page

pdf-parse walks the document one page at a time. Each page is normalised and sliced into ~500-token blocks with 50-token overlap, snapping to sentence boundaries when it can.

≈ 500 tok / chunk | 50-tok overlap | soft breaks

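A sliding window with overlap and soft sentence breaks might look like the sketch below. It assumes whitespace-separated words as a stand-in for tokens (the real tokenizer is not described here), and `chunkPage` and `snapWindow` are hypothetical names.

```typescript
// Slice one page into overlapping chunks, preferring to end on a sentence.
// Assumption: one whitespace-separated word is treated as one token.
function chunkPage(
  text: string,
  size = 500,
  overlap = 50,
  snapWindow = 50,
): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.replace(/\s+/g, " ").trim().split(" ").filter((w) => w.length > 0);
  const chunks: string[] = [];
  let start = 0;
  while (start < words.length) {
    let end = Math.min(start + size, words.length);
    if (end < words.length) {
      // Soft break: walk back a few words looking for sentence-ending punctuation,
      // but never so far that the next window would fail to advance.
      for (let i = end; i > end - snapWindow && i > start + overlap + 1; i--) {
        if (/[.!?]$/.test(words[i - 1] ?? "")) {
          end = i;
          break;
        }
      }
    }
    chunks.push(words.slice(start, end).join(" "));
    if (end === words.length) break;
    start = end - overlap; // the next chunk re-reads the last `overlap` words
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval at the cost of some duplicated embeddings.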
03

Embed

Vertex · text-embedding-004

Chunks are packed into batches of about 8k tokens, well under Vertex's 20k-token limit, with token counts pessimistically estimated at chars/2. A 429 triggers up to five retries with exponential backoff, and there is a 250 ms breath between batches.

768-dim vectors | 8k tok / batch | 5× retry

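The packing and retry behaviour can be sketched as below. This is illustrative only: the 8,000-token budget and five retries come from the figures above, while the 500 ms base delay and the names `packBatches`, `backoffDelaysMs`, and `withRetry` are assumptions. The caller supplies `sleep` and the 429 check, so no specific Vertex client API is implied.

```typescript
// Pessimistic token estimate: two characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 2);

// Greedily pack chunks into batches that stay within the token budget.
function packBatches(chunks: string[], budget = 8000): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk);
    if (current.length > 0 && used + cost > budget) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(chunk);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Exponential backoff schedule; baseMs is an assumed value.
function backoffDelaysMs(retries = 5, baseMs = 500): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

// Retry a call only when the caller-supplied predicate says it was rate limited.
async function withRetry<T>(
  call: () => Promise<T>,
  isRateLimited: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void>,
  retries = 5,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= retries) throw err;
      await sleep(baseMs * 2 ** attempt);
    }
  }
}
```

Estimating at chars/2 overcounts for most English text, so batches come out smaller than strictly necessary but should not trip the request limit.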
04

Store

Supabase · pgvector

Vectors land in a dedicated rag schema with RLS enabled (anon role bounced at the door). An ivfflat cosine index handles approximate-nearest-neighbour search.

schema rag | ivfflat · lists=100 | RLS on

05

Ask

POST /api/chat

The question is embedded once, then fed to the match_chunks RPC, scoped to the active document. The top 5 passages return as context, each keeping its page number for citations.

top 5 · cosine | doc-scoped | 20 / IP / day

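To make the retrieval step concrete, here is an in-memory sketch of the ranking that match_chunks is described as doing. In production this is an SQL function over pgvector, and the ivfflat index makes the search approximate. The version below is exact and exists only for illustration; the `Chunk` shape, `matchChunks`, and `buildContext` are hypothetical.

```typescript
interface Chunk {
  documentId: string;
  page: number;
  text: string;
  embedding: number[];
}

interface Match {
  chunk: Chunk;
  score: number; // cosine similarity, higher is closer
}

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);

function cosineSimilarity(a: number[], b: number[]): number {
  const norm = Math.sqrt(dot(a, a) * dot(b, b));
  return norm === 0 ? 0 : dot(a, b) / norm;
}

// Rank only the active document's chunks and keep the k best.
function matchChunks(query: number[], chunks: Chunk[], documentId: string, k = 5): Match[] {
  return chunks
    .filter((c) => c.documentId === documentId)
    .map((c) => ({ chunk: c, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Label each passage so the model can cite it as [Source N] with its page.
function buildContext(matches: Match[]): string {
  return matches
    .map((m, i) => `[Source ${i + 1}] (page ${m.chunk.page})\n${m.chunk.text}`)
    .join("\n\n");
}
```

Scoping by document before ranking keeps one visitor's question from ever retrieving another visitor's passages.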
06

Stream

Vertex · Gemini 2.5 Flash

A system prompt enforces 'cite [Source N] or refuse'. Tokens stream back as Server-Sent Events. The client renders citation footnotes inline as the text arrives.

max 1024 tok | T=0.2 | SSE
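
On the wire, each streamed token can be framed as a Server-Sent Event, and the client can pick out citation markers as the text accumulates. The sketch below assumes the marker format is exactly `[Source N]`. `sseFrame` and `citedSources` are hypothetical helper names, not part of the demo's code.

```typescript
// Frame a piece of text as one SSE message. Per the SSE format, each line of
// the payload gets its own "data:" field and a blank line ends the event.
function sseFrame(token: string): string {
  return token.split("\n").map((line) => `data: ${line}`).join("\n") + "\n\n";
}

// Collect the distinct source numbers cited so far, in ascending order.
function citedSources(text: string): number[] {
  const seen = new Set<number>();
  const pattern = /\[Source (\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    seen.add(Number(m[1]));
  }
  return Array.from(seen).sort((a, b) => a - b);
}
```

Re-running `citedSources` on the accumulated text after each event is cheap at this output size and avoids tracking partial markers split across tokens.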

Auto-cleanup

The 24-hour eviction

Schedule · Retention · cascade delete · Next run in

A Vercel cron hits /api/cleanup daily with a Bearer token. The endpoint calls rag.cleanup_old_documents('24 hours') — a single SQL function that drops anything older than the cutoff and lets the foreign-key cascade do the rest. No filename logs are kept.
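
The endpoint's guard and the retention arithmetic can be sketched as follows. The actual deletion happens inside the SQL function, so this only mirrors the logic around it. `isAuthorized`, `isExpired`, and `msUntilExpiry` are hypothetical names, and the secret is assumed to come from an environment variable.

```typescript
const RETENTION_MS = 24 * 60 * 60 * 1000; // the 24-hour cutoff

// Accept only an exact "Bearer <secret>" header; an empty secret never matches.
function isAuthorized(authHeader: string | null, secret: string): boolean {
  return secret.length > 0 && authHeader === `Bearer ${secret}`;
}

// A document is eligible for deletion once it is older than the cutoff.
function isExpired(createdAt: Date, now: Date): boolean {
  return createdAt.getTime() < now.getTime() - RETENTION_MS;
}

// Milliseconds until a document crosses the cutoff (0 if it already has).
function msUntilExpiry(createdAt: Date, now: Date): number {
  return Math.max(0, createdAt.getTime() + RETENTION_MS - now.getTime());
}
```

Because the cron runs once a day, a document that crosses the 24-hour mark just after a run stays in the table until the next run picks it up.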

Live data

Currently in the demo

Filenames are redacted — fellow visitors uploaded these, and that's their business.

Label | ID | Pages | Chunks | Age | Deletes in

Stack

Specification

Framework: Next.js 14 · App Router · TS
Styling: Tailwind CSS
Generation: Vertex AI · Gemini 2.5 Flash
Embeddings: Vertex AI · text-embedding-004 · 768d
Vector store: Supabase Postgres + pgvector
Rate limit: Upstash Redis · fixed window
PDF viewer: react-pdf · client-side highlights
Local cache: IndexedDB blobs · idb wrapper
Host: Vercel · daily cron
Source: github.com/chitai-dev (private demo)

For hire

Want this on your stack?

One-time build. I scope it with you, build it on your GCP project, hand over the code and the keys. No retainer, no monthly fee, no “tier” you have to stay subscribed to. The infrastructure is yours from day one. I just get it shipped.

What's included

  • Production-grade build of this exact pattern, hardened for your data
  • Deployed to your GCP project — you own billing, models, infrastructure
  • Hybrid retrieval (BM25 + dense + reranker) and a proper eval harness
  • VPC / CMEK / VPC-SC if you need it. Auth (IAP, OIDC) wired through
  • Observability — Cloud Logging, Trace, per-tenant cost dashboards
  • Handover doc + a runbook your ops team can actually use

What I don't do

  • Monthly retainers or seat-based pricing
  • Lock-in via a hosted layer you can't remove
  • Black-box pipelines you'd have to rebuild to extend
  • Fluffy 'AI strategy' decks

Scope a project

Send me the rough idea — document count, target users, deadline — and I'll come back with a fixed scope, fixed price, and a build window.

chitaidev@gmail.com