Architecture · Live demo

Behind the curtain

Six steps, one cited answer.

A live look at the pipeline you just used — the embeddings, the vector store, the cron that quietly throws everything out after a day. What you see here is what's actually running in production.

Active documents

—

Embedded chunks

—

Avg / document

—

Next cleanup

—

Pipeline

How a question becomes a cited answer

Upload

POST /api/upload

Client guards the 5 MB cap; server re-verifies size, MIME, and the %PDF magic bytes. Upstash holds a per-IP fixed window — 3 uploads per day, no exceptions.

≤ 5 MB | ≤ 50 pages | 3 / IP / day
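The server-side checks described above could be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: `checkUpload` and `FixedWindowLimiter` are hypothetical names, 5 MB is assumed to mean 5 × 1024 × 1024 bytes, and the in-memory map only stands in for the Redis counter that Upstash would hold.

```typescript
const MAX_BYTES = 5 * 1024 * 1024; // assumed meaning of the 5 MB cap
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46]; // the ASCII bytes of "%PDF"

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Re-verify size, MIME type, and magic bytes on the server.
function checkUpload(bytes: Uint8Array, mime: string): UploadCheck {
  if (bytes.length > MAX_BYTES) return { ok: false, reason: "file too large" };
  if (mime !== "application/pdf") return { ok: false, reason: "wrong MIME type" };
  const hasMagic = PDF_MAGIC.every((byte, i) => bytes[i] === byte);
  if (!hasMagic) return { ok: false, reason: "missing %PDF header" };
  return { ok: true };
}

// Fixed window: one counter per (ip, window index). The counter resets
// when the clock crosses into the next window. Old keys are never
// evicted here; a real store would expire them.
class FixedWindowLimiter {
  private counts = new Map<string, number>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(ip: string, nowMs: number): boolean {
    const key = `${ip}:${Math.floor(nowMs / this.windowMs)}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= this.limit) return false;
    this.counts.set(key, used + 1);
    return true;
  }
}
```

With `new FixedWindowLimiter(3, 24 * 60 * 60 * 1000)`, the fourth call to `allow` from the same IP inside one window returns `false`, which matches the 3 / IP / day rule.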

Chunk

pdf-parse · per page

pdf-parse walks the document one page at a time. Each page is normalised and sliced into ~500-token blocks with 50-token overlap, snapping to sentence boundaries when it can.

≈ 500 tok / chunk | 50-tok overlap | soft breaks
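One way to picture the soft-break chunking is the sketch below. It makes several assumptions that the page does not confirm: tokens are approximated by whitespace-separated words, sentences are split on `.`, `!` and `?`, and `chunkPage` is a hypothetical name. The real tokenizer and normalisation rules may differ.

```typescript
type Chunk = { page: number; text: string };

function chunkPage(page: number, raw: string, maxTokens = 500, overlap = 50): Chunk[] {
  if (overlap >= maxTokens) throw new Error("overlap must be smaller than maxTokens");
  const text = raw.replace(/\s+/g, " ").trim(); // normalise whitespace
  if (text === "") return [];

  // Naive sentence split that keeps the terminal punctuation.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: Chunk[] = [];
  let current: string[] = []; // tokens of the chunk being built
  let fresh = 0; // tokens in `current` that are not carried-over overlap

  const flush = () => {
    chunks.push({ page, text: current.join(" ") });
    current = current.slice(-overlap); // carry the tail into the next chunk
    fresh = 0;
  };

  for (const sentence of sentences) {
    const words = sentence.trim().split(" ").filter(Boolean);
    // Soft break: close the chunk at a sentence boundary when the
    // next sentence would not fit.
    if (fresh > 0 && current.length + words.length > maxTokens) flush();
    for (const word of words) {
      // Hard break: only reached when one sentence exceeds the budget.
      if (current.length >= maxTokens) flush();
      current.push(word);
      fresh++;
    }
  }
  if (fresh > 0) flush();
  return chunks;
}
```

Because the overlap is copied from the tail of the previous chunk, a passage that straddles a boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.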

Embed

Vertex · text-embedding-004

Chunks pack into batches that stay under Vertex's 20k-token request limit, with token counts pessimistically estimated at chars/2. A 429 triggers a retry with exponential backoff, and there is a 250 ms breath between batches.

768-dim vectors | 8k tok / batch | 5× retry
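The batching and retry behaviour might look something like the sketch below. The names `packBatches` and `withBackoff`, the 500 ms base delay, and the `status` field on the thrown error are all assumptions for illustration; the token budget is left as a parameter since the page mentions both a 20k limit and an 8k batch size. The call to Vertex itself is not shown.

```typescript
// Pessimistic estimate from the text: one token per two characters.
const estimateTokens = (s: string): number => Math.ceil(s.length / 2);

// Greedily pack chunks into batches whose estimated cost stays under budget.
// A single chunk larger than the budget still gets a batch of its own.
function packBatches(chunks: string[], budget: number): string[][] {
  const batches: string[][] = [];
  let batch: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk);
    if (batch.length > 0 && used + cost > budget) {
      batches.push(batch);
      batch = [];
      used = 0;
    }
    batch.push(chunk);
    used += cost;
  }
  if (batch.length > 0) batches.push(batch);
  return batches;
}

// Retry only on HTTP 429, doubling the delay after each failed attempt.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxRetries = 5,
  baseMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = (err as { status?: number } | null)?.status;
      if (status !== 429 || attempt >= maxRetries) throw err;
      await sleep(baseMs * 2 ** attempt);
    }
  }
}
```

Estimating at chars/2 overcounts for typical English text, so batches come out smaller than strictly necessary. That trades a few extra requests for not tripping the request limit.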

Store

Supabase · pgvector

Vectors land in a dedicated rag schema with RLS enabled (anon role bounced at the door). An ivfflat cosine index handles approximate-nearest-neighbour search.

schema rag | ivfflat · lists=100 | RLS on
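For intuition about what an ivfflat cosine index does, here is a toy in-memory version. It is not pgvector's implementation: the centroids are supplied by hand (pgvector derives them with k-means when the index is built), and `IvfIndex` and `cosineDistance` are illustrative names. The idea it shows is that vectors are bucketed into lists, and a query scans only the nearest `probes` lists instead of every row.

```typescript
type Vec = number[];

// Cosine distance = 1 - cosine similarity (smaller means more similar).
function cosineDistance(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class IvfIndex {
  private centroids: Vec[];
  private lists: { id: string; vec: Vec }[][];

  constructor(centroids: Vec[]) {
    this.centroids = centroids;
    this.lists = centroids.map(() => []);
  }

  // Centroid indices ordered from nearest to farthest.
  private rankCentroids(v: Vec): number[] {
    return this.centroids
      .map((c, i) => ({ i, d: cosineDistance(v, c) }))
      .sort((x, y) => x.d - y.d)
      .map((x) => x.i);
  }

  // Each vector is stored in the list of its nearest centroid.
  add(id: string, vec: Vec): void {
    this.lists[this.rankCentroids(vec)[0]].push({ id, vec });
  }

  // Approximate search: scan only the `probes` nearest lists.
  search(query: Vec, k: number, probes = 1): string[] {
    const candidates = this.rankCentroids(query)
      .slice(0, probes)
      .flatMap((i) => this.lists[i]);
    return candidates
      .map((c) => ({ id: c.id, d: cosineDistance(query, c.vec) }))
      .sort((x, y) => x.d - y.d)
      .slice(0, k)
      .map((c) => c.id);
  }
}
```

The search is approximate because a true neighbour sitting in an unprobed list is never examined. Raising `probes` improves recall at the cost of scanning more rows.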

Ask

POST /api/chat

The question is embedded once, then passed to the match_chunks RPC, which is scoped to the active document. The top 5 passages come back as context, each keeping its page number for citations.

top 5 · cosine | doc-scoped | 20 / IP / day
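Once the passages are back, they have to be turned into prompt context that the model can cite. The sketch below shows one plausible way to do that. The `Match` row shape, the `buildContext` name, and the exact label layout are assumptions; the page does not show what match_chunks actually returns.

```typescript
// Assumed shape of one row returned by the match_chunks RPC.
type Match = { content: string; page: number; similarity: number };

type SourceRef = { label: string; page: number };

// Keep the topK most similar passages, number them, and remember
// which page each numbered source came from.
function buildContext(
  matches: Match[],
  topK = 5,
): { context: string; sources: SourceRef[] } {
  const top = [...matches]
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
  const sources = top.map((m, i) => ({ label: `Source ${i + 1}`, page: m.page }));
  const context = top
    .map((m, i) => `[Source ${i + 1}] (page ${m.page})\n${m.content}`)
    .join("\n\n");
  return { context, sources };
}
```

Keeping the `sources` array alongside the prompt text lets the client map a `[Source N]` marker in the answer back to a page without parsing the prompt again.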

Stream

Vertex · Gemini 2.5 Flash

A system prompt enforces 'cite [Source N] or refuse'. Tokens stream back as Server-Sent Events. The client renders citation footnotes inline as the text arrives.

max 1024 tok | T=0.2 | SSE
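The streaming leg can be illustrated with three small helpers: one that frames a token as a Server-Sent Event, one that splits a receive buffer back into events, and one that pulls citation numbers out of the text so far. All three names are hypothetical, and the sketch handles only `data:` fields, not `event:`, `id:` or `retry:`.

```typescript
// Frame a payload as one SSE event. Multi-line payloads get one
// "data:" line per line, and a blank line terminates the event.
function sseEvent(data: string): string {
  return data.split("\n").map((line) => `data: ${line}`).join("\n") + "\n\n";
}

// Split a buffer into complete events plus an unfinished remainder
// that should be kept until more bytes arrive.
function parseSse(buffer: string): { events: string[]; rest: string } {
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? "";
  const events = parts.map((part) =>
    part
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop one leading space
      .join("\n"),
  );
  return { events, rest };
}

// Collect distinct [Source N] numbers in order of first appearance.
function citedSources(text: string): number[] {
  const seen = new Set<number>();
  const pattern = /\[Source (\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    seen.add(Number(match[1]));
  }
  return Array.from(seen);
}
```

Running `citedSources` over the accumulated text after each event is what would let footnotes appear as soon as a marker is complete, rather than after the stream ends.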

Auto-cleanup

The 24-hour eviction

Schedule

—

Retention

—

cascade delete

Next run in

—

A Vercel cron hits /api/cleanup daily with a Bearer token. The endpoint calls rag.cleanup_old_documents('24 hours') — a single SQL function that drops anything older than the cutoff and lets the foreign-key cascade do the rest. No filename logs are kept.
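The two decisions the cleanup endpoint has to make, whether the caller is the cron and which documents are past the cutoff, can be sketched as below. In the demo the second decision lives inside the SQL function, so `expiredIds` is only an in-memory stand-in for it. Both function names and the `createdAt` field are assumptions.

```typescript
const RETENTION_MS = 24 * 60 * 60 * 1000; // the 24-hour retention window

// Accept only an exact "Bearer <secret>" header. An empty secret is
// rejected so a missing env var cannot authorize everyone.
function isAuthorizedCron(authHeader: string | null, secret: string): boolean {
  return secret.length > 0 && authHeader === `Bearer ${secret}`;
}

type DocRow = { id: string; createdAt: Date };

// IDs of documents created before (now - retention).
function expiredIds(docs: DocRow[], now: Date, retentionMs = RETENTION_MS): string[] {
  const cutoff = now.getTime() - retentionMs;
  return docs.filter((d) => d.createdAt.getTime() < cutoff).map((d) => d.id);
}
```

Since the cron runs once a day, a document uploaded just after a run survives until the run after next, so its real lifetime can approach 48 hours. A constant-time comparison would be a reasonable hardening step for the token check.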

Live data

Currently in the demo

Filenames are redacted — fellow visitors uploaded these, and that's their business.

Label	ID	Pages	Chunks	Age	Deletes in

Stack

Specification

Framework: Next.js 14 · App Router · TS

Styling: Tailwind CSS

Generation: Vertex AI · Gemini 2.5 Flash

Embeddings: Vertex AI · text-embedding-004 · 768d

Vector store: Supabase Postgres + pgvector

Rate limit: Upstash Redis · fixed window

PDF viewer: react-pdf · client-side highlights

Local cache: IndexedDB blobs · idb wrapper

Host: Vercel · daily cron

Source: github.com/chitai-dev (private demo)

For hire

Want this on your stack?

One-time build. I scope it with you, build it on your GCP project, hand over the code and the keys. No retainer, no monthly fee, no “tier” you have to stay subscribed to. The infrastructure is yours from day one; I just get it shipped.

What's included

Production-grade build of this exact pattern, hardened for your data
Deployed to your GCP project — you own billing, models, infrastructure
Hybrid retrieval (BM25 + dense + reranker) and a proper eval harness
VPC / CMEK / VPC-SC if you need it. Auth (IAP, OIDC) wired through
Observability — Cloud Logging, Trace, per-tenant cost dashboards
Handover doc + a runbook your ops team can actually use

What I don't do

Monthly retainers or seat-based pricing
Lock-in via a hosted layer you can't remove
Black-box pipelines you'd have to rebuild to extend
Fluffy 'AI strategy' decks

Scope a project

Send me the rough idea — document count, target users, deadline — and I'll come back with a fixed scope, fixed price, and a build window.

chitaidev@gmail.com