Two years of vector search at Notion: 10x scale with 90% cost reduction
Notion AI Q&A launched in November 2023 to overwhelming demand, creating a waitlist of millions of workspaces. The original pod-based vector infrastructure neared storage capacity within one month of launch, and daily onboarding was so slow that clearing the backlog at the initial rate would have taken decades.
How it works
Common implementation structure
How this type of workflow is generally built, drawing on documented cases rather than any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Dual-path ingestion pipeline
The indexing pipeline runs two paths: offline Spark batch jobs that chunk documents and bulk-load vectors, and online Kafka consumers that process individual page edits in real time.
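The two paths can share the same chunk-and-embed logic and differ only in how work arrives. The following is a minimal sketch in plain Python, not Notion's implementation: Spark and Kafka are replaced by an ordinary iterable and a single function call, and every name here (`chunk_page`, `toy_embed`, `batch_backfill`, `handle_page_edit`), the 200-character chunk size, and the in-memory dictionary used as a vector store are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Chunk:
    page_id: str
    index: int
    text: str


# Hypothetical stand-ins: a real system would call an embedding model
# and write to a vector database instead of a dict.
Embedder = Callable[[str], List[float]]
VectorStore = Dict[Tuple[str, int], List[float]]


def chunk_page(page_id: str, text: str, max_chars: int = 200) -> List[Chunk]:
    """Split a page into fixed-size character chunks (assumed chunking rule)."""
    pieces = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    return [Chunk(page_id, i, piece) for i, piece in enumerate(pieces)]


def toy_embed(text: str) -> List[float]:
    """Deterministic placeholder for an embedding model."""
    return [float(len(text)), float(sum(map(ord, text)) % 997)]


def batch_backfill(pages: Iterable[Tuple[str, str]],
                   embed: Embedder, store: VectorStore) -> int:
    """Offline path: chunk and bulk-load many pages in one pass."""
    written = 0
    for page_id, text in pages:
        for chunk in chunk_page(page_id, text):
            store[(chunk.page_id, chunk.index)] = embed(chunk.text)
            written += 1
    return written


def handle_page_edit(page_id: str, new_text: str,
                     embed: Embedder, store: VectorStore) -> int:
    """Online path: re-index one page in response to a single edit event."""
    # Drop the page's old chunks so shortened pages leave no stale vectors.
    for key in [k for k in store if k[0] == page_id]:
        del store[key]
    chunks = chunk_page(page_id, new_text)
    for chunk in chunks:
        store[(chunk.page_id, chunk.index)] = embed(chunk.text)
    return len(chunks)
```

In this sketch the offline path would be driven by a batch job over all pages in a workspace, while the online path would be called once per edit event consumed from a stream.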
Over two years, Notion scaled its vector search infrastructure by 10x while reducing costs by 90%, achieving a 600x increase in daily onboarding capacity and clearing the Q&A waitlist by April 2024. Median (p50) query latency improved from 70–100ms to 50–70ms, and hash-based selective re-embedding achieved a 70% reduction in data volume.
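Hash-based selective re-embedding can be pictured as follows: keep a content hash per chunk, and on each update embed only the chunks whose hash has changed. The sketch below is one possible way to do this under stated assumptions, not a description of Notion's code. The function names, the use of SHA-256, and the `(page_id, chunk_index)` key scheme are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Maps (page_id, chunk_index) to the hash of the last embedded text.
HashIndex = Dict[Tuple[str, int], str]


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_needing_embedding(page_id: str, chunks: List[str],
                             known: HashIndex) -> List[int]:
    """Return indices of chunks whose content changed, updating `known` in place."""
    changed: List[int] = []
    for index, text in enumerate(chunks):
        digest = content_hash(text)
        if known.get((page_id, index)) != digest:
            known[(page_id, index)] = digest
            changed.append(index)
    # Forget hashes for chunks that no longer exist on the page.
    stale = [k for k in known if k[0] == page_id and k[1] >= len(chunks)]
    for key in stale:
        del known[key]
    return changed
```

With this approach, an edit that touches one paragraph of a long page sends only the affected chunk to the embedding model. One limitation of keying by position is that inserting text near the top of a page shifts every later chunk index, so a production design might key by block identifier instead.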
What failed first
The original dedicated-hardware pod architecture coupled storage and compute, making over-provisioning prohibitively expensive and requiring complex incremental re-sharding every two weeks. Managing multiple database generations became operationally complex and expensive during the growth phase.