Two years of vector search at Notion: 10x scale with 90% cost reduction

Notion AI Q&A launched in November 2023 to overwhelming demand, creating a waitlist of millions of workspaces. The original pod-based vector infrastructure neared storage capacity within one month of launch, and daily onboarding was so slow that clearing the backlog at the initial rate would have taken decades.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Dual-path ingestion pipeline
The indexing pipeline runs two paths: offline Spark batch jobs that chunk documents and bulk-load vectors, and online Kafka consumers that process individual page edits in real time.
Tools used
Apache Spark, Kafka, Airflow, turbopuffer, DynamoDB, Ray, Anyscale, AWS EMR, xxHash, Ray Serve
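The two ingestion paths in Stage 1 can be illustrated with a minimal, in-memory sketch. It is not Notion's implementation: `Page`, `chunk_text`, `fake_embed`, `VectorIndex`, `batch_backfill`, and `handle_edit_event` are hypothetical names, the word-count chunker and the toy embedding are assumptions, and a plain dictionary stands in for the vector store. Spark and Kafka are deliberately left out so that the shared write path is easy to see.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Page:
    page_id: str
    text: str


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words (assumed strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def fake_embed(chunk: str) -> List[float]:
    """Hypothetical stand-in for a real embedding model call."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 997)]


class VectorIndex:
    """In-memory stand-in for a vector store, keyed by (page_id, chunk_no)."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, int], List[float]] = {}

    def replace_page(self, page: Page) -> int:
        # Drop stale chunks for this page, then write the fresh ones.
        for key in [k for k in self.rows if k[0] == page.page_id]:
            del self.rows[key]
        chunks = chunk_text(page.text)
        for chunk_no, chunk in enumerate(chunks):
            self.rows[(page.page_id, chunk_no)] = fake_embed(chunk)
        return len(chunks)


def batch_backfill(index: VectorIndex, pages: Iterable[Page]) -> int:
    """Offline path: bulk-load every page of a workspace in one pass."""
    return sum(index.replace_page(page) for page in pages)


def handle_edit_event(index: VectorIndex, event: dict) -> int:
    """Online path: re-index a single page when an edit event arrives."""
    return index.replace_page(Page(event["page_id"], event["text"]))
```

Both paths funnel into the same `replace_page` call, so a page indexed by the batch job and later edited online ends up with one consistent set of chunks. In a production system the batch function would run as a distributed job and the event handler would sit behind a stream consumer.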
Outcome

Over two years, Notion scaled its vector search infrastructure 10x while reducing costs by 90%. Daily onboarding capacity increased 600x, and the Q&A waitlist was cleared by April 2024. p50 query latency improved from 70–100ms to 50–70ms, and hash-based selective re-embedding reduced data volume by 70%.
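Hash-based selective re-embedding can be sketched as follows: store a hash per chunk, and on each edit re-embed only the chunks whose hash changed. This is an illustration under stated assumptions, not the method from the source. `chunk_hash` and `plan_reembedding` are hypothetical names, chunks are compared by position, and `hashlib.blake2b` from the standard library is swapped in for xxHash (which appears in the tools list) to keep the example dependency-free.

```python
import hashlib
from typing import Dict, List, Tuple


def chunk_hash(text: str) -> str:
    """Short content hash for a chunk (blake2b here; an assumption, not xxHash)."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).hexdigest()


def plan_reembedding(
    stored_hashes: Dict[int, str],
    new_chunks: List[str],
) -> Tuple[List[int], List[int], Dict[int, str]]:
    """Compare stored chunk hashes with the hashes of the page's new chunks.

    Returns (positions to re-embed, positions to delete, hashes to store next).
    """
    new_hashes = {i: chunk_hash(chunk) for i, chunk in enumerate(new_chunks)}
    # Re-embed chunks that are new or whose content hash differs.
    to_embed = [i for i, h in new_hashes.items() if stored_hashes.get(i) != h]
    # Delete vectors for chunk positions that no longer exist.
    to_delete = [i for i in stored_hashes if i not in new_hashes]
    return to_embed, to_delete, new_hashes
```

For a page whose chunks go from `["intro", "body", "footer"]` to `["intro", "body edited"]`, the plan re-embeds only position 1 and deletes position 2; the unchanged first chunk is skipped. Position-based comparison is the simplest choice, but an insertion near the top of a page shifts every later chunk, so a real system might key hashes by content or by block identifier instead.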

What failed first

The original dedicated-hardware pod architecture coupled storage and compute, making over-provisioning prohibitively expensive and requiring complex incremental re-sharding every two weeks. Managing multiple database generations became operationally complex and expensive during the growth phase.

Results
Time saved: p50 query latency improved from 70-100ms to 50-70ms
Volume: 10x
Cost replaced: 90 percent
Running since: November 2023
Source

https://www.notion.com/blog/two-years-of-vector-search-at-notion

Grounding & classification
Source type: technical build writeup
40 fields verified against source quotes.
Tags: enterprise search, knowledge search, rag, knowledge base, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cost reduction, response time reduction, throughput increase, technical build writeup, back office ops, rag answering