
How Dropbox built the feature store powering real-time ML ranking in Dash

Dropbox Dash needed a feature store for real-time ML ranking over a very large number of work documents, but its infrastructure, split across on-premises and Spark-native cloud environments, ruled out off-the-shelf solutions. The system had to handle massive parallel feature lookups while meeting a strict sub-100ms latency budget and near-real-time freshness requirements.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · User query triggers ranking
A single user query triggers the ranker to evaluate many files, each requiring dozens of behavioral and contextual features.
Tools used
Feast, Dynovault, Go, Spark, PySpark, AWS DynamoDB
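The fan-out in this stage (one query, many files, dozens of features each) can be sketched as concurrent lookups that share a single latency budget. This is a minimal illustration, written in Python for brevity even though Dropbox's production serving layer was rewritten in Go. FAKE_STORE, SLOW_DOCS, fetch_features, fetch_all and the 100 ms default budget are hypothetical names and values, not Feast, Dynovault, or Dropbox APIs.

```python
import asyncio
from typing import Dict, List

# Hypothetical in-memory stand-in for an online feature store keyed by
# document ID. This is NOT the Feast or Dynovault API.
FAKE_STORE: Dict[str, Dict[str, float]] = {
    "doc-1": {"click_rate": 0.12, "days_since_edit": 3.0},
    "doc-2": {"click_rate": 0.40, "days_since_edit": 1.0},
    "doc-slow": {"click_rate": 0.07, "days_since_edit": 9.0},
}
# Hypothetical: lookups for these docs are artificially slow to show the budget.
SLOW_DOCS = {"doc-slow"}


async def fetch_features(doc_id: str) -> Dict[str, float]:
    """Simulate one network round trip; unknown docs yield no features."""
    await asyncio.sleep(0.5 if doc_id in SLOW_DOCS else 0.001)
    return FAKE_STORE.get(doc_id, {})


async def fetch_all(doc_ids: List[str], budget_s: float = 0.1) -> Dict[str, Dict[str, float]]:
    """Fetch features for all candidate docs concurrently under one shared budget.

    Lookups still running when the budget expires are cancelled, and those
    docs get an empty dict so the ranker can fall back to default values.
    """
    if not doc_ids:
        return {}  # asyncio.wait rejects an empty set of tasks
    tasks = {doc_id: asyncio.create_task(fetch_features(doc_id)) for doc_id in doc_ids}
    done, pending = await asyncio.wait(tasks.values(), timeout=budget_s)
    for task in pending:
        task.cancel()
    # Let the cancellations complete so no lookup outlives the request.
    await asyncio.gather(*pending, return_exceptions=True)
    return {
        doc_id: task.result() if task in done and task.exception() is None else {}
        for doc_id, task in tasks.items()
    }


if __name__ == "__main__":
    features = asyncio.run(fetch_all(["doc-1", "doc-2", "doc-slow"]))
    print(features)  # doc-slow maps to {} because it missed the 100 ms budget
```

Bounding the whole request with one deadline, rather than a timeout per lookup, keeps total latency predictable no matter how many files the ranker scores. Falling back to default features for late lookups is an assumption of this sketch; the source does not say how missing features are handled.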
Outcome

The rewritten Go serving layer handles thousands of requests per second with p95 latencies in the ~25–35ms range; intelligent change detection cut batch update times from more than an hour to under five minutes and reduced write volumes from hundreds of millions to under one million records per run.
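The source does not describe how its change detection works. One common approach, shown here as an assumption rather than Dropbox's confirmed method, is to hash each entity's feature row and write only the rows whose hash differs from the previous run's. row_digest and diff_batch are hypothetical helpers for illustration.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Row = Dict[str, float]


def row_digest(row: Row) -> str:
    """Stable content hash of one feature row; key order does not matter."""
    payload = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_batch(
    previous: Dict[str, str], current: Dict[str, Row]
) -> Tuple[Dict[str, Row], List[str], Dict[str, str]]:
    """Compare this run's rows with the digests saved by the previous run.

    Returns (upserts, deletes, new_digests): only new or changed rows are
    written, entities absent from this run are deleted, and new_digests is
    stored for the next run to compare against.
    """
    new_digests = {key: row_digest(row) for key, row in current.items()}
    upserts = {
        key: current[key]
        for key, digest in new_digests.items()
        if previous.get(key) != digest
    }
    deletes = sorted(key for key in previous if key not in current)
    return upserts, deletes, new_digests


if __name__ == "__main__":
    prev = {k: row_digest(r) for k, r in {
        "doc-1": {"click_rate": 0.12},
        "doc-2": {"click_rate": 0.40},
        "doc-3": {"click_rate": 0.05},
    }.items()}
    curr = {"doc-1": {"click_rate": 0.12}, "doc-2": {"click_rate": 0.45}, "doc-4": {"click_rate": 0.01}}
    upserts, deletes, _ = diff_batch(prev, curr)
    print(sorted(upserts), deletes)  # ['doc-2', 'doc-4'] ['doc-3']
```

Because only the digest map has to be carried between runs, the comparison can sit inside the batch job (for example as a join in Spark), and the write volume then scales with how much actually changed rather than with the total number of rows.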

What failed first

The initial Python-based feature serving layer, built on the Feast SDK, hit CPU-bound JSON parsing bottlenecks and Python's Global Interpreter Lock under high concurrency; switching to multiple processes temporarily improved latency but introduced coordination overhead that capped scalability.

Results
Latency: sub-100ms
Volume: from hundreds of millions to under one million records per run
Source

https://dropbox.tech/machine-learning/feature-store-powering-realtime-ai-in-dropbox-dash


Grounding & classification
Source type: technical build writeup
29 fields verified against source quotes.
enterprise search, predictive analytics, rag, knowledge base, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cycle time reduction, response time reduction, throughput increase, technical build writeup, back office ops, data sync enrichment