ecommerce_ops · workflow

DoorDash trains Twin Neural Network catalog embeddings for search and recommendations

DoorDash's catalog is extremely large and constantly growing, making it impossible to manually label or analyze at scale. Multiple teams needed a common, generalizable way to represent catalog items for ML use cases — recommendations, search, and promotions — but existing embedding approaches had significant drawbacks for large, sparse, continuously evolving catalogs.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.

Stage 1 · Search session data ingestion

User search queries and purchase behavior from DoorDash sessions form the raw training signal for the embedding model.
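A minimal sketch of how such a training signal might be assembled, assuming each session record carries a query string and a list of purchased item names. The field names (`session_id`, `query`, `purchased_items`) and the sample records are hypothetical and are not taken from DoorDash's schema.

```python
from collections import defaultdict

# Hypothetical session records; the field names are assumptions for illustration.
sessions = [
    {"session_id": "s1", "query": "Pad Thai ", "purchased_items": ["Chicken Pad Thai", "Thai Iced Tea"]},
    {"session_id": "s2", "query": "pad thai", "purchased_items": ["Chicken Pad Thai"]},
    {"session_id": "s3", "query": "oat milk", "purchased_items": []},
]

def build_positive_pairs(sessions):
    """Count how often each (query, item) pair ended in a purchase."""
    counts = defaultdict(int)
    for session in sessions:
        # Normalize the query so trivially different spellings share a key.
        query = session["query"].strip().lower()
        for item in session["purchased_items"]:
            counts[(query, item)] += 1
    return dict(counts)

pairs = build_positive_pairs(sessions)
```

Sessions with a search but no purchase contribute no positive pairs here; whether and how to use them as negatives is a separate design choice that this sketch leaves open.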

Tools used

Siamese Neural Network, bidirectional LSTM, FastText, UMAP, Word2vec

Outcome

The Siamese Neural Network improved F1-score by ~23% over the FastText baseline, outperforming supervised LSTM classifiers (+15%). A FastText classifier needed more than three times the labeled data to reach comparable accuracy, which shows that the embeddings are substantially more sample-efficient. The embeddings are now deployed across recommendations and programmatic merchandising, with immediate, substantial improvements reported.
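For readers who want the arithmetic behind an F1 comparison, the sketch below computes F1 from precision and recall and then a relative change against a baseline. It assumes the reported gain is a relative change, which the summary does not state explicitly, and every precision and recall value in it is made up for illustration rather than taken from the source.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def relative_improvement(new, baseline):
    """Fractional change of `new` relative to `baseline`."""
    return (new - baseline) / baseline

# Illustrative numbers only; these are NOT DoorDash's measurements.
baseline_f1 = f1_score(0.60, 0.60)
candidate_f1 = f1_score(0.75, 0.72)
gain = relative_improvement(candidate_f1, baseline_f1)

print(f"baseline F1={baseline_f1:.3f}, candidate F1={candidate_f1:.3f}, gain={gain:.1%}")
```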

What failed first

Word2vec embeddings required computationally expensive daily retraining as millions of items were added, and they suffered from sparsity for items with few interactions. Supervised deep neural network classifiers did not guarantee good metric properties and depended heavily on annotation quality for rare classes. Fine-tuned BERT models were too slow at inference because of their size, even in distilled variants.
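The structural idea of a twin (Siamese) network is that one shared encoder maps both inputs into the same vector space, so anything with text, including an item never seen before, can be embedded without retraining a per-item lookup table. The sketch below illustrates only that shared-encoder structure. It is an assumption-laden stand-in: a hashed character-trigram bag replaces the learned bidirectional LSTM, nothing is trained, and `encode`, `similarity`, and `DIM` are hypothetical names.

```python
import math
import zlib

DIM = 1024  # arbitrary embedding width chosen for this sketch

def encode(text, dim=DIM):
    """Stand-in encoder: hashed character trigrams, L2-normalized.

    A real twin network would use a trained encoder here instead.
    """
    vec = [0.0] * dim
    padded = f"  {text.strip().lower()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        vec[zlib.crc32(trigram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def similarity(query, item_name):
    """Twin structure: the same encode() is applied to both inputs."""
    q_vec, item_vec = encode(query), encode(item_name)
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(a * b for a, b in zip(q_vec, item_vec))

# An item name that was never part of any training set can still be scored.
score_related = similarity("pad thai", "Chicken Pad Thai")
score_unrelated = similarity("pad thai", "Oat Milk")
```

In a trained twin network the encoder weights would be learned so that queries land close to the items purchased after them; here closeness reflects only surface text overlap.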

Results

Volume: ~23%

Source

https://careersatdoordash.com/blog/using-twin-neural-networks-to-train-catalog-item-embeddings/


Grounding & classification

Source type: technical build writeup

25 fields verified against source quotes, 1 dropped as unverifiable.

enterprise search, personalization, recommendation system, product catalog, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, accuracy improvement, employee productivity, technical build writeup, ecommerce ops, marketing ops, data sync enrichment