DoorDash trains Twin Neural Network catalog embeddings for search and recommendations
DoorDash's catalog is extremely large and grows constantly, so labeling or analyzing it manually at scale is impossible. Multiple teams needed a common, generalizable representation of catalog items for ML use cases such as recommendations, search, and promotions, but existing embedding approaches had significant drawbacks for large, sparse, continuously evolving catalogs.
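To make the shared-representation use case concrete, here is a minimal sketch of the most common way item embeddings serve search and recommendations: nearest-neighbor lookup by cosine similarity. The item names and vectors are hypothetical toy values, not DoorDash data, and a production system would typically use an approximate nearest-neighbor index instead of this exhaustive scan.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy embeddings; a real system would load vectors from the trained model.
ITEM_EMBEDDINGS: Dict[str, List[float]] = {
    "pepperoni pizza": [0.9, 0.1, 0.0],
    "margherita pizza": [0.85, 0.15, 0.05],
    "caesar salad": [0.1, 0.9, 0.2],
    "garden salad": [0.05, 0.95, 0.1],
}

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest(query: str, k: int = 2) -> List[Tuple[str, float]]:
    """Return the k most similar other items, scanning every item."""
    q = ITEM_EMBEDDINGS[query]
    scored = [(name, cosine(q, vec)) for name, vec in ITEM_EMBEDDINGS.items() if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

print(nearest("pepperoni pizza", k=1))
```

Because every team queries the same vectors, search ("items like this query") and recommendations ("items like this one") can share one embedding table.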
The Siamese (twin) neural network improved F1-score by ~23% over the FastText baseline, outperforming supervised LSTM classifiers, which improved on the same baseline by ~15%. The embeddings were also more sample-efficient: a FastText classifier needed more than three times as much labeled data to reach comparable accuracy. The embeddings are now deployed in recommendations and programmatic merchandising, where they produced immediate, substantial improvements.
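A Siamese (twin) network passes both items of a pair through one encoder with shared weights and trains on the distance between the two outputs. The sketch below shows that structure with a toy bag-of-words encoder and the classic margin-based contrastive loss. The encoder, token vectors, margin, and choice of loss are all assumptions for illustration, since the summary above does not specify DoorDash's architecture or training objective. Only the forward pass and loss are shown; training would backpropagate this loss into the single shared table.

```python
import math
from typing import Dict, List

# Hypothetical shared parameters: one vector per token. Both branches
# ("twins") read from this single table, so their weights are tied.
TOKEN_VECTORS: Dict[str, List[float]] = {
    "chicken": [1.0, 0.0],
    "wings": [0.8, 0.2],
    "spicy": [0.6, 0.4],
    "chocolate": [0.0, 1.0],
    "cake": [0.1, 0.9],
}
DIM = 2

def encode(text: str) -> List[float]:
    """Shared encoder (assumed): mean of known token vectors, then L2-normalized."""
    vecs = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vecs:
        return [0.0] * DIM
    mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean

def contrastive_loss(a: str, b: str, similar: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss on the Euclidean distance between twin outputs."""
    d = math.dist(encode(a), encode(b))  # both inputs go through the same encoder
    if similar:
        return d ** 2                    # pull matching pairs together
    return max(0.0, margin - d) ** 2     # push non-matching pairs past the margin

print(contrastive_loss("spicy chicken wings", "chicken wings", similar=True))
print(contrastive_loss("chicken wings", "chocolate cake", similar=False))  # 0.0
```

Tying the weights means the distance between two items depends only on the items themselves, which is what gives the learned space usable metric properties.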
Word2vec embeddings required computationally expensive daily retraining as millions of items were added, and they suffered from sparsity for items with few interactions. Supervised deep neural network classifiers did not guarantee good metric properties in their embedding space and depended heavily on annotation quality for rare classes. Fine-tuned BERT models were too slow at inference time because of their size, even in distilled variants.
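The retraining cost of word2vec follows from its lookup-table design: a vector exists only for item IDs seen during training. The minimal sketch below contrasts that with a content-based encoder, assuming (hypothetically) that items are embedded from their names; the summary above does not state what input DoorDash's twin network reads, and all names and values here are illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical ID-keyed table, as produced by training word2vec over
# sequences of item IDs (for example, items that co-occur in orders).
ITEM_ID_VECTORS: Dict[str, List[float]] = {
    "item_101": [0.9, 0.1],
    "item_102": [0.2, 0.8],
}

# Hypothetical token-level vectors used by a content-based encoder.
TOKEN_VECTORS: Dict[str, List[float]] = {
    "chicken": [1.0, 0.0],
    "wings": [0.8, 0.2],
    "salad": [0.1, 0.9],
}

def embed_by_id(item_id: str) -> Optional[List[float]]:
    # An item added after the last training run has no row (cold start).
    return ITEM_ID_VECTORS.get(item_id)

def embed_by_name(name: str) -> Optional[List[float]]:
    # A new item whose name contains known tokens can be embedded immediately.
    vecs = [TOKEN_VECTORS[t] for t in name.lower().split() if t in TOKEN_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

new_item_id, new_item_name = "item_999", "Chicken Salad"
print(embed_by_id(new_item_id))      # None
print(embed_by_name(new_item_name))  # [0.55, 0.45]
```

The ID table must be retrained before `item_999` gets a vector, while the content-based encoder handles it at inference time; this is the same reason rarely ordered items stay poorly represented in interaction-only embeddings.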
https://careersatdoordash.com/blog/using-twin-neural-networks-to-train-catalog-item-embeddings/