Instacart builds semantic IDs to power cross-category product understanding and recommendations at scale
Instacart's hierarchical product taxonomy missed cross-category connections that customers naturally expect. As a result, new products were invisible at cold start, tail categories were underserved by recommendation models, and mislabeled products were impossible to detect at catalog scale.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Product catalog ingestion
Millions of products across thousands of categories enter the system, each assigned to a hierarchical taxonomy category.
Tools used
RQ-VAE, ESCI, Gemini Flash, Gemma, LLMs
Outcome
Semantic IDs delivered a 34% increase in add-to-carts on product carousels and surfaced products from 2.7x more emerging brands, with tail categories seeing the largest gains. They also became core infrastructure for product retrieval, replacement recommendations, and next-item prediction across Instacart.
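Because a semantic ID is a sequence of codes from coarse to fine, products that share a prefix can be treated as neighbors, which is one plausible way to feed retrieval and replacement recommendations. The sketch below is a minimal illustration under that assumption, not Instacart's method: the catalog entries, their IDs, and the names `build_prefix_index` and `replacement_candidates` are all hypothetical. It indexes every prefix, then looks for peers at the deepest shared prefix and backs off toward the root.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SemanticId = Tuple[int, ...]

# Hypothetical catalog: product keys and semantic IDs are invented for illustration.
CATALOG: Dict[str, SemanticId] = {
    "oat_milk_a": (3, 7, 1),
    "oat_milk_b": (3, 7, 2),
    "almond_milk": (3, 4, 0),
    "granola": (5, 1, 1),
}


def build_prefix_index(ids: Dict[str, SemanticId]) -> Dict[SemanticId, List[str]]:
    """Map every prefix of every semantic ID to the products that carry it."""
    index: Dict[SemanticId, List[str]] = defaultdict(list)
    for product, sid in ids.items():
        for depth in range(1, len(sid) + 1):
            index[sid[:depth]].append(product)
    return dict(index)


def replacement_candidates(
    product: str,
    ids: Dict[str, SemanticId],
    index: Dict[SemanticId, List[str]],
    min_depth: int = 1,
    limit: int = 5,
) -> List[str]:
    """Return peers from the deepest shared prefix, backing off toward the root."""
    sid = ids[product]
    # Start at the full ID and shorten it one level at a time, never below min_depth.
    for depth in range(len(sid), min_depth - 1, -1):
        peers = [p for p in index.get(sid[:depth], []) if p != product]
        if peers:
            return sorted(peers)[:limit]
    return []


if __name__ == "__main__":
    idx = build_prefix_index(CATALOG)
    for name in CATALOG:
        print(name, "->", replacement_candidates(name, CATALOG, idx))
```

Here `oat_milk_a` finds `oat_milk_b` at depth 2, `almond_milk` backs off to depth 1 and finds both oat milks, and `granola` finds nothing. Assuming semantic IDs are derived from product content rather than engagement history, a new product can be indexed this way as soon as it has an ID. Raising `min_depth` trades recall for tighter matches by refusing to fall back to very coarse prefixes.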
What failed first
Vanilla RQ-VAE compression without structural guidance caused two problems: fragmentation, where similar products landed in different branches, and error propagation from sparse or inconsistent product text. Meanwhile, the rigid taxonomy alone offered no mechanism to flag mislabeled items.
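Residual quantization is the step in an RQ-VAE that turns an embedding into a semantic ID: each level picks the nearest codeword and passes the leftover residual to the next level, so products sharing a prefix share coarse semantics. The sketch below is a minimal, untrained illustration, assuming the product embeddings already exist; the tiny hand-picked `CODEBOOKS` and the helper names are hypothetical, and the encoder/decoder training of a real RQ-VAE is omitted.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = List[Vector]

# Hypothetical two-level codebooks for 2-D embeddings. In an RQ-VAE these are
# learned jointly with an encoder; here they are hand-picked for illustration.
CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]],             # level 0: coarse clusters
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # level 1: finer offsets
]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding: Sequence[float], codebooks: Sequence[Codebook]) -> Tuple[int, ...]:
    """Return one codeword index per level; the tuple acts as a semantic ID."""
    residual = list(embedding)
    codes: List[int] = []
    for codebook in codebooks:
        # Choose the codeword nearest to the part of the embedding not yet explained.
        best = min(range(len(codebook)), key=lambda i: squared_distance(residual, codebook[i]))
        codes.append(best)
        # Subtract it so the next level only encodes what is left over.
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return tuple(codes)


def reconstruct(codes: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Approximate the original embedding by summing the chosen codewords."""
    out = [0.0] * len(codebooks[0][0])
    for level, index in enumerate(codes):
        out = [o + c for o, c in zip(out, codebooks[level][index])]
    return out


if __name__ == "__main__":
    for emb in ([10.9, 0.1], [10.2, -0.9], [0.1, 9.8], [4.9, 0.0], [5.1, 0.0]):
        print(emb, "->", residual_quantize(emb, CODEBOOKS))
```

The first two embeddings share the prefix `(1,)`, as intended. But `[4.9, 0.0]` and `[5.1, 0.0]`, which are nearly identical, straddle a first-level boundary and receive unrelated IDs `(0, 1)` and `(1, 3)`. That is the fragmentation failure mode described above: nothing in the vanilla quantizer keeps similar products in the same branch, and any noise in the input embedding (for example, from sparse product text) flows straight into the codes.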