Instacart builds semantic IDs to power cross-category product understanding and recommendations at scale

Instacart's hierarchical product taxonomy missed cross-category connections customers naturally expect, leaving new products invisible at cold start, tail categories underserved by recommendation models, and mislabeled products impossible to detect at catalog scale.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Product catalog ingestion
Millions of products across thousands of categories enter the system, each assigned to a hierarchical taxonomy category.
Tools used
RQ-VAE, ESCI, Gemini Flash, Gemma, LLMs
Outcome

Semantic IDs delivered a 34% increase in add-to-carts on product carousels, surfaced products from 2.7x more emerging brands with tail categories seeing the largest gains, and became core infrastructure for product retrieval, replacement recommendations, and next-item prediction across Instacart.

What failed first

Vanilla RQ-VAE compression without structural guidance caused fragmentation — similar products landing in different branches — and error propagation from sparse or inconsistent product text, while the rigid taxonomy alone offered no mechanism to flag mislabeled items.
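
To make the fragmentation failure concrete, here is a minimal sketch of residual quantization, the step that turns an embedding into a multi-level semantic ID. It is an illustration under stated assumptions, not Instacart's implementation: a real RQ-VAE learns its encoder and codebooks, whereas the two-dimensional `CODEBOOKS` below are hand-picked, and the names `semantic_id` and `shared_prefix_len` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: List[Vector], v: Vector) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def semantic_id(embedding: Vector, codebooks: List[List[Vector]]) -> Tuple[int, ...]:
    """Residual quantization: pick one code per level, then quantize what is left."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen code so the next level only sees the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading levels two semantic IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical hand-picked codebooks (assumption: a real RQ-VAE learns these).
CODEBOOKS = [
    [(0.0, 0.0), (10.0, 0.0)],  # level 1: coarse branch
    [(-1.0, 0.0), (1.0, 0.0)],  # level 2: refinement within the branch
]

a = semantic_id((4.9, 0.0), CODEBOOKS)  # just left of the level-1 boundary
b = semantic_id((5.1, 0.0), CODEBOOKS)  # just right of the level-1 boundary
c = semantic_id((0.5, 0.0), CODEBOOKS)  # far from a, but on the same side

print(a, b, shared_prefix_len(a, b))
print(a, c, shared_prefix_len(a, c))
```

In this toy setup the embeddings for `a` and `b` are nearly identical yet share no ID prefix, because they sit on opposite sides of the first-level boundary, while the more distant `c` receives the same ID as `a`. That is the kind of branch split the paragraph above calls fragmentation.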

Results
Volume: +34%
Cost replaced: ~5x cheaper
Source

https://tech.instacart.com/semantic-ids-product-understanding-at-scale-5283e0288f5a

Grounding & classification
Source type: technical build writeup
31 fields verified against source quotes.
data extraction, enterprise search, recommendation system, product catalog, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, retail, accuracy improvement, error reduction, throughput increase, technical build writeup, back office ops, ecommerce ops, data sync enrichment, extract classify route