Instacart builds PARSE, a multi-modal LLM platform for catalog attribute extraction at scale

Instacart's catalog attribute creation relied on SQL-based rules and traditional ML models that struggled with complex or context-dependent attributes, required significant per-attribute engineering effort, and could not extract information from product images — resulting in slow development cycles and inconsistent attribute quality.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Configure attribute extraction task

Teams use the platform in development mode to experiment with different models, prompts, and input sources.
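To make the experimentation step concrete, here is a minimal sketch of how a team might enumerate candidate configurations (model, prompt, and input sources) for one attribute before comparing them. This is not PARSE's actual interface: the `ExtractionConfig` schema, the `candidate_configs` and `render_prompt` helpers, the model names, and the field names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ExtractionConfig:
    """One candidate setup for extracting a single attribute (hypothetical schema)."""
    attribute: str
    model: str
    prompt_template: str
    input_sources: Tuple[str, ...]


def candidate_configs(
    attribute: str,
    models: Iterable[str],
    prompt_templates: Iterable[str],
    source_sets: Iterable[Tuple[str, ...]],
) -> List[ExtractionConfig]:
    """Enumerate every model x prompt x input-source combination to try."""
    return [
        ExtractionConfig(attribute, model, template, tuple(sources))
        for model, template, sources in product(models, prompt_templates, source_sets)
    ]


def render_prompt(config: ExtractionConfig, product_record: Dict[str, str]) -> str:
    """Fill the template with the text fields this config selects.

    Images are skipped here; a multi-modal model would receive them separately.
    """
    context = "\n".join(
        f"{field}: {product_record[field]}"
        for field in config.input_sources
        if field != "image" and field in product_record
    )
    return config.prompt_template.format(attribute=config.attribute, context=context)


# Placeholder model names and prompts, not taken from the source.
configs = candidate_configs(
    attribute="is_organic",
    models=["small-text-model", "large-multimodal-model"],
    prompt_templates=[
        "Extract '{attribute}' from this product.\n{context}",
        "Product data:\n{context}\nAnswer only with the value of '{attribute}'.",
    ],
    source_sets=[("title",), ("title", "description", "image")],
)

first_prompt = render_prompt(configs[0], {"title": "Organic bananas"})
```

Each resulting configuration could then be run against a small labeled sample so that quality and cost are compared side by side before one is chosen.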

Tools used

LLMs, PARSE

Outcome

PARSE accelerated attribute extraction: simpler attributes now take one day of effort compared to one week previously, complex attribute iteration was reduced to just three days, multi-modal LLMs increased recall by 10% over text-only models, and simpler attributes can be handled at a 70% cost reduction using less powerful models.

What failed first

Pre-LLM approaches — SQL rules and traditional ML models — failed to scale: SQL handled only simple keyword-based extractions, ML required separate labeled datasets and pipelines per attribute, and neither could process image-based product data.

Results

Time saved: one day of effort, compared to one week previously

Volume: 10% (recall increase from multi-modal LLMs over text-only models)

Cost replaced: 70% (cost reduction for simpler attributes handled by less powerful models)

Source

https://tech.instacart.com/multi-modal-catalog-attribute-extraction-platform-at-instacart-b9228754a527

Grounding & classification

Source type: technical build writeup

31 fields verified against source quotes, 1 dropped as unverifiable.

computer vision, data extraction, document ai, quality inspection, product catalog, builder submitted, human review described, metric backed, named customer, production runtime claimed, source backed, tools described, workflow described, ecommerce, retail, accuracy improvement, cost reduction, employee productivity, time saved, technical build writeup, data entry ops, ecommerce ops, document to record, extract classify route, human review queue