Instacart transitions from batch-oriented to real-time machine learning across its four-sided marketplace
Instacart's batch-oriented ML systems produced stale predictions, wasted compute on inactive users, could not cover long-tail user-item pairs, and lacked access to real-time signals such as live product availability and in-session shopping intent.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Raw events published to Kafka
Services publish raw events to Kafka, which serves as centralized storage for all real-time ML inputs.
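As a rough illustration of this stage, the sketch below shows one way a service might wrap a raw signal in a keyed event envelope before publishing it. The source does not describe Instacart's event schema, so every name here is hypothetical: the `RawEvent` fields, the `raw.item_availability` topic, and the `InMemoryLog` class, which only stands in for a Kafka producer so the example runs without a broker.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical envelope for a raw real-time ML input event."""
    event_type: str   # e.g. "item_availability" (assumed name)
    entity_id: str    # used as the partition key so one entity's events stay ordered
    payload: dict     # the raw signal itself, left untransformed
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    emitted_at_ms: int = field(default_factory=lambda: int(time.time() * 1000))


def encode(event: RawEvent) -> Tuple[bytes, bytes]:
    """Serialize an event into the (key, value) byte pair a log record carries."""
    key = event.entity_id.encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value


class InMemoryLog:
    """Stand-in for a Kafka producer (assumption): appends records per topic."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Tuple[bytes, bytes]]] = {}

    def send(self, topic: str, key: bytes, value: bytes) -> None:
        self.topics.setdefault(topic, []).append((key, value))


def publish(log: InMemoryLog, topic: str, event: RawEvent) -> None:
    """Encode the event and append it to the given topic."""
    key, value = encode(event)
    log.send(topic, key, value)


if __name__ == "__main__":
    log = InMemoryLog()
    event = RawEvent("item_availability", "store-1:item-9", {"in_stock": False})
    publish(log, "raw.item_availability", event)
    print(len(log.topics["raw.item_availability"]), "record(s) published")
```

Keying by entity is a design choice of this sketch, not something the source states. In a real deployment a Kafka client producer would replace `InMemoryLog`.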
Tools used
Kafka, Flink, Feature Store, Griffin, Online Inference Platform
Outcome
The real-time ML platform reduced item availability update latency from hours to seconds, enabled session-based personalization, and cut fraud-related costs by millions annually, with GTV growth confirmed in A/B experiments.
What failed first
Batch ML performed poorly on new queries, siloed streaming technologies across teams produced inconsistent event quality, and the shift to real-time serving introduced latency and availability risks that the existing infrastructure could not absorb.
Results
Time saved: seconds from a couple of hours
Volume: directly improves item found rate
Cost replaced: millions in fraud-related costs annually
Grounding & classification
Source type: technical build writeup
27 fields verified against source quotes, 1 dropped as unverifiable.
fraud detection, personalization, predictive analytics, recommendation system, product catalog, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, cost reduction, customer satisfaction, cycle time reduction, technical build writeup, ecommerce ops, data sync enrichment, monitor detect alert