
Introducing Griffin 2.0: Instacart's Next-Gen ML Platform

Griffin 1.0's CLI and GitHub PR-based interfaces imposed a steep learning curve, lacked standardization, were only vertically scalable, and created a fragmented user experience by forcing engineers to switch between multiple platforms.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Feature definition in UI
MLEs define features in the Feature Sources UI using SQL queries or Flink SQL/Scala code.
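To make the idea of a SQL-defined feature concrete, here is a minimal sketch. It is not Griffin's interface: the table `orders`, the feature `orders_last_7d`, and the helpers `compute_feature` and `build_demo_db` are all hypothetical, and an in-memory SQLite database stands in for whatever warehouse the real platform queries.

```python
import sqlite3

# Hypothetical feature definition: a SQL query keyed by an entity (user_id).
# Table, column, and feature names are illustrative assumptions only.
FEATURE_SQL = """
SELECT user_id,
       COUNT(*) AS orders_last_7d
FROM orders
WHERE order_day > :as_of_day - 7
  AND order_day <= :as_of_day
GROUP BY user_id
ORDER BY user_id
"""


def compute_feature(conn, as_of_day):
    """Run the feature query and return a {user_id: feature_value} mapping."""
    rows = conn.execute(FEATURE_SQL, {"as_of_day": as_of_day}).fetchall()
    return dict(rows)


def build_demo_db():
    """Create a tiny in-memory table so the sketch runs on its own."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, order_day INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (1, 9), (1, 2), (2, 8), (3, 1)],
    )
    return conn


# Users with no orders inside the 7-day window do not appear in the result.
features = compute_feature(build_demo_db(), as_of_day=10)
print(features)
```

The point of the sketch is that a feature reduces to a query plus an entity key, which is what lets a UI form capture it without a CLI step or a pull request.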
Tools used
Docker, Ray, TensorFlow, LightGBM, MLFlow, Datadog, Airflow, Terraform, BentoML, Flink SQL, Twirp, AWS ECS
Outcome

Griffin 2.0 replaced CLI and PR-based workflows with a unified web UI and REST API, enabled distributed training and LLM fine-tuning via Ray on Kubernetes, drastically reduced inference service setup time, and achieved substantial latency optimization for real-time inference.

What failed first

The MLFlow-based model registry in Griffin 1.0 could not handle the required query scalability, and the fire-and-forget CLI approach made it difficult to retrieve metadata or manage training-serving lineage for production deployments.

Results
Time saved: drastically reduced
Cost replaced: aids in cost savings
Source

https://tech.instacart.com/introducing-griffin-2-0-instacarts-next-gen-ml-platform-b7331e73b8d7

Grounding & classification
Source type: technical build writeup
25 fields verified against source quotes, 1 dropped as unverifiable.
Tags: failure mode described, named customer, production runtime claimed, tools described, workflow described, ecommerce, cost reduction, cycle time reduction, employee productivity, throughput increase, technical build writeup, back office ops