Introducing Griffin 2.0: Instacart's Next-Gen ML Platform
Griffin 1.0's CLI and GitHub PR-based interfaces imposed a steep learning curve, lacked standardization, were only vertically scalable, and created a fragmented user experience by forcing engineers to switch between multiple platforms.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Feature definition in UI
ML engineers (MLEs) define features in the Feature Sources UI, using SQL queries or Flink SQL/Scala code.
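To make this stage concrete, here is a minimal sketch of what a SQL-backed feature definition could look like once it leaves a UI form and is serialized for a backend. Everything in it is an assumption for illustration: the `FeatureSource` class, its fields, the validation rules, and the example query are hypothetical and are not Griffin 2.0's actual schema or API.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical illustration only: the class, field names, and payload shape
# are assumptions, not Griffin 2.0's real feature schema.


@dataclass(frozen=True)
class FeatureSource:
    name: str        # feature name as it would appear in a catalog
    entity_key: str  # column identifying the entity the feature describes
    engine: str      # "sql" for batch queries, "flink_sql" for streaming
    query: str       # the query that computes the feature

    def validate(self) -> None:
        """Run cheap sanity checks before the definition is submitted."""
        if self.engine not in {"sql", "flink_sql"}:
            raise ValueError(f"unsupported engine: {self.engine}")
        if not self.query.strip().lower().startswith("select"):
            raise ValueError("query must be a SELECT statement")
        if self.entity_key.lower() not in self.query.lower():
            raise ValueError("query must reference the entity key column")

    def to_payload(self) -> str:
        """Validate, then serialize to a JSON string with stable key order."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


# Example definition; the table and column names are made up.
orders_30d = FeatureSource(
    name="user_order_count_30d",
    entity_key="user_id",
    engine="sql",
    query=(
        "SELECT user_id, COUNT(*) AS order_count_30d "
        "FROM orders "
        "WHERE created_at >= CURRENT_DATE - INTERVAL '30' DAY "
        "GROUP BY user_id"
    ),
)
payload = orders_30d.to_payload()
```

Capturing the definition as structured data, rather than as a file in a pull request, is what lets a platform validate it at submission time and store it where other stages can look it up.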
Griffin 2.0 replaced CLI and PR-based workflows with a unified web UI and REST API, enabled distributed training and LLM fine-tuning via Ray on Kubernetes, drastically reduced inference service setup time, and achieved substantial latency optimization for real-time inference.
What failed first
The MLflow-based model registry in Griffin 1.0 could not scale to the required query volume, and the fire-and-forget CLI approach made it difficult to retrieve metadata or manage training-serving lineage for production deployments.
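"Training-serving lineage" here means being able to start from a production deployment and trace back to the training run, code, and feature versions that produced the model. The sketch below shows that idea with an in-memory SQLite database. The schema, function names, and sample values are all hypothetical, chosen only to illustrate the kind of query a metadata store needs to answer; they do not describe how Griffin 2.0 is implemented.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE training_runs (
        run_id              TEXT PRIMARY KEY,
        feature_set_version TEXT NOT NULL,
        code_commit         TEXT NOT NULL
    );
    CREATE TABLE model_versions (
        model_name TEXT    NOT NULL,
        version    INTEGER NOT NULL,
        run_id     TEXT    NOT NULL REFERENCES training_runs(run_id),
        PRIMARY KEY (model_name, version)
    );
    CREATE TABLE deployments (
        endpoint   TEXT PRIMARY KEY,
        model_name TEXT    NOT NULL,
        version    INTEGER NOT NULL
    );
    """
)


def record_training(run_id: str, feature_set_version: str, code_commit: str,
                    model_name: str, version: int) -> None:
    """Persist a training run and the model version it produced."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO training_runs VALUES (?, ?, ?)",
                     (run_id, feature_set_version, code_commit))
        conn.execute("INSERT INTO model_versions VALUES (?, ?, ?)",
                     (model_name, version, run_id))


def deploy(endpoint: str, model_name: str, version: int) -> None:
    """Point an endpoint at a specific registered model version."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO deployments VALUES (?, ?, ?)",
                     (endpoint, model_name, version))


def lineage(endpoint: str) -> Optional[dict]:
    """Trace a deployment back to its training run, features, and code."""
    row = conn.execute(
        """
        SELECT d.model_name, d.version, t.run_id,
               t.feature_set_version, t.code_commit
        FROM deployments d
        JOIN model_versions m
          ON m.model_name = d.model_name AND m.version = d.version
        JOIN training_runs t ON t.run_id = m.run_id
        WHERE d.endpoint = ?
        """,
        (endpoint,),
    ).fetchone()
    if row is None:
        return None
    keys = ("model_name", "version", "run_id",
            "feature_set_version", "code_commit")
    return dict(zip(keys, row))


# Sample values are made up for the example.
record_training("run-001", "fs-v3", "abc123", "ranker", 1)
deploy("ranker-prod", "ranker", 1)
info = lineage("ranker-prod")
```

With a fire-and-forget workflow, the equivalent of `record_training` never runs against a queryable store, so a lookup like `lineage` has nothing to join on.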