Shopify fine-tunes a tool-calling agent for Flow: 2.2x faster, 68% cheaper, outperforms closed models
Store owners without an engineering background found it daunting to build automation workflows from a blank canvas in Shopify Flow. The feature also faced a cold-start problem: Sidekick had not yet been deployed, so no production conversations existed to learn from.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Sample production workflows
Thousands of anonymized store-owner workflows are sampled from production and filtered for quality to bootstrap training data.
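As a rough illustration of this stage, the sketch below filters a pool of anonymized workflows and draws a reproducible sample. The source does not say which quality criteria are used, so the record fields (`step_count`, `is_active`, `run_count`) and the thresholds are hypothetical stand-ins, not Shopify's actual filter.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical anonymized workflow record; field names are assumptions."""
    workflow_id: str
    step_count: int
    is_active: bool
    run_count: int


def passes_quality_filter(wf: Workflow, min_steps: int = 2, min_runs: int = 1) -> bool:
    # Assumed heuristic: keep active workflows that have at least a trigger
    # plus one action and that have actually run at least once.
    return wf.is_active and wf.step_count >= min_steps and wf.run_count >= min_runs


def sample_training_seeds(workflows, k, seed=0):
    """Filter for quality, then draw up to k workflows without replacement."""
    eligible = [wf for wf in workflows if passes_quality_filter(wf)]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(eligible, min(k, len(eligible)))
```

Filtering before sampling keeps low-quality workflows from consuming the sample budget, and the fixed seed makes a given training set easy to regenerate.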
The fine-tuned model is 2.2x faster and 68% cheaper than the closed-model baseline while outperforming closed models, and it now serves the majority of production traffic. A continuous weekly retraining flywheel closes quality gaps identified in production.
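To make the headline numbers concrete, the sketch below converts them into ratios against the baseline. It assumes "2.2x faster" means the same request finishes in 1/2.2 of the baseline time and "68% cheaper" means the cost is 68% below the baseline cost. The source does not define either metric precisely, so treat this as one plausible reading.

```python
def relative_latency(speedup: float) -> float:
    """Fraction of baseline latency left after an N-times speedup."""
    return 1.0 / speedup


def relative_cost(percent_cheaper: float) -> float:
    """Fraction of baseline cost left after a given percentage saving."""
    return 1.0 - percent_cheaper / 100.0


# Under the assumptions above: roughly 0.45 of baseline latency
# and roughly 0.32 of baseline cost.
latency_ratio = relative_latency(2.2)
cost_ratio = relative_cost(68)
```

Read this way, a request that took the baseline 10 seconds would take about 4.5 seconds, and a task that cost 1 unit would cost about 0.32 units.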
What failed first
Offline benchmarks showed parity with the prompt-based agent, but the initial production deployment revealed that the fine-tuned model had a 35% lower workflow activation rate. The cause was a coverage gap: the synthetic training data did not include real user requests such as editing existing workflows, handling email configurations, and working with third-party integrations.
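A gap like this can be surfaced by comparing how often each request category appears in training data versus production traffic. The sketch below is a minimal version of that check. The category labels, the `min_ratio` threshold, and the assumption that requests are already labeled are all hypothetical; the source does not describe how the gap was actually diagnosed.

```python
from collections import Counter


def coverage_gaps(training_labels, production_labels, min_ratio=0.5):
    """Return categories whose training share is far below their production share.

    A category is flagged when its share of the training data is less than
    min_ratio times its share of production requests.
    """
    train = Counter(training_labels)
    prod = Counter(production_labels)
    n_train = sum(train.values())
    n_prod = sum(prod.values())

    gaps = {}
    for category, count in prod.items():
        prod_share = count / n_prod
        train_share = train[category] / n_train if n_train else 0.0
        if train_share < min_ratio * prod_share:
            gaps[category] = (train_share, prod_share)
    return gaps
```

For example, with training data made up only of `"create_workflow"` requests and production traffic that also contains `"edit_workflow"` and `"email_config"` requests, the function flags the latter two categories as under-represented.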