
Shopify fine-tunes a tool-calling agent for Flow: 2.2x faster, 68% cheaper, outperforms closed models

Store owners without engineering backgrounds found it daunting to build automation workflows from a blank canvas in Shopify Flow. The project also faced a cold-start problem: because Sidekick had not yet been deployed, there were no production conversations to learn from.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products used in this case appear under "Tools used" below.
Stage 1 · Sample production workflows
Thousands of anonymized store-owner workflows are sampled from production and filtered for quality to bootstrap training data.
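The bootstrap step above can be sketched as "keep quality workflows, anonymize them, sample reproducibly." This is a minimal illustration, not Shopify's pipeline: the `Workflow` record, its fields, the `PII_KEYS` set, and the quality rule in `passes_quality` (activated, has a trigger, has at least one action) are all hypothetical assumptions. The sketch filters before sampling so the requested sample size is predictable.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


# Hypothetical workflow record; these field names are illustrative, not Shopify's schema.
@dataclass
class Workflow:
    workflow_id: str
    shop_id: str
    trigger: str
    actions: list[str]
    activated: bool
    metadata: dict = field(default_factory=dict)


# Assumed set of metadata keys treated as personal data.
PII_KEYS = {"email", "phone", "customer_name"}


def anonymize(wf: Workflow) -> Workflow:
    """Replace the shop identifier and drop any assumed PII keys from metadata."""
    clean_meta = {k: v for k, v in wf.metadata.items() if k not in PII_KEYS}
    return Workflow(wf.workflow_id, "anon", wf.trigger, list(wf.actions), wf.activated, clean_meta)


def passes_quality(wf: Workflow, min_actions: int = 1) -> bool:
    """Illustrative quality bar: activated, has a trigger, and has enough actions."""
    return wf.activated and bool(wf.trigger) and len(wf.actions) >= min_actions


def sample_training_workflows(workflows: list[Workflow], k: int, seed: int = 0) -> list[Workflow]:
    """Keep quality workflows, anonymize them, then draw a reproducible sample of up to k."""
    kept = [anonymize(wf) for wf in workflows if passes_quality(wf)]
    rng = random.Random(seed)
    return rng.sample(kept, min(k, len(kept)))


pool = [
    Workflow("w1", "shop-1", "order_created", ["tag_order"], True, {"email": "a@example.com"}),
    Workflow("w2", "shop-2", "", ["send_email"], True),                # no trigger
    Workflow("w3", "shop-3", "product_added", [], True),               # no actions
    Workflow("w4", "shop-4", "inventory_low", ["send_email"], False),  # never activated
]
sample = sample_training_workflows(pool, k=10)
```

Only `w1` clears the assumed quality bar here, and it comes back with its shop ID replaced and its email stripped. A fixed `seed` keeps the sample reproducible across training runs.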
Tools used
Qwen3-32B, H200 GPUs, FSDP, Tangle, CometML, HuggingFace, CentML, Sidekick
Outcome

The fine-tuned model is 2.2x faster and 68% cheaper than the closed-model baseline while also outperforming closed models. It now serves the majority of production traffic, and a weekly retraining flywheel continuously closes quality gaps identified in production.
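One way to picture the "identify quality gaps" half of a weekly flywheel is to group production requests by category, compute each category's activation rate, and flag the weakest categories as targets for the next round of training data. This is a hypothetical sketch: the event format, the category names, the 0.5 threshold, and the `min_count` noise guard are assumptions, not details from the source.

```python
from collections import defaultdict


def category_stats(events):
    """Aggregate (category, activated) events into per-category counts and activation rates."""
    totals = defaultdict(int)
    activated = defaultdict(int)
    for category, was_activated in events:
        totals[category] += 1
        activated[category] += int(was_activated)
    rates = {c: activated[c] / totals[c] for c in totals}
    return dict(totals), rates


def select_gap_categories(events, threshold=0.5, min_count=2):
    """Return categories whose activation rate is below threshold, worst first.

    Categories with fewer than min_count events are skipped to avoid chasing noise.
    """
    totals, rates = category_stats(events)
    gaps = [c for c, r in rates.items() if r < threshold and totals[c] >= min_count]
    return sorted(gaps, key=lambda c: rates[c])


# Hypothetical week of production events.
week_events = [
    ("create_workflow", True), ("create_workflow", True),
    ("edit_workflow", False), ("edit_workflow", True), ("edit_workflow", False),
    ("email_config", False), ("email_config", False),
    ("third_party_integration", False),  # single event, below min_count
]
targets = select_gap_categories(week_events)
```

With these made-up events, `email_config` (0% activation) ranks ahead of `edit_workflow` (about 33%), while the single `third_party_integration` event is ignored until more data arrives.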

What failed first

Offline benchmarks showed parity with the prompt-based agent, but the initial production deployment revealed a 35% lower workflow activation rate for the fine-tuned model. The cause was coverage: the synthetic training data did not include real user requests such as editing existing workflows, handling email configurations, and working with third-party integrations.
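The failure above combines two measurements: a relative drop in activation rate and a coverage gap between synthetic and real requests. The sketch below shows both under stated assumptions. It treats "35% lower" as a relative drop rather than percentage points (the source does not say which), and the activation rates and request-category labels are invented for illustration.

```python
from __future__ import annotations


def relative_drop(baseline_rate: float, candidate_rate: float) -> float:
    """Relative decrease of candidate vs. baseline; 0.35 reads as '35% lower'."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - candidate_rate) / baseline_rate


def uncovered_categories(synthetic_categories, production_categories) -> list[str]:
    """Production request categories that never appear in the synthetic training data."""
    return sorted(set(production_categories) - set(synthetic_categories))


# Hypothetical activation rates and request labels, for illustration only.
drop = relative_drop(baseline_rate=0.40, candidate_rate=0.26)
missing = uncovered_categories(
    synthetic_categories=["create_workflow", "create_workflow", "add_condition"],
    production_categories=["create_workflow", "edit_workflow", "email_config",
                           "third_party_integration", "add_condition"],
)
```

Offline parity can hide exactly this kind of gap: if a benchmark is drawn from the same distribution as the synthetic training data, categories that only appear in production never get scored.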

Results
Speed: 2.2x faster than the closed-model baseline
Cost: 68% cheaper than the closed-model baseline
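The two headline figures are different kinds of ratio, which is easy to blur. This sketch assumes "2.2x faster" means baseline latency divided by fine-tuned latency, and "68% cheaper" means a fractional saving in cost per unit of work. The latency and cost numbers are hypothetical values picked only to reproduce the reported ratios.

```python
def speedup(baseline_latency_s: float, candidate_latency_s: float) -> float:
    """How many times faster the candidate is, as a latency ratio."""
    return baseline_latency_s / candidate_latency_s


def cost_reduction(baseline_cost: float, candidate_cost: float) -> float:
    """Fractional cost saving; 0.68 reads as '68% cheaper'."""
    return 1 - candidate_cost / baseline_cost


# Hypothetical per-request figures chosen to reproduce the reported ratios.
fast = speedup(baseline_latency_s=4.4, candidate_latency_s=2.0)
saving = cost_reduction(baseline_cost=1.00, candidate_cost=0.32)
cost_ratio = 1.00 / 0.32  # baseline cost as a multiple of the fine-tuned cost
```

A 68% saving means the fine-tuned model costs 0.32 of the baseline, so the baseline is roughly 3.1x as expensive per unit of work.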
Source

https://shopify.engineering/fine-tuning-agent-shopify-flow


Grounding & classification
Source type: technical build writeup
35 fields verified against source quotes.
Tags: agentic workflow, AI agent, content generation, knowledge base, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, software, accuracy improvement, cost reduction, cycle time reduction, employee productivity, technical build writeup, ecommerce ops, agentic task execution, autonomous resolution