
Canva's distributed ML hyperparameter optimization with Argo Workflows and Bayesian optimization

Canva's ML teams faced exponentially growing hyperparameter search spaces and severe resource constraints from vertical scaling, making tuning large models impractically slow.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Engineer configures search space
Machine learning engineers specify their desired search spaces and hyperparameter configurations at run time via the Argo CLI and UI.
Tools used
Kubernetes
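
As a rough illustration of this stage, the sketch below shows one way a search space could be declared in Python and turned into run-time parameters for `argo submit`, which accepts `-p name=value` pairs. The source does not describe Canva's actual parameter schema, so the `FloatRange` type, the `build_submit_command` helper, the parameter names (`search-space`, `max-trials`, `parallelism`), and the file name `hpo-workflow.yaml` are all hypothetical.

```python
import json
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatRange:
    """A continuous hyperparameter range (hypothetical helper type)."""
    low: float
    high: float
    log_scale: bool = False

    def __post_init__(self):
        # Reject empty or inverted ranges before anything is submitted.
        if self.low >= self.high:
            raise ValueError(f"low ({self.low}) must be below high ({self.high})")
        # A log-scaled range only makes sense for strictly positive bounds.
        if self.log_scale and self.low <= 0:
            raise ValueError("log-scaled ranges need a positive lower bound")


def build_submit_command(workflow_file, search_space, max_trials, parallelism):
    """Build an `argo submit` argument list.

    The parameter names below are assumptions for illustration; a real
    workflow template defines its own input parameters.
    """
    space_json = json.dumps(
        {
            name: {"low": r.low, "high": r.high, "log_scale": r.log_scale}
            for name, r in search_space.items()
        },
        sort_keys=True,
    )
    return [
        "argo", "submit", workflow_file,
        "-p", f"search-space={space_json}",
        "-p", f"max-trials={max_trials}",
        "-p", f"parallelism={parallelism}",
    ]


if __name__ == "__main__":
    space = {
        "learning_rate": FloatRange(1e-5, 1e-1, log_scale=True),
        "dropout": FloatRange(0.0, 0.5),
    }
    cmd = build_submit_command("hpo-workflow.yaml", space, max_trials=50, parallelism=10)
    # Print a shell-safe command line instead of executing it.
    print(shlex.join(cmd))
```

Serializing the whole search space into a single JSON parameter keeps the workflow's input list stable as hyperparameters are added or removed. Validating ranges on the client side catches malformed configurations before any cluster resources are requested.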
Outcome

The distributed HPO system delivered an average speedup of at least five times over the previous process, cutting optimization time from over a week to a little over a day.

Results
Time saved: from over a week to a little over a day
Volume: at least five times
Source

https://www.canva.dev/blog/engineering/machine-learning-hyperparameter-optimization-with-argo/


Grounding & classification
Source type: technical build writeup
12 fields verified against source quotes, 3 dropped as unverifiable.
predictive analytics · metric backed · named customer · production runtime claimed · workflow described · software · cycle time reduction · employee productivity · technical build writeup · back office ops · agentic task execution