Canva builds synthetic data evaluation pipeline to improve private design search without accessing user data
Because strict privacy constraints prevent access to real user designs or queries, Canva's engineers could test changes to private design search only by running a handful of manual queries on their own accounts. They then had to wait days for online A/B experiments to validate each change.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.
Stage 1 · Generate synthetic design content
GPT-4o is seeded with a realistic topic and a sampled design type and prompted to brainstorm titles. A second prompt then generates the corresponding text content for each title.
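A minimal Python sketch of the two-prompt generation step, assuming a generic `complete` callable in place of a real GPT-4o request. The topic and design-type pools, prompt wording, `fake_llm` stand-in, and all function names are hypothetical; Canva's actual prompts are not described here.

```python
import random
from typing import Callable, Dict, List

# Hypothetical seed pools; the real topics and design types are not published.
TOPICS = ["summer bake sale", "quarterly sales review", "yoga retreat"]
DESIGN_TYPES = ["poster", "presentation", "Instagram post"]

# Stand-in for any LLM call (e.g. a GPT-4o chat request): prompt in, text out.
Complete = Callable[[str], str]


def brainstorm_titles(complete: Complete, topic: str, design_type: str, n: int) -> List[str]:
    """First prompt: ask for n candidate titles, one per line."""
    prompt = (
        f"Brainstorm {n} realistic titles for a {design_type} about '{topic}'. "
        "Return one title per line with no numbering."
    )
    lines = [line.strip() for line in complete(prompt).splitlines()]
    return [line for line in lines if line][:n]  # drop blank lines, cap at n


def generate_content(complete: Complete, title: str, design_type: str) -> str:
    """Second prompt: write the text that would appear on a design with this title."""
    prompt = f"Write the text content for a {design_type} titled '{title}'."
    return complete(prompt).strip()


def generate_designs(complete: Complete, n_titles: int, seed: int) -> List[Dict[str, str]]:
    rng = random.Random(seed)  # fixed seed makes topic/type sampling reproducible
    topic = rng.choice(TOPICS)
    design_type = rng.choice(DESIGN_TYPES)
    return [
        {
            "topic": topic,
            "design_type": design_type,
            "title": title,
            "content": generate_content(complete, title, design_type),
        }
        for title in brainstorm_titles(complete, topic, design_type, n_titles)
    ]


def fake_llm(prompt: str) -> str:
    """Hypothetical deterministic stand-in for GPT-4o, for local testing only."""
    if prompt.startswith("Brainstorm"):
        return "Title A\nTitle B\n\nTitle C"
    return "Some body text."


designs = generate_designs(fake_llm, n_titles=2, seed=0)
print(designs)
```

Injecting the model as a plain callable keeps the prompt logic testable offline; swapping `fake_llm` for a real client only changes the one function passed in.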
Tools used
GPT-4o, LLMs, Testcontainers, ElasticSearch, Streamlit
Outcome
The synthetic evaluation pipeline produces fully reproducible results on more than 1000 test cases in under 10 minutes, enabling more than 300 offline evaluations in the same time a single online experiment takes, all without accessing any real user data.
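One way to make an offline evaluation reproducible is to pair each synthetic query with the design it should retrieve and score the ranked results deterministically. The sketch below assumes recall@k and reciprocal rank as the metrics and uses a toy token-overlap ranker in place of a real search index such as an Elasticsearch container; `EvalCase`, `evaluate`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class EvalCase:
    query: str        # synthetic query text
    expected_id: str  # id of the synthetic design the query should retrieve


# Hypothetical ranker interface: query -> design ids, best match first.
Search = Callable[[str], Sequence[str]]


def evaluate(search: Search, cases: Sequence[EvalCase], k: int = 10) -> Dict[str, float]:
    """Compute recall@k and reciprocal rank truncated at k (MRR@k) over all cases."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    hits = 0
    rr_sum = 0.0
    for case in cases:
        top_k = list(search(case.query))[:k]
        if case.expected_id in top_k:
            hits += 1
            rr_sum += 1.0 / (top_k.index(case.expected_id) + 1)
    n = len(cases)
    return {f"recall@{k}": hits / n, f"mrr@{k}": rr_sum / n}


def make_token_overlap_search(docs: Dict[str, str]) -> Search:
    """Toy stand-in for a search index: rank by number of shared lowercase tokens.

    Ties are broken by design id, so the same inputs always give the same ranking.
    """
    def search(query: str) -> List[str]:
        q_tokens = set(query.lower().split())
        scored = [
            (len(q_tokens & set(text.lower().split())), doc_id)
            for doc_id, text in docs.items()
        ]
        matches = [(score, doc_id) for score, doc_id in scored if score > 0]
        matches.sort(key=lambda sd: (-sd[0], sd[1]))
        return [doc_id for _, doc_id in matches]

    return search


docs = {
    "d1": "summer bake sale poster",
    "d2": "quarterly sales review deck",
    "d3": "yoga retreat flyer",
}
cases = [
    EvalCase("bake sale", "d1"),
    EvalCase("sale review", "d2"),
    EvalCase("retreat schedule", "d3"),
    EvalCase("wedding invite", "d1"),
]
metrics = evaluate(make_token_overlap_search(docs), cases, k=2)
print(metrics)
```

Because ranking and tie-breaking are deterministic, rerunning the same cases against the same index returns identical metrics, which is what lets two search variants be compared offline run after run.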
What failed first
Limited offline testing had low statistical power to catch poorly performing changes, and progressing quickly to online experiments risked exposing real users to degraded search behavior.
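The statistical-power problem can be made concrete with the standard binomial approximation: the uncertainty on a success rate measured over n queries shrinks roughly as 1/sqrt(n). The 0.8 success rate and the helper name below are illustrative assumptions, not reported figures.

```python
import math


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% confidence half-width for a success rate p measured on n queries.

    Uses the normal (Wald) approximation, which is rough for very small n; it is used
    here only to show how uncertainty scales with the number of test cases.
    """
    if n <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n > 0 and 0 <= p <= 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Compare a handful of manual queries with a large synthetic test set.
for n in (5, 100, 1000):
    print(f"n={n:>4}: 0.80 +/- {wald_half_width(0.8, n):.3f}")
```

With 5 queries the half-width is roughly 0.35, far too wide to detect a modest regression; with 1,000 test cases it narrows to about 0.025.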
Results
Time saved: more than 1000 test cases in under 10 minutes