
Duolingo replaces rule-based ad logic with BigQuery ML and XGBoost, driving tens of millions in annual revenue

Years of A/B testing had left Duolingo's ad decision logic as a tangled web of rules fragmented across systems. The logic had become too complex to optimize, and the codebase was difficult to reason about, improve, or maintain without introducing bugs and tech debt.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User completes learning session
After each learning session, a user is eligible to see either an in-house subscription ad or a network ad.
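The choice at this stage can be pictured as a comparison between two candidates. The following is a minimal sketch, assuming the decision is made by comparing expected revenue per impression; the names `AdCandidate` and `choose_ad`, the tie-breaking rule, and all numbers are hypothetical and are not taken from Duolingo's implementation.

```python
from dataclasses import dataclass


@dataclass
class AdCandidate:
    name: str
    # Hypothetical expected revenue per impression, in dollars.
    expected_revenue: float


def choose_ad(subscription_ad: AdCandidate, network_ad: AdCandidate) -> AdCandidate:
    """Return the candidate with the higher expected revenue.

    Assumption: ties go to the network ad. The source does not describe
    the actual decision rule.
    """
    if subscription_ad.expected_revenue > network_ad.expected_revenue:
        return subscription_ad
    return network_ad


# Illustrative values only: a 2% predicted conversion on a $60 subscription
# versus a $0.50 network ad impression.
subscription = AdCandidate("subscription_ad", 0.02 * 60.0)
network = AdCandidate("network_ad", 0.50)
shown = choose_ad(subscription, network)
print(shown.name)
```

In a rule-based system the same comparison would be spread across hand-written conditions; a model-driven version replaces those conditions with predicted values fed into a single comparison like this one.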
Tools used
dbt, BigQuery ML, XGBoost, dbt_ml
Outcome

The ML model delivered millions of dollars in incremental annual revenue in its first few months, grew to tens of millions per year after refinement, and now drives roughly a quarter of Duolingo's year-over-year revenue growth.

What failed first

Two problems emerged in early ML iterations. First, a dedicated holdout group used for retraining introduced data drift, because the model was trained on a different learner population than the one it served. Second, the initial model objective, predicting baseline subscription probability, cannibalized revenue from other channels rather than growing it.
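A toy example helps show why targeting on baseline subscription probability can fail to grow revenue. The sketch below assumes a four-user cohort with made-up probabilities of subscribing with and without the ad, and compares ranking users by baseline probability against ranking by incremental lift. The uplift ranking is an illustrative alternative only; the source does not say which objective Duolingo adopted, and in practice both probabilities cannot be observed for the same user.

```python
# Hypothetical cohort: (user_id, p_subscribe_without_ad, p_subscribe_with_ad).
# All numbers are invented for illustration.
users = [
    ("a", 0.30, 0.31),
    ("b", 0.05, 0.12),
    ("c", 0.20, 0.20),
    ("d", 0.02, 0.08),
]


def top_k(cohort, key, k):
    """Return the ids of the k users with the highest score under `key`."""
    ranked = sorted(cohort, key=key, reverse=True)
    return [user_id for user_id, _, _ in ranked[:k]]


def incremental_subscriptions(cohort, chosen_ids):
    """Expected extra subscriptions caused by showing the ad to chosen users."""
    return sum(
        with_ad - without_ad
        for user_id, without_ad, with_ad in cohort
        if user_id in chosen_ids
    )


# Objective 1: rank by baseline probability (likely to subscribe anyway).
by_baseline = top_k(users, key=lambda u: u[1], k=2)
# Objective 2: rank by lift, the change in probability caused by the ad.
by_uplift = top_k(users, key=lambda u: u[2] - u[1], k=2)

gain_baseline = incremental_subscriptions(users, by_baseline)
gain_uplift = incremental_subscriptions(users, by_uplift)
print(by_baseline, by_uplift, gain_uplift > gain_baseline)
```

Under these assumed numbers, the baseline ranking selects users who would mostly have subscribed through another channel, so the ad adds little; the lift ranking selects users whose behavior the ad actually changes.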

Results
Cost replaced: millions of dollars in incremental annual revenue
Source

https://blog.duolingo.com/machine-learning-ads/


Grounding & classification
Source type: technical build writeup
22 fields verified against source quotes.
personalization, predictive analytics, recommendation system, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, education, revenue increase, throughput increase, technical build writeup, marketing ops, data sync enrichment