How Duolingo designs structured LLM-powered conversations for Video Call with Lily
Letting an LLM converse freely with language learners produces generic, off-character, off-level responses; Duolingo needed a structured pipeline to ensure every Video Call with Lily stays at the right CEFR level, matches Lily's established personality, and has a clear conversational purpose.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Click any stage to read what happens there. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Learning designer writes system instructions
Duolingo Learning Designers write the instructions that the System gives to the Assistant (Lily) about how to act and what to say.
Tools used
ChatGPTClaudeGemini
Outcome
Duolingo built a structured multi-prompt pipeline for Video Call with Lily featuring separate first-question generation, persistent user memory via a List of Facts, and dynamic mid-call evaluation, enabling personalized, level-appropriate speaking practice.
What failed first
Two failure modes emerged during development: combining all instructions into one prompt overloaded the LLM and produced overly complex sentences or missing vocabulary; and without mid-call evaluation, Lily would ignore learner cues and stay on her pre-assigned topic regardless of what the learner wanted to discuss.
Results
Volumedelight and sass—and, of course, the opportunity for speaking practice