Advanced fine-tuning for multi-agent orchestration: production results from Amazon Pharmacy, GES, and A+
Three Amazon teams faced high-stakes production challenges: Amazon Pharmacy dealt with medication direction errors costing up to $3.5 billion annually; Amazon GES faced lengthy inspection reviews that consumed expert hours across hundreds of fulfillment centers; and Amazon A+ Content needed to evaluate content quality at massive scale across product submissions.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · High-stakes use case identified
One in four high-stakes applications demands advanced fine-tuning to achieve production-grade performance.
Advanced fine-tuning delivered production-grade results across all three Amazon use cases: Amazon Pharmacy achieved a 33% reduction in near-miss medication events; Amazon GES achieved an 80% reduction in human expert effort; and Amazon A+ improved classification accuracy from 77% to 96%.
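To put the A+ accuracy figures in perspective, the short sketch below converts them into an absolute gain and a relative drop in misclassifications. Only the 77% and 96% values come from the reported results; the function name and the error-rate framing are illustrative assumptions, and the calculation assumes both figures were measured on comparable evaluation data.

```python
def accuracy_gain(before_pct: int, after_pct: int) -> tuple[int, float]:
    """Return (gain in percentage points, relative reduction in error rate).

    Hypothetical helper for illustration; inputs are whole-number percentages.
    """
    gain_pp = after_pct - before_pct       # absolute improvement in points
    error_before = 100 - before_pct        # share of items misclassified before
    error_after = 100 - after_pct          # share of items misclassified after
    relative = (error_before - error_after) / error_before
    return gain_pp, relative


# Reported A+ figures: accuracy rose from 77% to 96%.
gain_pp, relative = accuracy_gain(77, 96)
print(f"{gain_pp} points gained, {relative:.0%} fewer misclassifications")
# prints "19 points gained, 83% fewer misclassifications"
```

Read this way, the error rate falls from 23% to 4%, which is a larger relative change than the 19-point headline suggests.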
What failed first
Initial attempts at Amazon Pharmacy using traditional retrieval-augmented generation (RAG) with foundation models yielded disappointing results: accuracy hovered between 60% and 70%, short of production requirements.