quality_assurance · healthcare · workflow

Advanced fine-tuning for multi-agent orchestration: production results from Amazon Pharmacy, GES, and A+

Three Amazon teams faced high-stakes production challenges: Amazon Pharmacy dealt with medication direction errors costing up to $3.5 billion annually; Amazon GES faced lengthy inspection reviews that consumed expert hours across hundreds of fulfillment centers; and Amazon A+ Content needed to evaluate content quality at massive scale across product submissions.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.
Stage 1 · High-stakes use case identified
One in four high-stakes applications demand advanced fine-tuning to achieve production-grade performance.
Tools used
RAG, SFT, PPO, DPO, GRPO, DAPO, GSPO, RLHF, LoRA, RLVR, RLAIF, Amazon SageMaker, Amazon Bedrock, Amazon Bedrock AgentCore, Nova Lite, Amazon SageMaker HyperPod, Amazon SageMaker JumpStart, Strands, Amazon Nova Forge
Outcome

Advanced fine-tuning delivered production-grade results across all three Amazon use cases: Amazon Pharmacy achieved a 33% reduction in near-miss medication events; Amazon GES achieved an 80% reduction in human expert effort; and Amazon A+ improved classification accuracy from 77% to 96%.
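
To put the A+ figure in context, the short sketch below turns the reported 77% and 96% accuracies into a percentage-point gain and a relative reduction in misclassifications. It assumes both figures are simple fractions of correct classifications measured on comparable evaluation sets, which the source does not state; the function name `accuracy_gain` is illustrative only.

```python
def accuracy_gain(before: float, after: float) -> dict:
    """Summarize an accuracy change, with accuracies given as fractions in [0, 1].

    Assumption (not from the source): both accuracies are measured the same
    way on comparable data, so their error rates can be compared directly.
    """
    error_before = 1.0 - before
    error_after = 1.0 - after
    return {
        # Absolute improvement, in percentage points.
        "percentage_point_gain": round((after - before) * 100, 1),
        # Share of the original errors that no longer occur, in percent.
        "relative_error_reduction_pct": round(
            (error_before - error_after) / error_before * 100, 1
        ),
    }


# Figures reported for Amazon A+: 77% before, 96% after.
summary = accuracy_gain(0.77, 0.96)
print(summary)
```

Under that assumption, the change is a 19-point gain in accuracy, with the error rate falling from 23% to 4%, or roughly 83% fewer misclassifications.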

What failed first

Initial attempts at Amazon Pharmacy using traditional RAG with foundation models yielded disappointing results: accuracy hovered between 60% and 70%, short of production requirements.

Results
Volume: 33%
Cost replaced: $3.5 billion annually
Source

https://aws.amazon.com/blogs/machine-learning/advanced-fine-tuning-techniques-for-multi-agent-orchestration-patterns-from-amazon-at-scale?tag=soumet-20


Grounding & classification
Source type: technical build writeup
63 fields verified against source quotes, 1 dropped as unverifiable.
agentic workflow, document classification, multi agent workflow, quality inspection, rag knowledge base, medical record, failure mode described, human review described, metric backed, named customer, production runtime claimed, source backed, tools described, vendor confirmed, workflow described, ecommerce, healthcare, logistics, software, accuracy improvement, employee productivity, error reduction, time saved, technical build writeup, back office ops, compliance monitoring, customer support, quality assurance, agentic task execution, extract classify route