quality_assurance · healthcare · workflow

Advanced fine-tuning for multi-agent orchestration: production results from Amazon Pharmacy, GES, and A+

Three Amazon teams faced high-stakes production challenges. Amazon Pharmacy dealt with medication direction errors costing up to $3.5 billion annually. Amazon GES relied on lengthy inspection reviews that consumed expert hours across hundreds of fulfillment centers. Amazon A+ Content needed to evaluate the quality of product content submissions at massive scale.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · High-stakes use case identified

One in four high-stakes applications demands advanced fine-tuning to achieve production-grade performance.

Tools used

RAG · SFT · PPO · DPO · GRPO · DAPO · GSPO · RLHF · LoRA · RLVR · RLAIF · Amazon SageMaker · Amazon Bedrock · Amazon Bedrock AgentCore · Nova Lite · Amazon SageMaker HyperPod · Amazon SageMaker JumpStart · Strands · Amazon Nova Forge

Outcome

Advanced fine-tuning delivered production-grade results across all three Amazon use cases: Amazon Pharmacy achieved a 33% reduction in near-miss medication events; Amazon GES achieved an 80% reduction in human expert effort; and Amazon A+ improved classification accuracy from 77% to 96%.
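To put the A+ accuracy figures in perspective, the short sketch below converts an accuracy change into the corresponding change in error rate. It uses only the 77% and 96% figures quoted above; the function name `error_rate_change` and the shape of the returned dictionary are illustrative assumptions, not anything described in the source.

```python
def error_rate_change(acc_before: float, acc_after: float) -> dict:
    """Summarize an accuracy improvement; inputs are fractions in [0, 1].

    Hypothetical helper for illustration only, not taken from the source.
    """
    if not (0.0 <= acc_before <= 1.0 and 0.0 <= acc_after <= 1.0):
        raise ValueError("accuracies must be fractions between 0 and 1")

    err_before = 1.0 - acc_before  # share of items misclassified before
    err_after = 1.0 - acc_after    # share of items misclassified after
    if err_before == 0.0:
        raise ValueError("no errors to reduce when starting accuracy is 100%")

    return {
        # Absolute gain, in percentage points
        "absolute_gain_points": round((acc_after - acc_before) * 100, 1),
        "error_before_pct": round(err_before * 100, 1),
        "error_after_pct": round(err_after * 100, 1),
        # Fraction of the original errors that were eliminated
        "relative_error_reduction_pct": round(
            (err_before - err_after) / err_before * 100, 1
        ),
    }


# Figures quoted for Amazon A+ classification accuracy: 77% -> 96%
summary = error_rate_change(0.77, 0.96)
print(summary)
```

Read this way, the 19-point accuracy gain means the error rate fell from 23% to 4%, so roughly 83% of the original misclassifications were eliminated.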

What failed first

Initial attempts using traditional RAG with foundation models at Amazon Pharmacy yielded disappointing results, with accuracy hovering between 60 and 70%, falling short of production requirements.

Results

Volume: 33%

Cost replaced: $3.5 billion annually

Source

https://aws.amazon.com/blogs/machine-learning/advanced-fine-tuning-techniques-for-multi-agent-orchestration-patterns-from-amazon-at-scale?tag=soumet-20


Grounding & classification

Source type: technical build writeup

63 fields verified against source quotes, 1 dropped as unverifiable.

agentic workflow · document classification · multi agent workflow · quality inspection · rag · knowledge base · medical record · failure mode described · human review described · metric backed · named customer · production runtime claimed · source backed · tools described · vendor confirmed · workflow described · ecommerce · healthcare · logistics · software · accuracy improvement · employee productivity · error reduction · time saved · technical build writeup · back office ops · compliance monitoring · customer support · quality assurance · agentic task execution · extract classify route