quality_assurance · saas · workflow
Pushpay builds production-ready agentic AI search on Amazon Bedrock, improving accuracy from 60–70% to 95%
Ministry leaders at Pushpay's church customers needed fast access to community insights without technical expertise, but the initial AI search agent plateaued at 60–70% accuracy because evaluation was manual and tedious, creating critical blockers to production deployment.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Click any stage to read what happens there. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User submits natural language query
Pushpay users submit natural language queries through the existing Pushpay application interface.
Tools used
Amazon BedrockClaude Sonnet 4.5Amazon Bedrock prompt cachingDynamic prompt constructor
Outcome
Pushpay's generative AI evaluation framework raised agent accuracy from 60–70% to 95% through domain-level dashboards and strategic rollout, while reducing time-to-insight from approximately 120 seconds to under 4 seconds—a 15-fold acceleration.
What failed first
The first AI search agent iteration relied on a single statically tuned system prompt and had no automated evaluation mechanism, causing it to stall at a 60–70% accuracy ceiling with no clear path to improvement.
Results
Time savedfrom approximately 120 seconds to under 4 seconds
Volume60-70%
Grounding & classification
Source type: platform led case
30 fields verified against source quotes.
agentic workflowconversational aidata extractionquality inspectionknowledge basefailure mode describedhuman review describedmetric backednamed customerproduction runtime claimedtools describedworkflow describednonprofitsoftwareaccuracy improvementcycle time reductionemployee productivityplatform led caseback office opsquality assuranceagentic task executionextract classify route