Pushpay builds production-ready agentic AI search on Amazon Bedrock, improving accuracy from 60–70% to 95%

Ministry leaders at Pushpay's church customers needed fast access to community insights without technical expertise. The initial AI search agent plateaued at 60–70% accuracy because evaluation was manual and tedious, which blocked production deployment.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · User submits natural language query

Pushpay users submit natural language queries through the existing Pushpay application interface.

Tools used

Amazon Bedrock · Claude Sonnet 4.5 · Amazon Bedrock prompt caching · Dynamic prompt constructor
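As a rough illustration of what a dynamic prompt constructor can do, the sketch below assembles a prompt from a fixed prefix plus only the guidance relevant to the incoming query. It is not Pushpay's implementation. The names `BASE_PROMPT`, `DOMAIN_SECTIONS`, `select_domains`, and `build_prompt`, the keyword-matching rule, the prompt wording, and the "giving" and "attendance" domains are all assumptions made for the example. No Amazon Bedrock call is shown.

```python
from typing import Dict, List

# Static instructions shared by every request (wording is assumed).
# Keeping this prefix identical across calls is what makes it a
# candidate for prompt caching.
BASE_PROMPT = "You translate ministry leaders' questions into search filters."

# Hypothetical per-domain guidance, keyed by a trigger keyword.
DOMAIN_SECTIONS: Dict[str, str] = {
    "giving": "Giving filters: amount, frequency, date range.",
    "attendance": "Attendance filters: service, date range, streak.",
}


def select_domains(query: str) -> List[str]:
    """Return the domains whose keyword appears in the query."""
    lowered = query.lower()
    return [name for name in DOMAIN_SECTIONS if name in lowered]


def build_prompt(query: str) -> str:
    """Join the static prefix, the relevant domain sections, and the query."""
    parts = [BASE_PROMPT]
    parts.extend(DOMAIN_SECTIONS[name] for name in select_domains(query))
    parts.append(f"User query: {query}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Who stopped giving last month?"))
```

Placing the unchanging text first and the query-specific text last keeps the shared prefix stable from one request to the next, while each query still receives only the domain guidance it needs.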

Outcome

Pushpay's generative AI evaluation framework raised agent accuracy from 60–70% to 95% through domain-level dashboards and a strategic rollout. It also cut time-to-insight from approximately 120 seconds to under 4 seconds, which the source reports as a 15-fold acceleration.

What failed first

The first AI search agent iteration relied on a single statically tuned system prompt and had no automated evaluation mechanism, causing it to stall at a 60–70% accuracy ceiling with no clear path to improvement.
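To make the idea of automated evaluation concrete, here is a minimal sketch of a harness that runs labeled queries through an agent and reports accuracy per domain. It is an illustration under stated assumptions, not Pushpay's framework. The `EvalCase` schema, the `evaluate` function, the exact-match scoring rule, the stub agent, and the domain names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One labeled test query (hypothetical schema)."""
    domain: str    # assumed domain label, e.g. "giving"
    query: str
    expected: str  # golden answer the agent should produce


def evaluate(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    """Run every case through the agent and return accuracy per domain."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.domain] += 1
        # Exact match after normalization: an assumed scoring rule.
        if agent(case.query).strip().lower() == case.expected.strip().lower():
            correct[case.domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}


def stub_agent(query: str) -> str:
    """Stand-in for a real agent: it only recognizes 'giving' queries."""
    return "filter:giving" if "giving" in query.lower() else "unknown"


CASES = [
    EvalCase("giving", "Who increased giving this year?", "filter:giving"),
    EvalCase("giving", "Show lapsed giving households", "filter:giving"),
    EvalCase("attendance", "Who missed the last three services?", "filter:attendance"),
    EvalCase("attendance", "Giving among regular attendees", "filter:attendance"),
]

if __name__ == "__main__":
    for domain, accuracy in sorted(evaluate(stub_agent, CASES).items()):
        print(f"{domain}: {accuracy:.0%}")
```

Breaking the score out by domain shows where the agent is weak, which a single aggregate accuracy figure would hide. With the stub agent above, the "giving" cases pass and the "attendance" cases fail.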

Results

Time saved: from approximately 120 seconds to under 4 seconds

Accuracy: from 60–70% to 95%

Source

https://aws.amazon.com/blogs/machine-learning/build-reliable-agentic-ai-solution-with-amazon-bedrock-learn-from-pushpays-journey-on-genai-evaluation?tag=soumet-20

Grounding & classification

Source type: platform led case

30 fields verified against source quotes.

agentic workflow · conversational ai · data extraction · quality inspection · knowledge base · failure mode described · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · nonprofit · software · accuracy improvement · cycle time reduction · employee productivity · platform led case · back office ops · quality assurance · agentic task execution · extract classify route