SumUp uses an LLM-driven evaluator to assess AML financial crime report generation

SumUp's risk and compliance agents had to manually write repetitive financial crime reports for AML escalations, and standard NLP metrics could not adequately evaluate whether LLM-generated narratives were accurate and complete.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear under "Tools used" below.
Stage 1 · Agent confirms suspicious activity
A risk or compliance agent confirms that an account has suspicious activity before escalating.
Tools used
LLM, ROUGE score
Outcome

The LLM-driven evaluator consistently distinguished good narratives from poor ones and correlated closely with human agent assessments, which let data scientists test model improvements without adding to compliance agents' workload.
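The source describes the evaluator only at a high level. The sketch below assumes an LLM-as-judge setup: a rubric prompt asks a model to score a narrative for accuracy, completeness, and clarity, and the judge's scores are then checked against human agent ratings with a Spearman rank correlation. The criteria, the prompt wording, the 1 to 5 scale, the `judge` and `fake_llm` helpers, and the choice of Spearman are all hypothetical, not SumUp's published method; `fake_llm` stands in for a real model call so the example runs offline.

```python
import json
from typing import Callable

# Hypothetical rubric; the source does not publish SumUp's criteria or prompt.
CRITERIA = ["accuracy", "completeness", "clarity"]


def build_judge_prompt(case_facts: str, narrative: str) -> str:
    """Assemble a rubric prompt asking the model for JSON scores on a 1-5 scale."""
    return (
        "You are reviewing an AML escalation narrative.\n"
        f"Case facts:\n{case_facts}\n\n"
        f"Narrative:\n{narrative}\n\n"
        f"Score each criterion ({', '.join(CRITERIA)}) from 1 (poor) to 5 (excellent). "
        "Reply with a JSON object mapping each criterion to its score."
    )


def judge(case_facts: str, narrative: str, llm: Callable[[str], str]) -> float:
    """Score one narrative: call the model, parse its JSON, average the criteria."""
    scores = json.loads(llm(build_judge_prompt(case_facts, narrative)))
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model: rewards narratives citing the right count."""
    narrative = prompt.split("Narrative:")[1]
    score = 5 if "42 transfers" in narrative else 2
    return json.dumps({c: score for c in CRITERIA})


# Hypothetical case data and agent ratings, invented for illustration.
facts = "42 incoming transfers from unrelated third parties; funds withdrawn within 24 hours."
narratives = [
    "The customer received 42 transfers from unrelated third parties.",
    "The customer received 4 transfers from related parties.",
    "Within 24 hours of 42 transfers arriving, the funds were withdrawn.",
    "The account showed some unusual activity.",
]
human_scores = [5, 1, 4, 2]

judge_scores = [judge(facts, n, fake_llm) for n in narratives]
print(judge_scores, round(spearman(judge_scores, human_scores), 3))
```

With the stub, the judge scores are [5.0, 2.0, 5.0, 2.0] and the rank correlation with the hypothetical human ratings is about 0.89. In a real setup `fake_llm` would be replaced by an actual model client, and the agreement check would run on narratives that agents have already rated. Once agreement is established, the judge can score new model versions without further agent review.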

What failed first

The ROUGE score showed minimal differences between accurately and inaccurately generated narratives, which made it unreliable for distinguishing report quality in the AML context.
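To illustrate why n-gram overlap struggles here, the sketch below computes a simplified ROUGE-1 F1 (lowercased whitespace tokens, no stemming, so it is not identical to standard ROUGE implementations, and the source does not say which ROUGE variant was used). The narratives are invented for illustration: a faithful paraphrase and a near-copy that gets the transfer count and the time window wrong receive the same score, because ROUGE counts shared words without checking whether the facts are right.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each shared word counts min(candidate count, reference count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical narratives, invented for illustration (not from the source).
reference = ("customer received 42 transfers from unrelated third parties "
             "and withdrew the funds within 24 hours")
accurate = ("within 24 hours the customer withdrew funds from 42 transfers "
            "sent by unrelated third parties")
inaccurate = ("customer received 4 transfers from unrelated third parties "
              "and withdrew the funds within 24 days")

print(f"accurate paraphrase: {rouge1_f1(accurate, reference):.3f}")
print(f"wrong count/window:  {rouge1_f1(inaccurate, reference):.3f}")
```

Both narratives score 13/15 ≈ 0.867, even though a compliance reviewer would reject the second one outright.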

Results
Time saved: "a substantial amount of time"
Volume: 4.67
Source

https://medium.com/inside-sumup/evaluating-the-performance-of-an-llm-application-that-generates-free-text-narratives-in-the-context-c402a0136518

Grounding & classification
Source type: technical build writeup
19 fields verified against source quotes.
content generation · document ai · failure mode described · human review described · named customer · production runtime claimed · tools described · workflow described · financial services · employee productivity · time saved · technical build writeup · compliance monitoring · kyc aml · regulatory reporting · ai draft human approval