SumUp uses an LLM-driven evaluator to assess AML financial crime report generation

SumUp's risk and compliance agents had to manually write repetitive financial crime reports for AML escalations, and standard NLP metrics could not adequately evaluate whether LLM-generated narratives were accurate and complete.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Agent confirms suspicious activity

A risk or compliance agent confirms that an account has suspicious activity before escalating.

Tools used

LLM, ROUGE score

Outcome

The LLM-driven evaluator consistently differentiated between good and poor narratives and correlated closely with human agent assessments, enabling data scientists to test model improvements without adding extra workload to compliance agents.
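As a rough illustration of what an LLM-driven evaluator can look like, the sketch below asks a model to score a generated narrative against a reference narrative and then averages the scores. This is not SumUp's implementation: the criteria names, the 0 to 5 scale, the use of a reference narrative, the prompt wording, and the `llm` callable are all assumptions made for the example, and `fake_llm` is a stand-in so the code runs without any model access.

```python
import json
from typing import Callable

# Hypothetical criteria; the source does not list the evaluator's actual checks.
CRITERIA = ["facts_match_reference", "covers_key_topics", "conclusion_supported"]


def build_prompt(generated: str, reference: str) -> str:
    """Build the evaluation prompt (wording is an assumption)."""
    return (
        "Compare the generated AML narrative with the reference narrative.\n"
        f"Score each criterion from 0 to 5: {', '.join(CRITERIA)}.\n"
        "Reply with a JSON object mapping each criterion to its score.\n\n"
        f"Reference:\n{reference}\n\nGenerated:\n{generated}\n"
    )


def evaluate(generated: str, reference: str, llm: Callable[[str], str]) -> dict:
    """Send the prompt to `llm`, parse its JSON reply, and add an overall mean."""
    raw = llm(build_prompt(generated, reference))
    reply = json.loads(raw)
    missing = [c for c in CRITERIA if c not in reply]
    if missing:
        raise ValueError(f"evaluator reply is missing criteria: {missing}")
    scores = {c: float(reply[c]) for c in CRITERIA}
    scores["overall"] = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    return scores


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return json.dumps(
        {"facts_match_reference": 4, "covers_key_topics": 5, "conclusion_supported": 3}
    )


result = evaluate("generated narrative text", "reference narrative text", fake_llm)
print(result["overall"])
```

Passing the model in as a plain callable keeps the scoring logic testable without a live model, which fits the stated goal of letting data scientists compare model versions without asking compliance agents to re-grade every narrative.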

What failed first

The ROUGE score showed minimal differences between accurately and inaccurately generated narratives, making it unreliable for distinguishing report quality in the AML context.
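A small sketch can show why a word-overlap metric struggles here. The code below computes a simplified unigram ROUGE-1 F1 (whitespace tokens, no stemming) rather than calling any particular ROUGE library, and the three sentences are invented for illustration; they are not taken from SumUp's data.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercased whitespace tokens."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # multiset intersection clips repeated words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented example sentences (assumption, for illustration only).
reference = "the account received 40 transfers from unrelated senders in two days"
accurate = "the account received 40 transfers from unrelated senders within two days"
inaccurate = "the account received 4 transfers from related senders in two days"

print(round(rouge1_f1(reference, accurate), 3))    # 0.909
print(round(rouge1_f1(reference, inaccurate), 3))  # 0.818
```

The second candidate gets the transfer count and the sender relationship wrong, both of which matter for an AML report, yet it scores only about 0.09 lower than the faithful paraphrase because most of its words still overlap with the reference.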

Results

Time saved: a substantial amount of time

Volume: 4.67

Source

https://medium.com/inside-sumup/evaluating-the-performance-of-an-llm-application-that-generates-free-text-narratives-in-the-context-c402a0136518

Grounding & classification

Source type: technical build writeup

19 fields verified against source quotes.

content generation, document ai, failure mode described, human review described, named customer, production runtime claimed, tools described, workflow described, financial services, employee productivity, time saved, technical build writeup, compliance monitoring, kyc aml, regulatory reporting, ai draft human approval