
Stripe uses ML and time-series anomaly detection to monitor payment performance across 16,000+ slice dimensions

Aggregate payment monitoring masked degradations affecting specific traffic segments: a spike in failures for a particular card type or region might not move global metrics even as individual businesses felt an acute impact.
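A small worked example, using invented numbers rather than Stripe data, shows the masking effect. One slice carrying 1% of volume drops from a 95% to a 60% success rate, while the volume-weighted global rate moves only from 95.00% to 94.65%. The slice names are hypothetical.

```python
# Invented traffic split for illustration only (not Stripe data):
# (hypothetical slice name, transaction count, success rate)
BASELINE = [
    ("visa_us", 900_000, 0.95),
    ("amex_jp", 10_000, 0.95),
    ("other", 90_000, 0.95),
]
DEGRADED = [
    ("visa_us", 900_000, 0.95),
    ("amex_jp", 10_000, 0.60),  # one small slice degrades sharply
    ("other", 90_000, 0.95),
]


def global_success_rate(slices):
    """Volume-weighted success rate across all slices."""
    total = sum(count for _, count, _ in slices)
    successes = sum(count * rate for _, count, rate in slices)
    return successes / total


before = global_success_rate(BASELINE)
after = global_success_rate(DEGRADED)
print(f"global success rate: {before:.2%} -> {after:.2%}")
```

A 35-point collapse in one slice shows up as roughly a third of a point globally, easy to lose in normal day-to-day variation.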

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. The tools used at each stage are listed under that stage.

Stage 1 · Continuous slice monitoring

Payment transactions are continuously monitored across a high-dimensional space of over 16,000 payment-related variables.
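One way to picture a slice is as the subset of transactions sharing particular values of one or more dimensions, such as card brand × country. The minimal sketch below assumes hypothetical dimension names and a simple success-rate metric. It computes per-slice rates for every combination of up to two dimensions, illustrating how crossing dimensions multiplies the number of slices to watch. It is not Stripe's implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimension names; the source does not list Stripe's actual ones.
DIMENSIONS = ("card_brand", "country", "payment_method")


def slice_success_rates(transactions, max_dims=2):
    """Return {slice_key: (count, success_rate)} for every combination of
    up to `max_dims` dimensions. A slice key is a tuple of
    (dimension, value) pairs, e.g. (("country", "JP"),)."""
    counts = defaultdict(int)
    successes = defaultdict(int)
    for txn in transactions:
        for k in range(1, max_dims + 1):
            for dims in combinations(DIMENSIONS, k):
                key = tuple((d, txn[d]) for d in dims)
                counts[key] += 1
                successes[key] += int(txn["succeeded"])
    return {key: (counts[key], successes[key] / counts[key]) for key in counts}


txns = [
    {"card_brand": "visa", "country": "US", "payment_method": "card", "succeeded": True},
    {"card_brand": "visa", "country": "JP", "payment_method": "card", "succeeded": False},
    {"card_brand": "amex", "country": "JP", "payment_method": "card", "succeeded": True},
]
rates = slice_success_rates(txns)
print(len(rates), rates[(("country", "JP"),)])  # 12 (2, 0.5)
```

Even three low-cardinality dimensions and three transactions yield 12 slices. The count can grow multiplicatively with each dimension's cardinality.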

Tools used

ML models · time series algorithms · finite state machine
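The source names a finite state machine among the tools, but this page does not describe its states. One plausible use, offered here purely as an assumption, is tracking each slice's alert lifecycle so that a single noisy interval does not open or close an alert. The states and the `open_after`/`close_after` parameters below are hypothetical.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    SUSPECT = "suspect"    # anomalous, but not yet for long enough to alert
    ALERTING = "alerting"  # alert open; waiting for sustained recovery


class SliceAlertFSM:
    """Hypothetical per-slice alert lifecycle with hysteresis.

    An alert opens after `open_after` consecutive anomalous intervals and
    closes after `close_after` consecutive normal intervals.
    """

    def __init__(self, open_after=3, close_after=3):
        self.open_after = open_after
        self.close_after = close_after
        self.state = State.OK
        self.streak = 0  # consecutive intervals counting toward the next transition

    def step(self, anomalous):
        """Consume one interval's anomaly verdict and return the new state."""
        if self.state in (State.OK, State.SUSPECT):
            if not anomalous:
                self.state, self.streak = State.OK, 0
            else:
                self.streak += 1
                if self.streak >= self.open_after:
                    self.state, self.streak = State.ALERTING, 0
                else:
                    self.state = State.SUSPECT
        else:  # State.ALERTING
            if anomalous:
                self.streak = 0  # still degraded: restart the recovery count
            else:
                self.streak += 1
                if self.streak >= self.close_after:
                    self.state, self.streak = State.OK, 0
        return self.state


fsm = SliceAlertFSM(open_after=3, close_after=2)
verdicts = [True, True, True, False, False]
print([fsm.step(v).value for v in verdicts])
# ['suspect', 'suspect', 'alerting', 'alerting', 'ok']
```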

Outcome

The slice monitoring platform identifies real payment performance degradations each day with precision exceeding 90%, giving broad coverage of traffic slices without the unsustainable operational burden that a flood of false positives would create.
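Precision here is the share of raised alerts that turn out to be real degradations, TP / (TP + FP). The minimal sketch below assumes each alert receives a hypothetical human-review label. Precision says nothing about missed degradations; measuring those would require a recall figure, which the page does not report.

```python
def alert_precision(reviews):
    """Return TP / (TP + FP) for a batch of reviewed alerts.

    `reviews` maps a hypothetical alert ID to True when a reviewer confirmed
    a real degradation and False when the alert was a false positive.
    """
    if not reviews:
        raise ValueError("precision is undefined with no alerts")
    true_positives = sum(1 for confirmed in reviews.values() if confirmed)
    return true_positives / len(reviews)


# Invented example day: 20 alerts, one of which was a false positive.
day = {f"alert-{i}": i != 7 for i in range(20)}
print(f"precision: {alert_precision(day):.0%}")
```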

What failed first

Standard time-series anomaly detection was insufficient on its own because payment metrics lack a stable baseline: customer onboarding, fraud trends, and changes in business behavior create underlying variation that would otherwise trigger false positives.
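The sketch below contrasts a fixed threshold with a baseline that adapts to slow drift. The adaptive detector uses an exponentially weighted moving average (EWMA) of the mean and variance. EWMA is a generic technique swapped in for illustration; the source says Stripe combined ML models with time-series algorithms, not that it used this detector. The series and the `alpha`, `z_threshold`, and `warmup` values are invented.

```python
import math


def ewma_anomalies(series, alpha=0.1, z_threshold=4.0, warmup=10):
    """Return indices whose deviation from an EWMA baseline exceeds
    `z_threshold` EWMA standard deviations. Illustrative stand-in only."""
    mean, var = series[0], 0.0
    flagged = []
    for i, x in enumerate(series[1:], start=1):
        std = math.sqrt(var)
        if i >= warmup and std > 0 and abs(x - mean) / std > z_threshold:
            flagged.append(i)
            continue  # keep the outlier out of the baseline
        # Incremental EWMA update of mean and variance.
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return flagged


# Synthetic success rate: slow downward drift (e.g. a changing customer mix),
# small alternating noise, and one sudden drop at index 50.
series = [0.95 - 0.0005 * i + 0.002 * (-1) ** i for i in range(60)]
series[50] -= 0.2

static = [i for i, x in enumerate(series) if x < 0.93]  # fixed threshold
adaptive = ewma_anomalies(series)
print(len(static), adaptive)
```

On this synthetic series the fixed 0.93 threshold fires on many intervals that only reflect gradual drift, while the adaptive detector flags only the injected drop at index 50. Skipping flagged points during the update keeps a sharp outage from being absorbed into the baseline.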

Results

Precision: exceeding 90%

Cost replaced: more than $31 billion

Source

https://stripe.com/blog/using-ml-to-detect-and-respond-to-performance-degradations-in-slices-of-stripe-payments


Grounding & classification

Source type: technical build writeup

21 fields verified against source quotes.

anomaly detection · predictive analytics · failure mode described · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · financial services · accuracy improvement · error reduction · technical build writeup · compliance monitoring · incident management · monitor detect alert