incident_management · workflow

Stripe uses ML and time-series anomaly detection to monitor payment performance across 16,000+ slice dimensions

Aggregate payment monitoring masked degradations affecting specific traffic segments—a spike in failures for a particular card type or region might not move global metrics even as individual businesses felt acute impact.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Continuous slice monitoring
Payment transactions are continuously monitored across a high-dimensional space of over 16,000 payment-related variables.
Tools used
ML models, time series algorithms, finite state machine
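
As a rough illustration of what monitoring by slice means, the sketch below computes a success rate for every combination of a few dimension values. The field names (`card_type`, `region`, `success`), the sample data, and the helper `slice_success_rates` are hypothetical and are not taken from the source writeup. It only shows how a degraded slice can hide behind a healthy aggregate.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical transaction records; field names and counts are illustrative only.
transactions = (
    [{"card_type": "visa", "region": "US", "success": True}] * 90
    + [{"card_type": "visa", "region": "EU", "success": True}] * 6
    + [{"card_type": "amex", "region": "EU", "success": False}] * 3
    + [{"card_type": "amex", "region": "EU", "success": True}] * 1
)

DIMENSIONS = ("card_type", "region")


def slice_success_rates(records, dimensions, max_order=2):
    """Return {slice_key: (success_rate, count)} for every observed slice.

    A slice key is a tuple of (dimension, value) pairs, for example
    (("card_type", "amex"), ("region", "EU")).
    """
    totals = defaultdict(lambda: [0, 0])  # slice_key -> [successes, count]
    for rec in records:
        # Count the record once in every slice it belongs to.
        for order in range(1, max_order + 1):
            for dims in combinations(dimensions, order):
                key = tuple((d, rec[d]) for d in dims)
                totals[key][0] += int(rec["success"])
                totals[key][1] += 1
    return {key: (s / n, n) for key, (s, n) in totals.items()}


rates = slice_success_rates(transactions, DIMENSIONS)
overall = sum(t["success"] for t in transactions) / len(transactions)

amex_eu = (("card_type", "amex"), ("region", "EU"))
print(f"overall success rate: {overall:.2f}")
print(f"amex/EU success rate: {rates[amex_eu][0]:.2f} over {rates[amex_eu][1]} payments")
```

With this made-up data the aggregate success rate stays high while the amex/EU slice is badly degraded. The number of slices grows quickly with each added dimension, which is why `max_order` caps how many dimensions are combined in this sketch.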
Outcome

The slice monitoring platform identifies real payment performance degradations each day with precision exceeding 90%, achieving excellent coverage without generating unsustainable operational burden from false positives.
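
For readers unfamiliar with the metric, precision is the share of raised alerts that turn out to be real degradations. The sketch below shows the arithmetic; the function name and the alert counts are hypothetical and do not come from the source.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of raised alerts that were real degradations."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no alerts were raised, so precision is undefined")
    return true_positives / flagged


# Hypothetical day: 50 alerts raised, 46 confirmed as real degradations.
daily_precision = precision(true_positives=46, false_positives=4)
print(f"precision: {daily_precision:.2f}")
```

Precision says nothing about degradations that were never flagged, which is why the outcome above mentions coverage separately.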

What failed first

Standard time-series anomaly detection was insufficient because payment metrics lack a stable baseline—customer onboarding, fraud trends, and business behavior changes create underlying variation that would cause false positives.
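
The sketch below illustrates the failure mode under stated assumptions: a synthetic metric that drifts upward with no real incident, checked first against a fixed baseline and then against a rolling one. The series, thresholds, and function names are hypothetical, and the rolling z-score is a generic stand-in used for contrast, not the method described in the source.

```python
from statistics import mean, stdev

# Synthetic metric with steady drift and small periodic noise, no real anomaly.
NOISE = [1, -1, 2, -2]
series = [100 + 0.5 * i + NOISE[i % 4] for i in range(60)]


def fixed_baseline_alerts(values, train_size, z=3.0):
    """Flag points that deviate from a baseline learned once, up front."""
    base = values[:train_size]
    mu, sigma = mean(base), stdev(base)
    return [
        i for i in range(train_size, len(values))
        if abs(values[i] - mu) > z * sigma
    ]


def rolling_baseline_alerts(values, window, z=3.0):
    """Flag points that deviate from the mean of the preceding window."""
    alerts = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if abs(values[i] - mu) > z * sigma:
            alerts.append(i)
    return alerts


fixed = fixed_baseline_alerts(series, train_size=20)
rolling = rolling_baseline_alerts(series, window=20)
print(f"fixed baseline alerts:   {len(fixed)}")
print(f"rolling baseline alerts: {len(rolling)}")
```

On this synthetic series the fixed baseline flags most of the later points even though nothing is wrong, while the rolling baseline flags none. A rolling baseline has its own weakness: it can absorb a slow real degradation into the baseline.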

Results
Precision: exceeding 90%
Cost replaced: more than $31 billion
Source

https://stripe.com/blog/using-ml-to-detect-and-respond-to-performance-degradations-in-slices-of-stripe-payments

Grounding & classification
Source type: technical build writeup
21 fields verified against source quotes.
anomaly detection, predictive analytics, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, financial services, accuracy improvement, error reduction, technical build writeup, compliance monitoring, incident management, monitor detect alert