
Zalando rebuilds ML pipeline for payment-default fraud detection on Amazon SageMaker

Zalando's second-generation Scala/Spark ML pipeline for detecting payment defaults had four main problems. It was tightly coupled to a single framework, which made modern Python libraries difficult to adopt. It relied on custom code that added maintenance burden. It suffered from memory issues and latency spikes, along with slow instance startup. Its monolithic design also fused feature preprocessing and model training into a single cluster.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Training data preprocessing
Training data is preprocessed using a Databricks cluster and a scikit-learn batch transform job on SageMaker.
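The idea of running preprocessing as its own stage, with its fitted state handed to later stages as an artifact, can be sketched as follows. This is a minimal pure-Python stand-in, not the scikit-learn or SageMaker API and not Zalando's code: the functions `fit_scaler` and `transform`, the field names, and the JSON artifact are all hypothetical and chosen only for illustration.

```python
import json
import statistics


def fit_scaler(rows, columns):
    """Preprocessing stage: learn per-column mean and standard deviation."""
    params = {}
    for col in columns:
        values = [row[col] for row in rows]
        std = statistics.pstdev(values)
        # Fall back to 1.0 for constant columns to avoid division by zero.
        params[col] = {"mean": statistics.fmean(values), "std": std or 1.0}
    return params


def transform(rows, params):
    """Apply previously fitted parameters to raw rows."""
    return [
        {col: (row[col] - p["mean"]) / p["std"] for col, p in params.items()}
        for row in rows
    ]


# Hypothetical toy rows; the field names are illustrative only.
raw_rows = [
    {"order_value": 20.0, "items": 1.0},
    {"order_value": 40.0, "items": 3.0},
    {"order_value": 60.0, "items": 5.0},
]

# Stage 1: fit and serialize. The JSON string stands in for a stored artifact.
artifact = json.dumps(fit_scaler(raw_rows, ["order_value", "items"]))

# Stage 2: a separate step needs only the artifact and the rows,
# not the code or cluster that produced the artifact.
features = transform(raw_rows, json.loads(artifact))
```

Because the second stage depends only on the serialized parameters, the preprocessing and training steps in a sketch like this can run on different machines and be changed independently.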
Tools used
Amazon SageMaker, zflow, AWS Step Functions, AWS Lambda, Databricks, scikit-learn, XGBoost, PyTorch, TensorFlow
Outcome

The new SageMaker-based pipeline is framework-independent, cleanly separates preprocessing from training, and cut scale-up time by 50%. Load tests confirm that a single ml.m5.large instance handles 200 requests/second with p99 latency under 80 ms.
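A load-test result of this kind can be checked against raw latency samples along the following lines. This is a minimal sketch under stated assumptions: the source does not say which load-testing tool or percentile definition was used, so the nearest-rank method, the function names, and the sample data here are all illustrative.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def check_load_test(latencies_ms, duration_s, min_rps=200.0, p99_budget_ms=80.0):
    """Compare observed throughput and p99 latency with the stated targets."""
    rps = len(latencies_ms) / duration_s
    p99 = percentile_nearest_rank(latencies_ms, 99)
    return {"rps": rps, "p99_ms": p99, "ok": rps >= min_rps and p99 < p99_budget_ms}


# Made-up samples: 1000 requests completed over a 5-second window.
samples = [30.0] * 980 + [75.0] * 15 + [120.0] * 5
result = check_load_test(samples, duration_s=5.0)
```

The default thresholds mirror the figures quoted above (200 requests/second, p99 under 80 ms). A few slow outliers, like the 120 ms samples here, do not fail the check as long as they make up less than 1% of requests.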

What failed first

An original Python/scikit-learn ML setup was replaced in 2015 by a Scala/Spark system to scale better, but this second-generation system accumulated its own technical pain points, which prompted the migration to a third-generation pipeline.

Results
Time saved: 50%
Volume: 99.9%
Cost replaced: up to 200%
Source

https://engineering.zalando.com/posts/2021/02/machine-learning-pipeline-with-real-time-inference.html


Grounding & classification
Source type: technical build writeup
28 fields verified against source quotes.
fraud detection, predictive analytics, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, cycle time reduction, throughput increase, technical build writeup, ecommerce ops, finance ops, extract classify route