Zalando rebuilds ML pipeline for payment-default fraud detection on Amazon SageMaker
Zalando's second-generation Scala/Spark ML pipeline for detecting payment defaults had several problems. It was tightly coupled to a single framework, which made modern Python libraries difficult to adopt; it relied on custom code that added maintenance burden; it suffered from memory issues, latency spikes, and slow instance startup; and its monolithic design fused feature preprocessing and model training on a single cluster.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Training data preprocessing
Training data is preprocessed using a Databricks cluster and a scikit-learn batch transform job on SageMaker.
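To illustrate what running preprocessing as its own step can look like, here is a minimal sketch in plain Python. It is a stand-in, not the scikit-learn or SageMaker API and not Zalando's code: the function names, the feature names, and the JSON artifact format are all assumptions made for this example. The idea it shows is that the preprocessing step fits its parameters once, saves them as an artifact, and any later step reuses that artifact without refitting.

```python
import json
import statistics
from typing import Dict, List

Row = Dict[str, float]


def fit_preprocessor(rows: List[Row], columns: List[str]) -> dict:
    """Learn the per-column mean and standard deviation from training rows."""
    params = {}
    for col in columns:
        values = [row[col] for row in rows]
        std = statistics.pstdev(values)
        # Fall back to 1.0 so a constant column does not divide by zero.
        params[col] = {"mean": statistics.fmean(values), "std": std or 1.0}
    return params


def transform(rows: List[Row], params: dict) -> List[Row]:
    """Standardize rows using previously fitted parameters."""
    return [
        {col: (row[col] - p["mean"]) / p["std"] for col, p in params.items()}
        for row in rows
    ]


# Hypothetical toy data; the feature names are invented for this sketch.
training_rows = [
    {"order_value": 20.0, "items": 1.0},
    {"order_value": 40.0, "items": 3.0},
    {"order_value": 60.0, "items": 5.0},
]

# Preprocessing step: fit, then serialize the fitted state as an artifact.
artifact = json.dumps(fit_preprocessor(training_rows, ["order_value", "items"]))

# Later step (training or inference): load the artifact and apply it.
features = transform(training_rows, json.loads(artifact))
```

Because the later step depends only on the serialized artifact and the transformed rows, the training code can change framework without touching the preprocessing step.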
The new SageMaker-based pipeline is framework-independent, cleanly separates preprocessing from training, and cuts scale-up time by 50%. Load tests confirm that a single ml.m5.large instance handles 200 requests/second with p99 latency under 80 ms.
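As a rough sketch of how a throughput and p99 target like this could be checked against load-test output, the snippet below computes a nearest-rank percentile over recorded latencies. The helper names and the latency values are hypothetical; the numbers are synthetic and chosen only to exercise the check, not taken from any real measurement.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(
    latencies_ms: List[float],
    duration_s: float,
    min_rps: float,
    max_p99_ms: float,
) -> bool:
    """True if the run sustained min_rps and its p99 stayed below max_p99_ms."""
    throughput = len(latencies_ms) / duration_s
    return throughput >= min_rps and percentile(latencies_ms, 99) < max_p99_ms


# Synthetic latencies, made up for illustration only (not measured data):
# 1000 requests over 5 seconds, 990 at 30 ms and 10 slow ones at 120 ms.
latencies = [30.0] * 990 + [120.0] * 10
ok = meets_target(latencies, duration_s=5, min_rps=200, max_p99_ms=80)
```

A p99 target tolerates up to 1% slow requests, so the ten outliers in this synthetic run do not break it. One more slow request would push the 99th percentile up to 120 ms and fail the check.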
What failed first
An original Python/scikit-learn ML setup was replaced in 2015 by a Scala/Spark system to scale better, but this second-generation system accumulated its own technical pain points, which prompted a second migration to the third-generation pipeline on SageMaker.