
Zalando migrates real-time fraud detection from Python/scikit-learn to Scala/Spark for platform scale

Zalando's Python-based fraud detection system could not scale to the demands of its expanding fashion platform: the Python GIL blocked multithreading for concurrent predictions, training data exhausted in-house cluster memory, JSON configuration became unmanageable, and shared cluster resources created bottlenecks.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Order received as JSON

Order data arrives in the form of a JSON request to the prediction service.
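As a rough illustration of this stage, the sketch below parses a JSON order request, extracts a few numeric features, and returns a JSON response with a score. The field names (order_id, amount, item_count), the function names, and the response shape are all hypothetical; the source does not document the request schema or the service's API.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical field names; the source does not document the request schema.
REQUIRED_FIELDS = ("order_id", "amount", "item_count")


def parse_order(body: str) -> Tuple[str, Dict[str, float]]:
    """Parse a JSON order request into an order id and numeric features."""
    order = json.loads(body)
    missing = [field for field in REQUIRED_FIELDS if field not in order]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    features = {
        "amount": float(order["amount"]),
        "item_count": float(order["item_count"]),
    }
    return str(order["order_id"]), features


def handle_request(body: str, score: Callable[[Dict[str, float]], float]) -> str:
    """Score one order with the supplied model function; return a JSON body."""
    order_id, features = parse_order(body)
    return json.dumps({"order_id": order_id, "fraud_score": score(features)})
```

Passing the scoring function in as an argument keeps the request handling independent of whichever model library sits behind it.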

Tools used

scikit-learn, CherryPy, Play framework, MLlib

Outcome

The new Scala and Spark system on AWS halved overall learning time and cut prediction response time at 20 concurrent requests from ~1000 ms to ~70 ms, roughly a 14-fold reduction. A sparse feature condenser improved prediction accuracy by more than 25% while more than halving runtime.

What failed first

The original Python system, which used CherryPy to serve requests and scikit-learn for ML on a static in-house cluster, failed to scale: Python's GIL prevented concurrent predictions, cluster memory capped training data size, and growing JSON config complexity blocked safe refactoring.
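The GIL limitation can be observed with a small experiment. The sketch below is an assumption-laden stand-in, not the original system: cpu_bound_predict is a hypothetical pure-Python loop in place of a real model, and the workload size is arbitrary. On a standard CPython build with the GIL, only one thread executes Python bytecode at a time, so the 20-thread run tends to take about as long as the single-thread run. Libraries with native code may release the GIL in places, so real services can behave differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def cpu_bound_predict(n: int) -> int:
    """Hypothetical stand-in for a CPU-heavy prediction (pure-Python arithmetic)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_concurrent(requests: List[int], workers: int) -> Tuple[List[int], float]:
    """Score all requests on a thread pool; return results and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(cpu_bound_predict, requests))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # 20 simultaneous requests, mirroring the concurrency level cited in the source.
    requests = [200_000] * 20
    _, single = run_concurrent(requests, workers=1)
    _, pooled = run_concurrent(requests, workers=20)
    print(f"1 thread: {single:.2f}s, 20 threads: {pooled:.2f}s")
```

Comparing the two timings on your own machine shows whether adding threads buys any throughput for CPU-bound scoring.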

Results

Time saved: learning time drops by a factor of two

Volume: more than 25% improvement in prediction accuracy

Source

https://engineering.zalando.com/posts/2016/05/scalable-fraud-detection-fashion-platform.html


Grounding & classification

Source type: technical build writeup

28 fields verified against source quotes, 1 dropped as unverifiable.

fraud detection, predictive analytics, purchase order, failure mode described, metric backed, named customer, production runtime claimed, source backed, tools described, workflow described, ecommerce, accuracy improvement, cycle time reduction, response time reduction, throughput increase, technical build writeup, finance ops, order processing, extract classify route, monitor detect alert