Shop the Look: Zalando's deep learning visual search pipeline for fashion product retrieval
Users want to find fashion products they see in photos, but words alone are insufficient to describe fashion items. Visual search for fashion must cope, at scale, with variable image quality, lighting, varied backgrounds, human poses, and article distortion.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User submits query image
A customer submits a query fashion image through the app or Facebook chatbot.
Tools used
FashionDNA, Studio2Shop, Street2Fashion, Fashion2Shop
Outcome
Zalando developed the Street2Fashion2Shop pipeline, which combines background segmentation (Street2Fashion) with product matching (Fashion2Shop) to handle real-world query images. The segmentation results were described as good enough to focus on the fashion in the image. The work remained at the research stage at the time of publication; production visual search was powered by Fashwell.
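The two-step idea described above (segment the fashion out of the photo, then match it against catalog articles) can be illustrated with a toy sketch. This is not Zalando's implementation: the mean-colour embedding and cosine ranking are simple stand-ins for the learned models, the mask is assumed to come from a segmentation model, and every function name and the three-article catalog are hypothetical.

```python
import numpy as np


def remove_background(image, mask, fill=1.0):
    """Replace every non-fashion pixel with a constant (white by default)."""
    return np.where(mask[..., None], image, fill)


def embed(image):
    """Toy stand-in for a learned feature extractor: L2-normalised mean colour."""
    vec = image.reshape(-1, 3).mean(axis=0)
    return vec / np.linalg.norm(vec)


def rank_catalog(query_vec, catalog_vecs):
    """Return catalog indices sorted from most to least similar (cosine)."""
    return np.argsort(-(catalog_vecs @ query_vec)).tolist()


def studio_image(colour, size=4):
    """Hypothetical studio shot: a coloured article centred on a white background."""
    img = np.ones((size, size, 3))
    img[1:3, 1:3] = colour
    return img


# Hypothetical three-article catalog: red, green, blue.
catalog = np.stack([embed(studio_image(c)) for c in ([1, 0, 0], [0, 1, 0], [0, 0, 1])])

# Hypothetical street photo: the same red article in front of a green background.
street = np.zeros((4, 4, 3))
street[..., 1] = 1.0
street[1:3, 1:3] = [1, 0, 0]

# Foreground mask, assumed here to be produced by a segmentation model.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

raw_ranking = rank_catalog(embed(street), catalog)
segmented_ranking = rank_catalog(embed(remove_background(street, mask)), catalog)
```

In this toy setup the unsegmented photo is pulled toward the green article by its background, while the segmented photo ranks the red article first. That is the motivation for placing segmentation ahead of matching when the matcher has only seen clean studio images.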
What failed first
Studio2Shop was trained exclusively on clean-background studio images, making it unsuitable for natural real-world photos. A direct extension to natural images was blocked by the absence of annotated natural fashion image data.