Flipkart research: semi-supervised DPO fine-tuning of compact VLMs improves product attribute prediction accuracy from 75.1% to 85.7%

Manually labeling product attributes at e-commerce catalog scale is expensive and error-prone, large VLM APIs cost too much for production use, and a large pool of unlabeled product images remains underutilized.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Initial supervised fine-tuning

A pre-trained VLM is fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) on a small, curated dataset whose labels were generated by a large multimodal model.
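
To make the cost argument for PEFT concrete, here is a minimal sketch that assumes LoRA as the PEFT method. The source does not say which PEFT variant was used, and the layer shape and rank below are hypothetical values chosen only for illustration. LoRA freezes a dense weight W and trains two small factors B and A, so the number of trainable parameters per layer drops sharply.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen dense weight W (d_out x d_in)."""
    return d_out * d_in


# Hypothetical layer shape and rank (assumptions, not taken from the source).
d_out, d_in, rank = 2048, 2048, 8

trainable = lora_trainable_params(d_out, d_in, rank)
frozen = full_params(d_out, d_in)
fraction = trainable / frozen

print(f"trainable={trainable}, frozen={frozen}, fraction={fraction:.4f}")
```

Under these assumed numbers, less than 1% of the layer's parameters are updated, which is what makes fine-tuning a compact VLM on a small labeled set practical.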

Tools used

VLMs, DPO, PEFT, Qwen2.5-VL-3B-Instruct, Gemini, GPT

Outcome

DPO-based semi-supervised fine-tuning improved accuracy from 75.1% to 85.7% on the tested compact VLM, and increasing unlabeled data volume steadily raised performance further.
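
For readers unfamiliar with DPO, the sketch below shows the standard per-pair DPO loss as it is commonly defined, not the source's training code. It takes the log-probabilities that the model being trained (the policy) and a frozen reference model assign to a preferred and a dispreferred answer. How the source builds preference pairs from unlabeled images is not described here, so the function name, argument names, and `beta` value are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    # Implicit rewards: how much more the policy likes each answer than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
agrees = dpo_loss(-1.0, -3.0, -2.0, -2.0)     # policy favors the chosen answer
disagrees = dpo_loss(-3.0, -1.0, -2.0, -2.0)  # policy favors the rejected answer
print(agrees < disagrees)
```

The loss is smaller when the policy ranks the preferred answer above the dispreferred one by a wider margin than the reference does, so training pushes the model toward the preferred attribute values without requiring a single gold label per image.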

What failed first

A Self-Learning approach that retrains the model on its own high-confidence predictions caused accuracy to drop in smaller VLMs due to model collapse, where the model reinforces its own biases.
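
The selection step of such a Self-Learning loop can be sketched as follows. This is an illustrative assumption about how high-confidence pseudo-labeling is typically done, not the source's code; the function name, threshold, and sample predictions are hypothetical.

```python
def select_pseudo_labels(predictions, threshold=0.9):
    """Keep (item_id, label) for predictions whose confidence meets the threshold.

    predictions: iterable of (item_id, predicted_label, confidence) tuples.
    """
    return [
        (item_id, label)
        for item_id, label, confidence in predictions
        if confidence >= threshold
    ]


# Hypothetical model outputs on unlabeled product images.
predictions = [
    ("img_001", "cotton", 0.97),
    ("img_002", "polyester", 0.55),
    ("img_003", "cotton", 0.93),
]

pseudo_labeled = select_pseudo_labels(predictions)
print(pseudo_labeled)
```

Nothing in this filter checks whether a confident prediction is correct. If the model is confidently wrong about a class, those errors become training targets in the next round, which is the bias-reinforcing feedback described above.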

Results

Accuracy: 75.1% before DPO-based semi-supervised fine-tuning, 85.7% after.

Source

https://blog.flipkart.tech/the-future-of-e-commerce-how-ai-is-learning-to-describe-products-with-less-data-8dfbf05f83a1

Grounding & classification

Source type: technical build writeup

23 fields verified against source quotes.

computer vision, data extraction, product catalog, failure mode described, metric backed, source backed, tools described, workflow described, ecommerce, accuracy improvement, cost reduction, technical build writeup, data entry ops, ecommerce ops, extract classify route