Flipkart research: semi-supervised DPO fine-tuning of compact VLMs improves product attribute prediction accuracy from 75.1% to 85.7%
Manually labeling product attributes at e-commerce catalog scale is expensive and error-prone, large VLM APIs cost too much for production use, and a large pool of unlabeled product images remains underutilized.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Initial supervised fine-tuning
A pre-trained VLM is fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) on a small, curated dataset whose labels were generated by a large multimodal model.
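To make the "parameter-efficient" part concrete, here is a minimal sketch of the arithmetic behind one common PEFT method, low-rank adaptation (LoRA). The source names PEFT only, so the choice of LoRA is an assumption, and the layer size and rank below are hypothetical values picked for illustration.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one low-rank adapter.

    The frozen weight W has shape (d_out x d_in). The adapter learns
    A (rank x d_in) and B (d_out x rank), so it adds rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Adapter parameters as a fraction of the frozen weight matrix."""
    return lora_trainable_params(d_in, d_out, rank) / (d_in * d_out)


if __name__ == "__main__":
    # Hypothetical layer size and rank, not taken from the source.
    d_in, d_out, rank = 2048, 2048, 8
    print(lora_trainable_params(d_in, d_out, rank))
    print(f"{trainable_fraction(d_in, d_out, rank):.4%}")
```

Under these assumed sizes the adapter trains well under 1% of the layer's weights, which is why a small teacher-labeled dataset can be enough for this stage.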
Tools used
VLMs, DPO, PEFT, Qwen2.5-VL-3B-Instruct, Gemini, GPT
Outcome
DPO-based semi-supervised fine-tuning improved accuracy from 75.1% to 85.7% on the tested compact VLM, and increasing unlabeled data volume steadily raised performance further.
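The DPO step can be sketched in two parts: turning model outputs on an unlabeled image into a preference pair, then scoring that pair with the standard DPO loss. The pair-building rule below (majority answer as "chosen", rarest answer as "rejected") is a hypothetical heuristic, since the summary does not say how the pairs were built. The loss is the usual DPO objective for a single pair, and `beta=0.1` is an assumed default.

```python
import math
from collections import Counter


def build_preference_pair(samples):
    """Hypothetical heuristic: the most frequent sampled answer becomes
    'chosen' and the least frequent becomes 'rejected'.
    Returns None when every sample agrees (no contrast to learn from)."""
    counts = Counter(samples).most_common()
    if len(counts) < 2:
        return None
    return {"chosen": counts[0][0], "rejected": counts[-1][0]}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair, given sequence log-probabilities under the
    trainable policy and the frozen reference model."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) is computed as log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))


if __name__ == "__main__":
    # Made-up attribute predictions sampled for one unlabeled image.
    pair = build_preference_pair(["red", "red", "blue", "red", "green"])
    print(pair)
    # Made-up log-probabilities; the policy already prefers 'chosen'.
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

The loss falls below log 2 once the policy favors the chosen answer more strongly than the reference model does, so training pushes the compact VLM toward the preferred attribute value without needing a human label.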
What failed first
A Self-Learning approach that retrains the model on its own high-confidence predictions caused accuracy to drop in smaller VLMs due to model collapse, where the model reinforces its own biases.
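The selection step of such a self-learning loop can be sketched as a confidence filter over the model's own predictions. This is an illustration of the general technique, not the implementation from the source. The threshold, the record fields (`id`, `label`, `confidence`), and the sample predictions are all hypothetical.

```python
from collections import Counter


def select_pseudo_labels(predictions, threshold=0.9):
    """Keep (item_id, label) for predictions at or above the threshold."""
    return [(p["id"], p["label"]) for p in predictions
            if p["confidence"] >= threshold]


def label_share(pairs):
    """Fraction of selected pseudo-labels that fall on each label."""
    counts = Counter(label for _, label in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Made-up predictions: the model is confident mostly on one class.
    predictions = [
        {"id": 1, "label": "cotton", "confidence": 0.97},
        {"id": 2, "label": "cotton", "confidence": 0.95},
        {"id": 3, "label": "silk", "confidence": 0.62},
        {"id": 4, "label": "cotton", "confidence": 0.93},
        {"id": 5, "label": "linen", "confidence": 0.91},
        {"id": 6, "label": "silk", "confidence": 0.58},
    ]
    selected = select_pseudo_labels(predictions)
    print(selected)
    print(label_share(selected))
```

In this made-up batch the filter discards every "silk" prediction, so the next training round sees only the classes the model was already confident about. Repeating that over several rounds is one way the bias reinforcement described above can arise.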
Results
Accuracy: 75.1% (baseline) to 85.7% (after DPO-based semi-supervised fine-tuning)
Grounding & classification
Source type: technical build writeup
23 fields verified against source quotes.
computer vision, data extraction, product catalog, failure mode described, metric backed, source backed, tools described, workflow described, ecommerce, accuracy improvement, cost reduction, technical build writeup, data entry ops, ecommerce ops, extract classify route