data_entry_ops · workflow
Dropbox builds in-house deep learning OCR pipeline for mobile document scanner
Dropbox's commercial off-the-shelf OCR SDK was expensive (charged per scan) and tuned for flatbed scanners, not for mobile photos of crinkled documents with shadows and uneven lighting. Relying on it also left Dropbox without control over future innovation.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Mobile document upload
Mobile clients upload scanned document images to an in-house asynchronous work queue.
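A minimal sketch of the upload-then-queue pattern described in this stage, assuming an in-process asyncio queue stands in for the real work queue. The names ScanJob, run_ocr, worker, and process_uploads are hypothetical and are not taken from the source; the OCR step is a placeholder that returns a dummy string.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ScanJob:
    """Hypothetical job record; the field names are illustrative only."""
    job_id: str
    image_bytes: bytes


async def run_ocr(job: ScanJob) -> str:
    """Placeholder for the OCR stage (assumption: real work would be slow)."""
    await asyncio.sleep(0)  # yield control, standing in for real processing
    return f"text-for-{job.job_id}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Pull jobs off the queue and process them until cancelled."""
    while True:
        job = await queue.get()
        try:
            results[job.job_id] = await run_ocr(job)
        finally:
            queue.task_done()  # always mark the job finished, even on error


async def process_uploads(jobs: list, num_workers: int = 2) -> dict:
    """Enqueue uploaded scans and let background workers drain the queue."""
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(num_workers)
    ]
    for job in jobs:
        queue.put_nowait(job)  # the upload path returns without waiting on OCR
    await queue.join()  # block until every queued job has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling asyncio.run(process_uploads([ScanJob("a", b"...")])) returns a dict keyed by job ID. The point of the design is that the upload path only enqueues work, so slow OCR processing never blocks the mobile client.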
Tools used
TensorFlow, OpenCV, Torch, Amazon EC2 G2, S3, Stormcrow, DropTurk, Mechanical Turk, LXC, Inception Resnet v2
Outcome
After about 8 months of research, productionization, and refinement, Dropbox deployed a state-of-the-art OCR pipeline to millions of Dropbox Business users, achieving mid-90s Single Word Accuracy and replacing the commercial SDK entirely.
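The source does not define Single Word Accuracy. The sketch below assumes it means the fraction of ground-truth words that the OCR output reproduces exactly, matched as a multiset and ignoring word position. The function name single_word_accuracy and that matching rule are assumptions for illustration, not the metric implementation used in the case.

```python
from collections import Counter


def single_word_accuracy(truth_words, predicted_words):
    """Fraction of ground-truth words found verbatim in the prediction.

    Assumption: each predicted word can match at most one ground-truth
    word, and word order is ignored.
    """
    if not truth_words:
        raise ValueError("truth_words must not be empty")
    remaining = Counter(predicted_words)
    hits = 0
    for word in truth_words:
        if remaining[word] > 0:
            remaining[word] -= 1  # consume the matched prediction
            hits += 1
    return hits / len(truth_words)
```

For example, with ground truth ["total", "due", "42.00", "usd"] and a prediction of ["total", "duc", "42.00", "usd"], three of four words match, so the function returns 0.75.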
What failed first
When the Word Detector and Word Deep Net were first chained end-to-end, accuracy dropped to around 44%—far below the competition—due to spacing errors and spurious garbage text from image noise.
Results
Time saved: about 8 months
Volume: around 79%
Cost replaced: significant money
Grounding & classification
Source type: technical build writeup
32 fields verified against source quotes.
computer vision, document ai, ocr, invoice, receipt, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, accuracy improvement, cost reduction, technical build writeup, data entry ops, document to record