data_entry_ops · workflow

Dropbox builds in-house deep learning OCR pipeline for mobile document scanner

Dropbox's commercial off-the-shelf OCR SDK was expensive (it charged per scan) and was tuned for flatbed scanners rather than mobile photos of crinkled documents with shadows and uneven lighting. Relying on it also left Dropbox without control over future innovation.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Mobile document upload
Mobile clients upload scanned document images to an in-house asynchronous work queue.
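As a rough illustration of the asynchronous hand-off described above, the sketch below enqueues uploaded images and lets a background worker process them. It is a minimal in-process stand-in using Python's standard library, not Dropbox's actual queue; `ScanJob`, `run_ocr`, `worker`, and `process_uploads` are hypothetical names introduced only for this example, and the OCR step is a placeholder.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ScanJob:
    """Hypothetical job record: an ID plus the raw image bytes from the client."""
    job_id: str
    image_bytes: bytes


def run_ocr(image_bytes):
    """Placeholder for the OCR step; a real pipeline would call its models here."""
    return f"<{len(image_bytes)} bytes processed>"


def worker(jobs, results):
    """Pull jobs until a None sentinel arrives, storing one result per job ID."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results[job.job_id] = run_ocr(job.image_bytes)
        jobs.task_done()


def process_uploads(uploads):
    """Enqueue uploads, let a background worker drain them, return the results."""
    jobs = queue.Queue()
    results = {}
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for job in uploads:
        jobs.put(job)  # the uploader only enqueues; it does not wait for OCR
    jobs.put(None)     # sentinel: tells the worker there is no more work
    jobs.join()        # block until every queued item has been handled
    thread.join()
    return results
```

The point of the queue is decoupling: the upload path stays fast because recognition happens later, on whatever workers are available. A production system would typically use a durable, distributed queue rather than an in-memory one.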
Tools used
TensorFlow, OpenCV, Torch, Amazon EC2 G2, S3, Stormcrow, DropTurk, Mechanical Turk, LXC, Inception Resnet v2
Outcome

After about 8 months of research, productionization, and refinement, Dropbox deployed a state-of-the-art OCR pipeline to millions of Dropbox Business users, achieving mid-90s Single Word Accuracy and replacing the commercial SDK entirely.
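For readers unfamiliar with the metric, a word-level accuracy of this kind can be sketched as follows. This assumes Single Word Accuracy means the fraction of word images whose predicted text exactly matches the ground truth; the summary above does not spell out the definition, so treat this as an illustrative assumption. The function name `single_word_accuracy` is hypothetical.

```python
def single_word_accuracy(predictions, ground_truth):
    """Fraction of words predicted exactly right (assumed definition).

    predictions and ground_truth are equal-length lists of strings, one entry
    per word image, paired by position.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    # A word counts only if every character matches; partial credit is not given.
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)
```

For example, `single_word_accuracy(["total", "due", "S12.50"], ["total", "due", "$12.50"])` gives 2/3, because a single wrong character makes the whole word count as a miss.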

What failed first

When the Word Detector and Word Deep Net were first chained end-to-end, accuracy dropped to around 44%—far below the competition—due to spacing errors and spurious garbage text from image noise.

Results
Time saved: about 8 months
Volume: around 79%
Cost replaced: significant money
Source

https://dropbox.tech/machine-learning/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning


Grounding & classification
Source type: technical build writeup
32 fields verified against source quotes.
computer vision, document ai, ocr, invoice, receipt, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, accuracy improvement, cost reduction, technical build writeup, data entry ops, document to record