
Dropbox builds in-house deep learning OCR pipeline for mobile document scanner

Dropbox's commercial off-the-shelf OCR SDK was expensive (charged per scan) and tuned for flatbed scanners rather than mobile photos of crinkled documents with shadows and uneven lighting. Relying on it also left Dropbox without control over future innovation.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Mobile document upload

Mobile clients upload scanned document images to an in-house asynchronous work queue.
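The upload stage can be pictured as a producer/consumer queue: uploads enqueue jobs and return immediately, while background workers pick the jobs up and run OCR. The following is a minimal single-process sketch under stated assumptions, not Dropbox's implementation. `ScanJob`, `run_ocr`, and `process_uploads` are hypothetical names, and `run_ocr` is a stand-in for the real pipeline.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ScanJob:
    """One uploaded scan waiting for OCR (hypothetical structure)."""
    job_id: str
    image_bytes: bytes


def run_ocr(image_bytes: bytes) -> str:
    """Placeholder for the real OCR pipeline (assumption for illustration)."""
    return f"<{len(image_bytes)} bytes processed>"


def worker(jobs: queue.Queue, results: dict) -> None:
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results[job.job_id] = run_ocr(job.image_bytes)
        jobs.task_done()


def process_uploads(uploads, num_workers: int = 2) -> dict:
    """Enqueue (job_id, image_bytes) pairs and process them asynchronously."""
    jobs: queue.Queue = queue.Queue()
    results: dict = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    # Producer side: an upload only enqueues work and does not wait for OCR.
    for job_id, data in uploads:
        jobs.put(ScanJob(job_id, data))
    # One sentinel per worker so every thread shuts down cleanly.
    for _ in threads:
        jobs.put(None)
    jobs.join()
    for t in threads:
        t.join()
    return results
```

In a production setting the in-memory queue would be replaced by a durable queue service and the workers by separate machines, but the decoupling between upload and processing is the same.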

Tools used

TensorFlow, OpenCV, Torch, Amazon EC2 G2, S3, Stormcrow, DropTurk, Mechanical Turk, LXC, Inception Resnet v2

Outcome

After about 8 months of research, productionization, and refinement, Dropbox deployed a state-of-the-art OCR pipeline to millions of Dropbox Business users, achieving mid-90s Single Word Accuracy and replacing the commercial SDK entirely.
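To make the Single Word Accuracy figure concrete, here is a minimal sketch of one way such a metric could be computed. This summary does not give the exact formula, so the definition used here (the fraction of ground-truth words that appear, exactly matched, in the OCR output) is an assumption, and `single_word_accuracy` is a hypothetical helper.

```python
from collections import Counter


def single_word_accuracy(ground_truth: str, predicted: str) -> float:
    """Fraction of ground-truth words found verbatim in the prediction.

    Assumed definition for illustration: words are whitespace-separated,
    matching is exact and case-sensitive, and word order is ignored.
    """
    truth = Counter(ground_truth.split())
    pred = Counter(predicted.split())
    total = sum(truth.values())
    if total == 0:
        return 0.0
    # Each ground-truth word counts at most as often as it was predicted.
    matched = sum(min(count, pred[word]) for word, count in truth.items())
    return matched / total
```

Under this definition a single missing space costs two words at once: scoring "thequick brown fox" against "the quick brown fox" matches only two of the four ground-truth words. Extra garbage words in the output do not lower this particular score, so a fuller evaluation would track them separately.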

What failed first

When the Word Detector and Word Deep Net were first chained end-to-end, accuracy dropped to around 44%, far below the competition, because of spacing errors and spurious garbage text produced from image noise.

Results

Time saved: about 8 months

Volume: around 79%

Cost replaced: significant money

Source

https://dropbox.tech/machine-learning/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning


Grounding & classification

Source type: technical build writeup

32 fields verified against source quotes.

computer vision, document ai, ocr, invoice, receipt, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, accuracy improvement, cost reduction, technical build writeup, data entry ops, document to record