
Dropbox builds ML-based document detection pipeline for iOS scanning

Dropbox needed accurate, fast document detection that could run on mobile devices. Deep neural networks (DNNs) were too expensive in compute and memory for mobile deployment, and Apple's built-in rectangle detection SDK was not accurate enough for essential use cases such as scanning small receipts or business cards against cluttered backgrounds.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User scans document
A user scans a document with their smartphone camera and the feature automatically detects the document in the frame.
Tools used
OCR, Hough transform
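
The tools list names the Hough transform without describing how it is applied. As a rough illustration only, and not Dropbox's actual implementation, the sketch below shows the classic form of the technique: every edge pixel votes for all lines (rho, theta) that could pass through it, and the most-voted cell is the strongest straight line, such as one side of a document. The function names, the pure-Python accumulator, and the 1-degree angular resolution are assumptions made for this example.

```python
import math


def hough_accumulate(edge_points, width, height, n_theta=180):
    """Vote in (rho, theta) space for each edge pixel (hypothetical helper)."""
    # The largest possible |rho| is the image diagonal.
    diag = int(math.ceil(math.hypot(width, height)))
    # rho ranges over [-diag, diag]; shift by diag to get a list index.
    acc = [[0] * n_theta for _ in range(2 * diag + 1)]
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    for x, y in edge_points:
        for t, theta in enumerate(thetas):
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[rho + diag][t] += 1
    return acc, diag, thetas


def strongest_line(acc, diag, thetas):
    """Return (rho, theta, votes) for the accumulator cell with most votes."""
    best_votes, best_r, best_t = -1, 0, 0
    for r, row in enumerate(acc):
        for t, votes in enumerate(row):
            if votes > best_votes:
                best_votes, best_r, best_t = votes, r, t
    return best_r - diag, thetas[best_t], best_votes


# Synthetic edge map (an assumption for the demo): a horizontal edge at y = 5.
points = [(x, 5) for x in range(100)]
acc, diag, thetas = hough_accumulate(points, width=100, height=100)
rho, theta, votes = strongest_line(acc, diag, thetas)
print(rho, round(math.degrees(theta)), votes)
```

For the synthetic horizontal edge, the peak falls at rho = 5 and theta = 90 degrees, with all 100 edge pixels voting for it. A real detector would look for several such peaks and combine them into a quadrilateral, a step the source does not detail here.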
Outcome

Dropbox's custom pipeline runs in near real time at 8–10 frames per second, requires much less labeled training data than DNNs, and produces detections 60% less likely to need manual user correction than Apple's SDK, as validated by an A/B test.

What failed first

DNNs were ruled out due to mobile compute and memory cost. Apple's rectangle detection SDK underperformed in complex scenes. The Canny edge detector was trialled but produced poor results because it amplified text inside the document while failing to clearly detect document edges.

Results
Volume: 60% less likely
Source

https://dropbox.tech/machine-learning/fast-and-accurate-document-detection-for-scanning


Grounding & classification
Source type: technical build writeup
18 fields verified against source quotes.
computer vision, document ai, ocr, receipt, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, accuracy improvement, technical build writeup, data entry ops, document to record