
Dropbox builds ML-based document detection pipeline for iOS scanning

Dropbox needed accurate, fast document detection that could run on mobile devices. Deep neural networks (DNNs) were too expensive in compute and memory for mobile deployment, and Apple's built-in rectangle detection SDK was not accurate enough for essential use cases such as scanning small receipts or business cards against cluttered backgrounds.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · User scans document

A user scans a document with their smartphone camera and the feature automatically detects the document in the frame.

Tools used

OCR, Hough transform
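Of the tools listed for this stage, the Hough transform is the one that turns scattered edge pixels into candidate straight lines, such as the four sides of a page. The following is a minimal sketch of the general voting technique in plain Python, not Dropbox's code. The helper `rectangle_edge_points`, the function `hough_lines`, and the synthetic page coordinates are all hypothetical and exist only for illustration.

```python
import math

def rectangle_edge_points(left, top, right, bottom):
    """Hypothetical helper: pixel coordinates on the outline of an axis-aligned page."""
    pts = set()
    for y in range(top, bottom + 1):
        pts.add((left, y))
        pts.add((right, y))
    for x in range(left, right + 1):
        pts.add((x, top))
        pts.add((x, bottom))
    return pts

def hough_lines(edge_points, n_theta=180, top_k=4):
    """Each edge pixel votes for every line rho = x*cos(theta) + y*sin(theta)
    passing through it; the bins with the most votes are the dominant lines."""
    trig = [(math.cos(math.pi * i / n_theta), math.sin(math.pi * i / n_theta))
            for i in range(n_theta)]
    votes = {}
    for x, y in edge_points:
        for i, (c, s) in enumerate(trig):
            rho = round(x * c + y * s)  # quantize rho to whole pixels
            votes[(rho, i)] = votes.get((rho, i), 0) + 1
    # Most votes first; ties broken by (rho, theta index) for a stable order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rho, 180.0 * i / n_theta, n) for (rho, i), n in ranked[:top_k]]

# Assumed synthetic input: a clean page outline with no clutter or noise.
points = rectangle_edge_points(10, 5, 40, 45)
lines = hough_lines(points)
for rho, theta_deg, n in lines:
    print(f"rho={rho:3d}  theta={theta_deg:5.1f} deg  votes={n}")
```

On this clean outline the four strongest bins are the two vertical sides (theta 0) and the two horizontal sides (theta 90). A real frame would supply edge pixels from an edge detector, and a later step would still have to combine candidate lines into a quadrilateral.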

Outcome

Dropbox's custom pipeline runs near-realtime at 8–10 frames per second, requires much less labeled training data than DNNs, and produces detections 60% less likely to need manual user correction than Apple's SDK — validated by an A/B test.

What failed first

DNNs were ruled out due to mobile compute and memory cost. Apple's rectangle detection SDK underperformed in complex scenes. The Canny edge detector was trialled but produced poor results because it amplified text inside the document while failing to clearly detect document edges.
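The Canny failure described above can be illustrated with a toy example. Canny starts from image gradient magnitude, so any high-contrast detail responds strongly. The sketch below uses a plain central-difference gradient (a simplification, not the full Canny algorithm) on a hypothetical synthetic scene. The intensity values (grey desk 100, page 140, black text 0) are assumptions chosen to mimic a page on a similarly toned background.

```python
def make_image(w=40, h=30):
    """Hypothetical scene: grey desk (100), slightly lighter page (140), black text (0)."""
    img = [[100] * w for _ in range(h)]
    for y in range(5, 25):
        for x in range(8, 32):
            img[y][x] = 140
    for y in (10, 14, 18):
        for x in range(12, 28, 2):  # alternating dark pixels imitate glyph strokes
            img[y][x] = 0
    return img

def gradient_magnitude(img):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag

img = make_image()
mag = gradient_magnitude(img)
page_edge = mag[15][8]   # pixel on the left page boundary
text_edge = mag[10][11]  # pixel just left of the first text stroke
print(f"page boundary response: {page_edge}, text response: {text_edge}")
```

Under these assumed intensities the text pixel responds more strongly than the page boundary, so a threshold high enough to suppress the text would also suppress the boundary.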

Results

Volume: 60% less likely to need manual user correction than Apple's SDK

Source

https://dropbox.tech/machine-learning/fast-and-accurate-document-detection-for-scanning


Grounding & classification

Source type: technical build writeup

18 fields verified against source quotes.

computer vision, document ai, ocr, receipt, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, accuracy improvement, technical build writeup, data entry ops, document to record