
Dropbox implements augmented camera preview for Android document scanner using ML-based edge detection

Implementing real-time document edge detection on the fragmented Android hardware landscape required processing each camera frame within 80ms — a constraint that the standard NV21-to-RGBA conversion method, taking 300-500ms per frame, could not meet.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Camera frame arrives
A new camera frame arrives 20-30 times per second and triggers the processing callback.
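
At 20-30 frames per second a new frame arrives roughly every 33-50 ms, which is shorter than the 80 ms processing budget, so a callback of this kind usually has to skip frames that arrive while an earlier one is still in flight. The sketch below shows one minimal way to do that in plain Java. `FrameGate` and its methods are hypothetical names used for illustration; they are not part of the Android camera API or of the implementation the source describes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: admits one frame at a time and counts the frames it turns away. */
final class FrameGate {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final AtomicInteger processed = new AtomicInteger();
    private final AtomicInteger dropped = new AtomicInteger();

    /** Returns true if the caller may process this frame, false if the frame should be dropped. */
    boolean tryAcquire() {
        if (busy.compareAndSet(false, true)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    /** Marks the current frame as finished so the next arriving frame is admitted. */
    void release() {
        processed.incrementAndGet();
        busy.set(false);
    }

    int processedCount() {
        return processed.get();
    }

    int droppedCount() {
        return dropped.get();
    }
}
```

A frame callback would call `tryAcquire()` first, return immediately when it yields false, and call `release()` in a `finally` block once processing of the admitted frame ends.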
Tools used
RenderScript, ScriptIntrinsicResize, ScriptIntrinsicYuvToRGB, JNI, TextureView
Outcome

Using RenderScript intrinsics reduced frame conversion to 10-25ms; the TextureView approach achieved 5-15ms for the 200x200 input; the complete system achieves at least 15 FPS on most Android devices.
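
To make the "200x200 input" concrete, here is a minimal nearest-neighbor downscale in plain Java. It is a stand-in for illustration only: it does not use RenderScript or `ScriptIntrinsicResize`, the class name `FrameDownscaler` is hypothetical, and the source does not say which resampling filter or aspect-ratio handling was used. This version simply stretches the frame to the target size.

```java
/** Hypothetical helper: nearest-neighbor resize of a packed ARGB pixel array. */
final class FrameDownscaler {
    private FrameDownscaler() {}

    static int[] downscale(int[] src, int srcW, int srcH, int dstW, int dstH) {
        if (srcW <= 0 || srcH <= 0 || dstW <= 0 || dstH <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        if (src.length < srcW * srcH) {
            throw new IllegalArgumentException("source buffer too small");
        }
        int[] dst = new int[dstW * dstH];
        for (int y = 0; y < dstH; y++) {
            // Map each destination row to the source row at the same relative position.
            int sy = (int) ((long) y * srcH / dstH);
            for (int x = 0; x < dstW; x++) {
                int sx = (int) ((long) x * srcW / dstW);
                dst[y * dstW + x] = src[sy * srcW + sx];
            }
        }
        return dst;
    }
}
```

Calling `FrameDownscaler.downscale(pixels, 1920, 1080, 200, 200)` would produce a 40,000-pixel array, about 2% of the original pixel count, which is why the later stages can run on a much smaller buffer.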

What failed first

Converting NV21 camera frames to RGBA bitmaps using the standard Java method took 300-500ms per 1920x1080 frame, making it unacceptable for a pipeline requiring under 80ms per cycle.
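
For readers unfamiliar with the format, the sketch below shows what an NV21-to-ARGB conversion involves when done per pixel on the CPU. It is an assumption-laden illustration, not the "standard Java method" the source measured: the class name `Nv21Converter` is hypothetical, and the coefficients are the common full-range BT.601 approximation. NV21 stores a full-resolution Y plane followed by a half-resolution plane of interleaved V and U bytes, so every 2x2 block of pixels shares one V/U pair.

```java
/** Hypothetical helper: per-pixel NV21 to packed ARGB conversion on the CPU. */
final class Nv21Converter {
    private Nv21Converter() {}

    static int[] nv21ToArgb(byte[] nv21, int width, int height) {
        if (width <= 0 || height <= 0 || width % 2 != 0 || height % 2 != 0) {
            throw new IllegalArgumentException("width and height must be positive and even");
        }
        int frameSize = width * height;
        if (nv21.length < frameSize * 3 / 2) {
            throw new IllegalArgumentException("buffer too small for NV21 frame");
        }
        int[] argb = new int[frameSize];
        for (int row = 0; row < height; row++) {
            // The chroma plane starts after the Y plane; each chroma row serves two pixel rows.
            int uvRow = frameSize + (row / 2) * width;
            for (int col = 0; col < width; col++) {
                int y = nv21[row * width + col] & 0xFF;
                int uvIndex = uvRow + (col & ~1); // V comes first, then U
                int v = (nv21[uvIndex] & 0xFF) - 128;
                int u = (nv21[uvIndex + 1] & 0xFF) - 128;
                int r = clamp(Math.round(y + 1.402f * v));
                int g = clamp(Math.round(y - 0.344136f * u - 0.714136f * v));
                int b = clamp(Math.round(y + 1.772f * u));
                argb[row * width + col] = 0xFF000000 | (r << 16) | (g << 8) | b;
            }
        }
        return argb;
    }

    private static int clamp(int value) {
        return Math.max(0, Math.min(255, value));
    }
}
```

A 1920x1080 frame has about 2.07 million pixels, so a loop like this performs several floating-point operations and a clamp for each of them on every frame, which helps explain why CPU-side conversion struggles to fit an 80 ms budget.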

Results
Time saved: 300-500 ms
Volume: 20-100 ms
Source

https://dropbox.tech/machine-learning/augmented-camera-previews-for-the-dropbox-android-document-scanner


Grounding & classification
Source type: technical build writeup
23 fields verified against source quotes, 1 dropped as unverifiable.
computer vision, document ai, failure mode described, metric backed, production runtime claimed, tools described, workflow described, software, cycle time reduction, throughput increase, technical build writeup, data entry ops, document to record