
Dropbox implements augmented camera preview for Android document scanner using ML-based edge detection

Implementing real-time document edge detection on the fragmented Android hardware landscape required processing each camera frame within 80ms. The standard NV21-to-RGBA conversion method, which took 300-500ms per frame, could not meet that constraint.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Camera frame arrives

A new camera frame arrives 20-30 times per second and triggers the processing callback.
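At 20-30 frames per second a new frame arrives every 33-50ms, so a callback-driven pipeline needs a policy for frames that arrive while an earlier one is still being processed. The sketch below shows one common option: drop the incoming frame if the previous cycle has not finished. It is an illustration only. The class `FrameGate` and its `offer` method are hypothetical names, and the source does not state that Dropbox used this exact policy.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: lets at most one frame be processed at a time. */
final class FrameGate {
    private final AtomicBoolean busy = new AtomicBoolean(false);

    /**
     * Runs the given work if no other frame is in flight.
     * Returns true if the work ran, false if the frame was dropped.
     */
    boolean offer(Runnable work) {
        // Atomically claim the gate; fail fast if another frame holds it.
        if (!busy.compareAndSet(false, true)) {
            return false;
        }
        try {
            work.run();
        } finally {
            // Always release, even if processing throws.
            busy.set(false);
        }
        return true;
    }
}
```

In this sketch the work runs on the thread that calls `offer`, so the gate matters when callbacks can arrive on more than one thread or re-enter. If the work were handed to a background thread instead, the gate would have to be released when that work completes rather than when `offer` returns.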

Tools used

RenderScript, ScriptIntrinsicResize, ScriptIntrinsicYuvToRGB, JNI, TextureView

Outcome

Using RenderScript intrinsics reduced frame conversion to 10-25ms; the TextureView approach achieved 5-15ms for the 200x200 input; the complete system achieves at least 15 FPS on most Android devices.
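The figures above mix per-frame latency (ms) and throughput (FPS). The small sketch below converts between the two so they can be compared directly. `FrameBudget` and its method names are illustrative and do not come from the source.

```java
/** Illustrative conversions between frame rate and per-frame time. */
final class FrameBudget {
    /** Milliseconds between consecutive frames at the given frame rate. */
    static double frameIntervalMs(double fps) {
        return 1000.0 / fps;
    }

    /** Highest frame rate sustainable if each frame costs processingMs in total. */
    static double sustainableFps(double processingMs) {
        return 1000.0 / processingMs;
    }

    public static void main(String[] args) {
        System.out.println("ms per frame at 15 FPS: " + frameIntervalMs(15));
        System.out.println("FPS at 80 ms per cycle: " + sustainableFps(80));
        System.out.println("FPS at 500 ms per cycle: " + sustainableFps(500));
        System.out.println("FPS at 300 ms per cycle: " + sustainableFps(300));
    }
}
```

By this arithmetic, 15 FPS corresponds to roughly 66.7ms per frame, which sits inside the 80ms budget, while a conversion step that alone takes 300-500ms would cap throughput at about 2 to 3.3 FPS.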

What failed first

Converting NV21 camera frames to RGBA bitmaps using the standard Java method took 300-500ms per 1920x1080 frame, making it unacceptable for a pipeline requiring under 80ms per cycle.
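To give a sense of the work involved, here is a minimal sketch of a per-pixel NV21-to-ARGB conversion in plain Java. It is not the "standard Java method" the source refers to, whose details are not given here. The class name `Nv21ToArgb` is hypothetical, and the coefficients assume full-range BT.601 color, which may not match what a particular camera produces. A 1920x1080 frame has 2,073,600 pixels, so the inner loop body runs that many times per frame.

```java
/** Hypothetical per-pixel NV21 to packed-ARGB converter (full-range BT.601 assumed). */
final class Nv21ToArgb {
    private static int clamp(long value) {
        return (int) Math.max(0, Math.min(255, value));
    }

    /**
     * NV21 layout: width*height luma (Y) bytes, followed by interleaved
     * V,U byte pairs, one pair shared by each 2x2 block of pixels.
     */
    static int[] convert(byte[] nv21, int width, int height) {
        if (width % 2 != 0 || height % 2 != 0) {
            throw new IllegalArgumentException("width and height must be even");
        }
        int frameSize = width * height;
        if (nv21.length < frameSize + frameSize / 2) {
            throw new IllegalArgumentException("buffer too small for NV21 frame");
        }
        int[] argb = new int[frameSize];
        for (int row = 0; row < height; row++) {
            // Each chroma row covers two luma rows and holds `width` bytes.
            int chromaRowStart = frameSize + (row / 2) * width;
            for (int col = 0; col < width; col++) {
                int y = nv21[row * width + col] & 0xFF;
                int chromaIndex = chromaRowStart + (col & ~1);
                int v = (nv21[chromaIndex] & 0xFF) - 128;     // V comes first in NV21
                int u = (nv21[chromaIndex + 1] & 0xFF) - 128;
                int r = clamp(Math.round(y + 1.402 * v));
                int g = clamp(Math.round(y - 0.344136 * u - 0.714136 * v));
                int b = clamp(Math.round(y + 1.772 * u));
                argb[row * width + col] = 0xFF000000 | (r << 16) | (g << 8) | b;
            }
        }
        return argb;
    }
}
```

The resulting array could be passed to an Android `Bitmap` in `ARGB_8888` format. Doing this arithmetic on the CPU for every pixel of every frame is the kind of cost that moving the conversion onto optimized intrinsics or the GPU avoids.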

Results

Time saved: 300-500 ms

Volume: 20-100 ms

Source

https://dropbox.tech/machine-learning/augmented-camera-previews-for-the-dropbox-android-document-scanner


Grounding & classification

Source type: technical build writeup

23 fields verified against source quotes, 1 dropped as unverifiable.

computer vision, document ai, failure mode described, metric backed, production runtime claimed, tools described, workflow described, software, cycle time reduction, throughput increase, technical build writeup, data entry ops, document to record