Dropbox uses machine learning and OCR to make text in billions of images searchable

Images and image-embedded PDFs stored by Dropbox users were invisible to search indexing because they contain only pixels rather than extractable text, leaving billions of files—including receipts, whiteboard photos, and scanned documents—unsearchable.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · File event ingestion

Cape ingests incoming file events such as adds or edits to kick off OCR processing.
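
A minimal sketch of this ingestion step, assuming events arrive on an in-memory queue and only adds or edits of image and PDF files should trigger OCR. The names `FileEvent`, `needs_ocr`, `drain`, and the extension list are hypothetical illustrations. They are not Cape's API, and the source does not describe Cape's actual filtering rules.

```python
import os
from dataclasses import dataclass
from queue import Queue
from typing import Callable

# Assumed set of file types worth sending to OCR; not taken from the source.
OCR_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tiff", ".pdf"}
# Event kinds that should kick off processing, per the stage description.
TRIGGER_KINDS = {"add", "edit"}


@dataclass(frozen=True)
class FileEvent:
    kind: str  # e.g. "add", "edit", "delete"
    path: str


def needs_ocr(event: FileEvent) -> bool:
    """Return True for add/edit events on image or PDF files."""
    if event.kind not in TRIGGER_KINDS:
        return False
    extension = os.path.splitext(event.path)[1].lower()
    return extension in OCR_EXTENSIONS


def drain(events: "Queue[FileEvent]", submit_ocr: Callable[[str], None]) -> int:
    """Consume all queued events, submit qualifying paths, return the count submitted."""
    submitted = 0
    while not events.empty():
        event = events.get()
        if needs_ocr(event):
            submit_ocr(event.path)
            submitted += 1
    return submitted
```

In a real deployment `submit_ocr` would hand the path to an asynchronous OCR worker. Passing it in as a callable keeps the filtering logic testable on its own.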

Tools used

OCR, Cape, PDFium, Caffe, TensorFlow XLA

Outcome

Dropbox deployed automatic image text recognition for Professional and Business Advanced/Enterprise plan users, achieving a throughput improvement of about 3x through TensorFlow tuning and an 88% reduction in PDF metadata extraction failures, with almost 90% of documents indexed completely.

What failed first

The first deployed version of the pipeline was computationally prohibitive, requiring an enormous cluster, and actual traffic was roughly twice the projected load. TensorFlow's default multicore behavior also caused severe context-switching overhead, which degraded throughput further.

Results

Volume: almost 90%

Source

https://dropbox.tech/machine-learning/using-machine-learning-to-index-text-from-billions-of-images

Grounding & classification

Source type: technical build writeup

26 fields verified against source quotes, 6 dropped as unverifiable.

computer vision, document ai, document classification, enterprise search, ocr, receipt, builder submitted, failure mode described, production runtime claimed, workflow described, software, automation rate, error reduction, throughput increase, technical build writeup, back office ops, data entry ops, document to record, extract classify route