Recursion operationalizes petabyte-scale deep learning for drug discovery with a custom MLOps pipeline
Drug discovery is exceptionally slow and costly, and about 90% of candidates fail in clinical trials. Recursion needed target-agnostic models that generalize across diseases, while managing petabytes of imaging data and working around the shortage of adequately labeled biological data.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Cell image data collection
Robotic labs continuously generate a high-throughput screening dataset of cell images using the Cell Painting assay with fluorescent dyes.
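Cell Painting images arrive as one grayscale image per fluorescent channel, so a common first step before deep learning is to stack the channels into a single array and normalize each one. The sketch below assumes the five-channel layout often described for the standard Cell Painting protocol (DNA, ER, RNA, AGP, Mito) and per-channel z-score normalization; the source does not state Recursion's actual channel set or preprocessing, so the `CHANNELS` tuple and both helper functions are hypothetical.

```python
import numpy as np

# Hypothetical channel order; a real pipeline may use a different
# number of channels, different names, or a different ordering.
CHANNELS = ("DNA", "ER", "RNA", "AGP", "Mito")


def stack_channels(images: dict[str, np.ndarray]) -> np.ndarray:
    """Stack per-channel 2-D images into a (C, H, W) float32 array."""
    missing = [c for c in CHANNELS if c not in images]
    if missing:
        raise ValueError(f"missing channels: {missing}")
    return np.stack([images[c].astype(np.float32) for c in CHANNELS], axis=0)


def normalize_per_channel(stack: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Z-score each channel independently so dye brightness differences do not dominate."""
    mean = stack.mean(axis=(1, 2), keepdims=True)
    std = stack.std(axis=(1, 2), keepdims=True)
    return (stack - mean) / (std + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 12-bit intensities stand in for real microscope tiles.
    fake = {c: rng.integers(0, 4096, size=(64, 64), dtype=np.uint16) for c in CHANNELS}
    x = normalize_per_channel(stack_channels(fake))
    print(x.shape)  # (5, 64, 64)
```

Normalizing per channel rather than across the whole stack is a deliberate choice in this sketch: each dye has its own intensity range, and a global scale would let the brightest stain swamp the others.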
Tools used
CellProfiler, Determined AI, Codefresh, MLFlow, Google Container Repository, Google Cloud Storage, Google Kubernetes Engine, Docker, Cell Painting
Outcome
Recursion built a scalable MLOps pipeline enabling ML scientists to train hundreds of DL model variants per month and run inference on hundreds of millions of images, supporting three drug candidates in Phase 2 clinical trials and dozens more in earlier stages.
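Running inference on hundreds of millions of images requires streaming the inputs in fixed-size batches and splitting the work across many workers, rather than loading everything at once. The source does not say how Recursion parallelizes inference, so this is a minimal sketch that assumes round-robin sharding by position and a hypothetical `predict_batch` callable standing in for the real model; `batched`, `shard`, and `run_inference` are illustrative names, not part of any tool listed here.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most batch_size items without materializing the input."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def shard(items: Iterable[T], worker_index: int, num_workers: int) -> Iterator[T]:
    """Keep every num_workers-th item so workers receive disjoint, complete slices."""
    for i, item in enumerate(items):
        if i % num_workers == worker_index:
            yield item


def run_inference(
    image_ids: Iterable[T],
    predict_batch: Callable[[list[T]], Sequence[R]],  # hypothetical model wrapper
    batch_size: int,
    worker_index: int = 0,
    num_workers: int = 1,
) -> Iterator[tuple[T, R]]:
    """Stream (image_id, prediction) pairs for this worker's shard, one batch at a time."""
    for batch in batched(shard(image_ids, worker_index, num_workers), batch_size):
        outputs = predict_batch(batch)
        if len(outputs) != len(batch):
            raise RuntimeError("predict_batch must return one output per input")
        yield from zip(batch, outputs)


if __name__ == "__main__":
    ids = [f"img_{i}" for i in range(10)]
    # Toy "model": the prediction is just the length of the image ID.
    for image_id, pred in run_inference(ids, lambda b: [len(x) for x in b], batch_size=3,
                                        worker_index=1, num_workers=2):
        print(image_id, pred)
```

Because results are yielded rather than collected, each worker's memory use depends on the batch size, not on the total number of images. In this sketch every worker scans the full ID stream and skips items outside its shard; at real scale one would more likely pre-partition the image listing.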
Results
Time saved: 14 years
Volume: 90%
Running since: 2021
Grounding & classification
Source type: technical build writeup
31 fields verified against source quotes, 1 dropped as unverifiable.
computer vision, predictive analytics, builder submitted, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, pharma life sciences, employee productivity, throughput increase, technical build writeup, back office ops, extract classify route