
Recursion operationalizes petabyte-scale deep learning for drug discovery with a custom MLOps pipeline

Drug discovery is exceptionally time-consuming and costly, with a 90% clinical trial failure rate. Recursion needed to build target-agnostic models that generalize across diseases while managing petabytes of imaging data and working around the scarcity of adequately labeled biological data.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Cell image data collection
Robotic labs continuously generate a high-throughput screening dataset of cell images using the Cell Painting assay with fluorescent dyes.
Tools used
CellProfiler, Determined AI, Codefresh, MLFlow, Google Container Repository, Google Cloud Storage, Google Kubernetes Engine, Docker, Cell Painting
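
To make the Stage 1 output concrete, here is a minimal sketch of how per-channel image files from a Cell Painting screen might be grouped into one multi-channel record per imaging site before they reach a training or inference pipeline. It is not Recursion's code. The filename convention, the 384-well row range, the function name group_by_site, and the channel count of five are all assumptions chosen for illustration; the source does not describe any of them.

```python
import re
from collections import defaultdict

# Hypothetical filename convention (an assumption, not from the source):
#   <plate>_<well>_s<site>_w<channel>.png, e.g. "P1_A01_s1_w3.png"
# Wells are assumed to use rows A-P and two-digit columns (384-well layout).
FILENAME_RE = re.compile(
    r"^(?P<plate>[^_]+)_(?P<well>[A-P]\d{2})_s(?P<site>\d+)_w(?P<channel>\d+)\.png$"
)

# Assumed number of fluorescence channels imaged per site.
EXPECTED_CHANNELS = 5


def group_by_site(filenames, expected_channels=EXPECTED_CHANNELS):
    """Group per-channel files by (plate, well, site).

    Returns two dicts:
      complete:   key -> list of filenames ordered by channel 1..N
      incomplete: key -> sorted list of the channel numbers that were found
    Filenames that do not match the assumed convention are skipped.
    """
    sites = defaultdict(dict)
    for name in filenames:
        match = FILENAME_RE.match(name)
        if match is None:
            continue  # not an image file in the assumed naming scheme
        key = (match["plate"], match["well"], int(match["site"]))
        sites[key][int(match["channel"])] = name

    wanted = set(range(1, expected_channels + 1))
    complete, incomplete = {}, {}
    for key, channels in sites.items():
        if set(channels) == wanted:
            complete[key] = [channels[c] for c in sorted(wanted)]
        else:
            incomplete[key] = sorted(channels)
    return complete, incomplete


if __name__ == "__main__":
    files = [f"P1_A01_s1_w{c}.png" for c in range(1, 6)]
    files += ["P1_A02_s1_w1.png", "notes.txt"]
    complete, incomplete = group_by_site(files)
    print("complete sites:", list(complete))
    print("incomplete sites:", incomplete)
```

Separating complete from incomplete sites up front is one way to keep partially written acquisitions out of downstream batches; a real pipeline reading from object storage would list keys from the bucket instead of taking an in-memory list.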
Outcome

Recursion built a scalable MLOps pipeline enabling ML scientists to train hundreds of DL model variants per month and run inference on hundreds of millions of images, supporting three drug candidates in Phase 2 clinical trials and dozens more in earlier stages.

Results
Time saved: 14 years
Volume: 90%
Running since: 2021
Source

https://mlops.community/blog/drug-discovery-with-deep-learning-at-recursion


Grounding & classification
Source type: technical build writeup
31 fields verified against source quotes, 1 dropped as unverifiable.
Tags: computer vision, predictive analytics, builder submitted, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, pharma life sciences, employee productivity, throughput increase, technical build writeup, back office ops, extract classify route