logistics_ops · workflow

DoorDash ML Platform builds a computational graph system and Python DSL for flexible ensemble model production serving

DoorDash's ML platform could not combine rule-based logic and models from multiple ML frameworks into a single production-ready ensemble. Each framework has its own serialization format and C++ runtime library, so data scientists would have had to write custom C++ production code for every model.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Define ensemble model with Python DSL

Data scientists define a static computation graph using the Python DSL to specify the ensemble model structure.
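As an illustration of the idea, here is a minimal sketch of a graph-building DSL that uses operator overloading. It is not DoorDash's actual API: the names `Node`, `feature`, `const`, `model`, `where`, and `evaluate` are hypothetical, the feature names are made up, and the two "models" are plain Python callables standing in for real LightGBM or PyTorch models.

```python
import operator
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """One vertex of a static computation graph (hypothetical DSL type)."""
    op: str
    inputs: Tuple["Node", ...] = ()
    payload: object = None  # feature name, constant value, or model callable

    def _binary(self, op, other):
        return Node(op, (self, _as_node(other)))

    # Overloaded operators record graph structure instead of computing values.
    def __add__(self, other): return self._binary("add", other)
    def __radd__(self, other): return self._binary("add", other)
    def __mul__(self, other): return self._binary("mul", other)
    def __rmul__(self, other): return self._binary("mul", other)
    def __gt__(self, other): return self._binary("gt", other)


def _as_node(value):
    """Wrap plain Python numbers as constant nodes."""
    return value if isinstance(value, Node) else const(value)


def feature(name): return Node("feature", payload=name)
def const(value): return Node("const", payload=value)
def model(fn, *inputs): return Node("model", tuple(inputs), fn)
def where(cond, a, b): return Node("where", (cond, _as_node(a), _as_node(b)))


_BINARY = {"add": operator.add, "mul": operator.mul, "gt": operator.gt}


def evaluate(node, features):
    """Reference interpreter: walk the graph and compute one prediction."""
    if node.op == "feature":
        return features[node.payload]
    if node.op == "const":
        return node.payload
    args = [evaluate(n, features) for n in node.inputs]
    if node.op == "model":
        return node.payload(*args)
    if node.op == "where":
        return args[1] if args[0] else args[2]
    return _BINARY[node.op](*args)


# Ensemble definition: two stand-in models blended, plus a rule-based override.
distance = feature("distance_km")
prep = feature("prep_minutes")
tree_score = model(lambda d, p: 2.0 * d + p, distance, prep)
net_score = model(lambda d, p: 3.0 * d + 0.5 * p, distance, prep)
blended = 0.5 * tree_score + 0.5 * net_score
ensemble = where(distance > 10.0, 60.0, blended)

print(evaluate(ensemble, {"distance_km": 4.0, "prep_minutes": 10.0}))
```

The definition at the bottom only builds a graph; nothing is computed until `evaluate` walks it. In a production setting of the kind described here, the same graph would be handed to a compiled runtime rather than interpreted in Python.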

Tools used

LightGBM, PyTorch, Sibyl Prediction Service, gRPC, Kubernetes, Kotlin, xtensor, Jupyter, Git, Docker, JNI

Outcome

Implementing the computational graph in C++ reduces CPU prediction time by more than a factor of 12 compared to Python and reduces the total memory footprint from 120MB to 75MB. The Python DSL reduces a model definition from 800 lines of JSON to 20 lines of Python, letting data scientists build complex ensemble models that serve a peak throughput of three million predictions per second.
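The gap between JSON and DSL line counts is plausible because a serialized graph has to spell out every node and edge explicitly. The sketch below illustrates this with a hypothetical flat node-list schema (fields `id`, `op`, `inputs`, `name`, `value`) and a nested-tuple expression encoding. Both are assumptions made for illustration, not the format the platform actually uses.

```python
import json


def to_json_nodes(expr):
    """Flatten a nested-tuple expression into a list of JSON-ready node dicts.

    expr is either a leaf (a str feature name or a number) or a tuple of
    (op_name, child_expr, ...). This encoding is hypothetical.
    """
    nodes = []

    def visit(e):
        node_id = len(nodes)
        nodes.append(None)  # reserve the slot so ids follow visit order
        if isinstance(e, tuple):
            op, *children = e
            inputs = [visit(c) for c in children]
            nodes[node_id] = {"id": node_id, "op": op, "inputs": inputs}
        elif isinstance(e, str):
            nodes[node_id] = {"id": node_id, "op": "feature", "name": e}
        else:
            nodes[node_id] = {"id": node_id, "op": "const", "value": e}
        return node_id

    visit(expr)
    return nodes


# A one-line blend of two models expands into nine explicit nodes.
expr = ("add", ("mul", 0.5, ("model_a", "x")), ("mul", 0.5, ("model_b", "x")))
nodes = to_json_nodes(expr)
text = json.dumps(nodes, indent=2)
print(len(nodes), "nodes,", len(text.splitlines()), "lines of JSON")
```

Even this small expression becomes several dozen lines once pretty-printed, which suggests why hand-written JSON definitions grow so quickly for larger ensembles.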

What failed first

When this work began, PyTorch had limited support for serializing computation graphs with Python dependencies. A proof of concept using TorchScript achieved similar performance but required significant setup effort due to bugs and incomplete documentation, making it impractical as a general solution.

Results

Time saved: more than a factor of 12

Volume: 120MB

Source

https://careersatdoordash.com/blog/computational-graph-machine-learning-ensemble-model-support/


Grounding & classification

Source type: technical build writeup

35 fields verified against source quotes, 1 dropped as unverifiable.

forecasting, predictive analytics, recommendation system, builder submitted, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, logistics, cost reduction, cycle time reduction, employee productivity, throughput increase, technical build writeup, ecommerce ops, logistics ops