DoorDash ML Platform builds a computational graph system and Python DSL for flexible ensemble model production serving
DoorDash's ML platform could not combine rule-based logic and models from multiple ML frameworks into a single production-ready ensemble. Each framework has its own serialization format and C++ runtime library, so data scientists would have needed to write custom C++ production code for every model.
The C++ computational graph runtime reduces CPU prediction time by more than a factor of 12 compared to Python and reduces total memory footprint from 120 MB to 75 MB. The Python DSL reduces a model definition from 800 lines of JSON to 20 lines of Python, enabling data scientists to build complex ensemble models serving a peak throughput of three million predictions per second.
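To make the DSL-to-JSON idea concrete, here is a minimal sketch of how operator overloading in Python can build a computational graph and emit its JSON form. It is not DoorDash's actual DSL: every name here (`Node`, `feature`, `where_gt`, `to_json`, `evaluate`) and the JSON layout are assumptions made for illustration, and the small Python evaluator only stands in for the production C++ runtime.

```python
import json
from itertools import count

_ids = count()  # hypothetical global id source for graph nodes


class Node:
    """One vertex of a toy computational graph (illustrative only)."""

    def __init__(self, op, inputs=(), value=None, name=None):
        self.id = next(_ids)
        self.op, self.inputs = op, list(inputs)
        self.value, self.name = value, name

    def __add__(self, other):
        return Node("add", [self, wrap(other)])

    def __radd__(self, other):
        return Node("add", [wrap(other), self])

    def __mul__(self, other):
        return Node("mul", [self, wrap(other)])

    def __rmul__(self, other):
        return Node("mul", [wrap(other), self])


def wrap(x):
    """Turn plain numbers into constant nodes; pass nodes through."""
    return x if isinstance(x, Node) else Node("const", value=x)


def feature(name):
    """Input node: a named feature or the score of an upstream model."""
    return Node("input", name=name)


def where_gt(left, right, if_true, if_false):
    """Rule node: yields if_true when left > right, otherwise if_false."""
    return Node("where_gt", [wrap(left), wrap(right), wrap(if_true), wrap(if_false)])


def to_json(root):
    """Serialize every node reachable from root, inputs before consumers."""
    seen, nodes = set(), []

    def visit(node):
        if node.id in seen:
            return
        seen.add(node.id)
        for parent in node.inputs:
            visit(parent)
        nodes.append({"id": node.id, "op": node.op, "name": node.name,
                      "value": node.value, "inputs": [p.id for p in node.inputs]})

    visit(root)
    return json.dumps({"output": root.id, "nodes": nodes})


def evaluate(spec_json, features):
    """Reference evaluator; a stand-in for a compiled runtime."""
    spec, vals = json.loads(spec_json), {}
    for n in spec["nodes"]:
        args = [vals[i] for i in n["inputs"]]
        if n["op"] == "input":
            vals[n["id"]] = features[n["name"]]
        elif n["op"] == "const":
            vals[n["id"]] = n["value"]
        elif n["op"] == "add":
            vals[n["id"]] = args[0] + args[1]
        elif n["op"] == "mul":
            vals[n["id"]] = args[0] * args[1]
        elif n["op"] == "where_gt":
            vals[n["id"]] = args[2] if args[0] > args[1] else args[3]
        else:
            raise ValueError(f"unknown op: {n['op']}")
    return vals[spec["output"]]


# Ensemble: average two model scores, then apply a rule-based override.
blend = 0.5 * feature("model_a_score") + 0.5 * feature("model_b_score")
ensemble = where_gt(feature("is_new_user"), 0.5, 10.0, blend)
spec = to_json(ensemble)
```

The two lines that define `blend` and `ensemble` expand into a JSON document with one entry per node, which hints at why a hand-written JSON definition grows so much faster than the equivalent DSL code. Calling `evaluate(spec, {...})` with a dictionary of feature values walks that JSON in order and returns the ensemble output.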
When this work began, PyTorch had limited support for serializing computation graphs with Python dependencies. A proof of concept using TorchScript achieved similar performance but required significant setup effort due to bugs and incomplete documentation, making it impractical as a general solution.
https://careersatdoordash.com/blog/computational-graph-machine-learning-ensemble-model-support/