Virgin Media O2 scales ML production with lean, isolated Vertex AI pipeline container environments
VMO2's MLOps platform used a single shared container environment for all pipeline tasks. It grew increasingly brittle as the number of data scientists and ML projects expanded: dependency conflicts blocked upgrades, the image ballooned in size and slowed node start-up, and local installs exceeded 2 GB.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Data scientist triggers ML pipeline
Data scientists and analysts explore data and iterate on ML-based solutions on the MLOps platform.
Tools used
Vertex AI Pipelines, Kubeflow Pipelines (KFP), BigQuery, GCS, XGBoost, CatBoost, scikit-learn, TensorFlow, Dataflow, poetry
Outcome
Switching to lean, isolated container environments per component type reduced the MLOps package local install from over 2 GB to 147 MB, cut average pipeline running time by approximately 11%, and enabled the production ML model count to grow from 4 to 25 in seven months.
What failed first
The single monolithic container environment meant all pipeline tasks shared one image. Upgrading any package risked breaking other users' projects, dependency conflicts worsened as the platform grew, and the ever-growing image increased download costs and slowed pipeline node start-up.
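The conflict problem described here can be illustrated with a minimal sketch in plain Python. It is not VMO2's implementation: the component type names, the `find_conflicts` and `isolated_environments` helpers, and every version number are hypothetical, chosen only to show why one shared image forces a single version of each package while per-component-type environments do not.

```python
from collections import defaultdict

# Hypothetical pins per component type. Package names come from the tools
# listed above; the component names and version numbers are invented.
COMPONENT_REQUIREMENTS = {
    "xgboost_training": {"xgboost": "1.7.6", "scikit-learn": "1.3.2"},
    "catboost_training": {"catboost": "1.2.2", "scikit-learn": "1.1.3"},
    "tensorflow_training": {"tensorflow": "2.13.0"},
}


def find_conflicts(requirements):
    """Return packages that would need more than one version in a single image."""
    versions = defaultdict(set)
    for pins in requirements.values():
        for package, version in pins.items():
            versions[package].add(version)
    return {pkg: sorted(vs) for pkg, vs in versions.items() if len(vs) > 1}


def isolated_environments(requirements):
    """Build one environment per component type, each carrying only its own pins."""
    return {ctype: dict(pins) for ctype, pins in requirements.items()}


# Monolithic case: every component type shares one image, so all pins must agree.
shared_conflicts = find_conflicts(COMPONENT_REQUIREMENTS)

# Isolated case: each component type is resolved on its own.
per_type = isolated_environments(COMPONENT_REQUIREMENTS)
per_type_conflicts = {
    ctype: find_conflicts({ctype: pins}) for ctype, pins in per_type.items()
}

print(shared_conflicts)    # {'scikit-learn': ['1.1.3', '1.3.2']}
print(per_type_conflicts)  # every value is an empty dict
```

In the shared case the two training components disagree on scikit-learn, so one of them cannot upgrade without breaking the other. Checked separately, each environment is conflict-free and contains only the packages its component type needs, which is also what keeps each image small.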
Results
Time saved: ~11%
Volume: from four models to 25 in a matter of seven months