Virgin Media O2 scales ML production with lean, isolated Vertex AI pipeline container environments

VMO2's MLOps platform ran every pipeline task in a single shared container environment. As the number of data scientists and ML projects grew, that setup became increasingly brittle: dependency conflicts blocked upgrades, the image ballooned in size and slowed node start-up, and local installs exceeded 2 GB.

How it works

Common implementation structure

This is how this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Data scientist triggers ML pipeline

Data scientists and analysts explore data and iterate on ML-based solutions on the MLOps platform.

Tools used

Vertex AI Pipelines, Kubeflow Pipelines (KFP), BigQuery, GCS, XGBoost, CatBoost, scikit-learn, TensorFlow, Dataflow, poetry

Outcome

Switching to lean, isolated container environments per component type reduced the MLOps package local install from over 2 GB to 147 MB, cut average pipeline running time by approximately 11%, and enabled the production ML model count to grow from 4 to 25 in seven months.
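As a rough illustration of "one lean environment per component type", the sketch below maps component types to their own image and dependency list, so that a task pulls only what it needs. The registry path, image names, component types, and package groupings are all hypothetical and chosen for illustration; only the library names come from the tools listed above, and the source summary does not give VMO2's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentEnv:
    """One isolated environment: its own image and its own dependency list."""
    image: str
    packages: tuple[str, ...]


# Hypothetical registry path and groupings, for illustration only.
REGISTRY = "example-registry/mlops"

ENVIRONMENTS = {
    "data_prep": ComponentEnv(f"{REGISTRY}/data-prep", ("google-cloud-bigquery",)),
    "train_xgboost": ComponentEnv(f"{REGISTRY}/train-xgboost", ("xgboost", "scikit-learn")),
    "train_tensorflow": ComponentEnv(f"{REGISTRY}/train-tensorflow", ("tensorflow",)),
}


def env_for(component_type: str) -> ComponentEnv:
    """Return the environment a task of this type should run in."""
    try:
        return ENVIRONMENTS[component_type]
    except KeyError:
        raise ValueError(f"no environment defined for {component_type!r}") from None


def monolithic_packages() -> set[str]:
    """What a single shared image would have to install: the union of everything."""
    return {pkg for env in ENVIRONMENTS.values() for pkg in env.packages}
```

In Kubeflow Pipelines, the component decorator accepts a `base_image` argument, which is one place where a per-type image like `env_for("train_xgboost").image` could be plugged in. Whether VMO2 wires it this way is not stated here.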

What failed first

The single monolithic container environment meant all pipeline tasks shared one image. Upgrading any package risked breaking other users' projects, dependency conflicts worsened as the platform grew, and the ever-growing image increased download costs and slowed pipeline node start-up.
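The failure mode above can be made concrete with a small sketch: when every project's pins must coexist in one shared environment, any package pinned to two different versions is a conflict, whereas isolated environments only need to be self-consistent. The project names and version pins below are hypothetical and are not taken from VMO2's platform.

```python
from collections import defaultdict

# Hypothetical project requirements: package -> pinned version.
PROJECT_PINS = {
    "churn_model": {"scikit-learn": "1.0.2", "xgboost": "1.5.0"},
    "recommender": {"scikit-learn": "1.3.0", "tensorflow": "2.13.0"},
    "forecasting": {"catboost": "1.2", "xgboost": "1.5.0"},
}


def find_conflicts(pins_by_project):
    """Return packages that are pinned to more than one version."""
    versions = defaultdict(dict)  # package -> {version: [projects]}
    for project, pins in pins_by_project.items():
        for package, version in pins.items():
            versions[package].setdefault(version, []).append(project)
    return {pkg: by_ver for pkg, by_ver in versions.items() if len(by_ver) > 1}


# One shared image must satisfy every project at once.
shared_conflicts = find_conflicts(PROJECT_PINS)

# Isolated images only need to satisfy their own project.
isolated_conflicts = {
    name: find_conflicts({name: pins}) for name, pins in PROJECT_PINS.items()
}

print(shared_conflicts)
print(isolated_conflicts)
```

With these made-up pins, the shared environment reports a clash on `scikit-learn`, while each isolated environment reports none. Upgrading `scikit-learn` for one project in the shared image would silently change it for the other.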

Results

Time saved: ~11%

Volume: from four models to 25 in a matter of seven months

Source

https://mlops.community/blog/the-mlops-cookbook-how-we-optimised-our-vertex-ai-pipelines-environments-at-vmo2-for-scale

Grounding & classification

Source type: technical build writeup

27 fields verified against source quotes.

personalization, predictive analytics, recommendation system, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, telecom, cycle time reduction, employee productivity, throughput increase, technical build writeup, back office ops