Roblox builds hybrid cloud ML inference infrastructure scaling to 250+ AI pipelines in three phases

Without a unified AI platform, Roblox engineering teams built fragmented mini-platforms on disparate frameworks, each developing its own feature engineering, optimizations, and inference scaling solutions without central support.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Fragmented teams trigger platform need
The lack of a unified AI platform led engineering teams to construct their own mini-platforms and select disparate frameworks independently.
Tools used
Kubeflow, Jupyter, KServe, Triton Inference Server, Ray, Feast, Flink, vLLM, vector database, CLIP, Roblox Assistant
Outcome

Roblox scaled from fewer than 50 ML inference pipelines to approximately 250. vLLM delivered an almost 2x improvement in latency and throughput and currently serves approximately 4 billion tokens per week. Avatar Auto Setup produced approximately 8% of UGC avatar bodies as of August 2024.

What failed first

The initial offline inference setup was designed only for real-time sequential workloads, lacked support for task parallelism and multistage processing, and required engineers to write their own data chunking and error-handling logic as inference needs scaled.
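
To make the failure mode concrete, here is a minimal sketch of the kind of chunking and error-handling glue an engineer might have had to write by hand around a batch of inputs. It is illustrative only and is not Roblox's code: the names `chunked`, `run_batch_inference`, `predict`, `chunk_size`, and `max_retries` are all hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batch_inference(
    items: Sequence[T],
    predict: Callable[[Sequence[T]], List[R]],  # hypothetical model call
    chunk_size: int = 2,
    max_retries: int = 2,
) -> Tuple[List[R], List[List[T]]]:
    """Run `predict` chunk by chunk, retrying each failed chunk.

    Returns the collected predictions and the chunks that still failed
    after `max_retries` retries, so the caller can reprocess them later.
    """
    results: List[R] = []
    failed: List[List[T]] = []
    for chunk in chunked(items, chunk_size):
        for attempt in range(max_retries + 1):
            try:
                results.extend(predict(chunk))
                break  # chunk succeeded, move on to the next one
            except Exception:
                if attempt == max_retries:
                    failed.append(list(chunk))  # give up on this chunk
    return results, failed
```

Even this small sketch processes chunks strictly one after another, with no task parallelism or multistage processing, which are the gaps the writeup attributes to the initial setup.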

Results
Time saved: 1.5 billion per week
Volume: fewer than 50
Cost replaced: significantly reduced
Running since: late 2021
Source

https://corp.roblox.com/newsroom/2024/09/running-ai-inference-at-scale-in-the-hybrid-cloud

Grounding & classification
Source type: technical build writeup
49 fields verified against source quotes, 1 dropped as unverifiable.
computer vision, content generation, recommendation system, translation, voice ai, knowledge base, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, media, software, cost reduction, employee productivity, response time reduction, throughput increase, technical build writeup, back office ops, quality assurance, human review queue, monitor detect alert