
Roblox builds hybrid cloud ML inference infrastructure scaling to 250+ AI pipelines in three phases

Without a unified AI platform, Roblox engineering teams built fragmented mini-platforms on disparate frameworks, each developing its own feature engineering, optimizations, and inference scaling with no central support.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Fragmented teams trigger platform need

The lack of a unified AI platform led engineering teams to construct their own mini-platforms and select disparate frameworks independently.

Tools used

Kubeflow, Jupyter, KServe, Triton Inference Server, Ray, Feast, Flink, vLLM, vector database, CLIP, Roblox Assistant

Outcome

Roblox scaled from fewer than 50 ML inference pipelines to approximately 250, with vLLM delivering an almost 2x improvement in latency and throughput and currently serving approximately 4 billion tokens per week; Avatar Auto Setup produced approximately 8% of UGC avatar bodies as of August 2024.

What failed first

The initial offline inference setup was designed only for real-time sequential workloads, lacked support for task parallelism and multistage processing, and required engineers to write their own data chunking and error-handling logic as inference needs scaled.
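To make the burden concrete, here is a minimal sketch of the kind of chunking and error-handling boilerplate an engineer might have had to write by hand for a batch inference job. It is illustrative only and is not Roblox's code: the names `chunked`, `run_batches`, `predict`, `chunk_size`, and `max_retries` are all hypothetical, and the retry policy is an assumption.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batches(
    items: Sequence[T],
    predict: Callable[[Sequence[T]], List[R]],  # hypothetical model call
    chunk_size: int = 2,
    max_retries: int = 1,
) -> Tuple[List[R], List[T]]:
    """Run `predict` chunk by chunk, retrying each failed chunk.

    Returns the collected predictions and the inputs whose chunk
    still failed after `max_retries` additional attempts.
    """
    results: List[R] = []
    failed: List[T] = []
    for chunk in chunked(items, chunk_size):
        for attempt in range(max_retries + 1):
            try:
                results.extend(predict(chunk))
                break  # chunk succeeded, move on to the next one
            except Exception:
                if attempt == max_retries:
                    # Give up on this chunk but keep processing the rest.
                    failed.extend(chunk)
    return results, failed
```

Every team repeating logic like this for its own jobs is the kind of duplication a shared platform with built-in parallelism and multistage support is meant to remove.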

Results

Time saved: 1.5 billion per week

Volume: fewer than 50

Cost replaced: significantly reduced

Running since: late 2021

Source

https://corp.roblox.com/newsroom/2024/09/running-ai-inference-at-scale-in-the-hybrid-cloud


Grounding & classification

Source type: technical build writeup

49 fields verified against source quotes, 1 dropped as unverifiable.

computer vision, content generation, recommendation system, translation, voice ai, knowledge base, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, media, software, cost reduction, employee productivity, response time reduction, throughput increase, technical build writeup, back office ops, quality assurance, human review queue, monitor detect alert