
HubX achieves 2.5x faster inference and 40% cost reduction with Google Kubernetes Engine and Trillium TPUs

HubX needed its AI-powered mobile apps to respond within 10 seconds to prevent user churn, but its previous infrastructure suffered from slow processing and high latency, which also made rapid iteration and deployment difficult.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.

Stage 1 · Serverless app deployment

HubX uses Cloud Run and Cloud Run functions for rapid, serverless deployment of AI/ML applications.

Tools used

Google Kubernetes Engine, AI Hypercomputer, Trillium TPUs, A100 GPUs, L4 GPUs, Cloud Run, Cloud Run functions, Hyperdisk ML, Google Cloud

Outcome

After adopting GKE, HubX achieved 2.5x faster inference on Trillium TPUs, cut operating costs by 40%, delivered responses to user queries in under 10 seconds, and sped up model boot-up by 20-30x, with cold start times reduced by one full minute.
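The speedup multipliers above can be easier to compare once converted into time reductions. The following is a minimal sketch of that arithmetic, assuming "N x faster" means the same work completes in 1/N of the original time (an interpretation, not something the source states). The helper names `time_fraction` and `percent_reduction` are hypothetical and introduced only for illustration.

```python
def time_fraction(speedup: float) -> float:
    """Fraction of the original time that remains after an N-times speedup.

    Assumption: "N x faster" means the task takes 1/N of the original time.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def percent_reduction(speedup: float) -> float:
    """Percentage of time removed by an N-times speedup."""
    return (1.0 - time_fraction(speedup)) * 100.0


# Multipliers quoted in the case study.
inference_speedup = 2.5
boot_speedup_range = (20.0, 30.0)

inference_reduction = percent_reduction(inference_speedup)
boot_reductions = [percent_reduction(s) for s in boot_speedup_range]

print(f"{inference_speedup}x faster inference: "
      f"{inference_reduction:.0f}% less time per request")
for s, r in zip(boot_speedup_range, boot_reductions):
    print(f"{s:.0f}x faster boot-up: {r:.1f}% less boot time")
```

Under this interpretation, 2.5x faster inference means each request takes 40% of its previous time. That figure is unrelated to the separately reported 40% reduction in operating costs, which measures spend rather than latency.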

What failed first

HubX's previous infrastructure caused slow processing times and latency issues that drove high user churn and blocked rapid iteration.

Results

Time saved: less than 10 seconds

Volume: 2.5x faster

Cost replaced: 40%

Source

https://cloud.google.com/customers/hubx-ai


Grounding & classification

Source type: vendor customer story

31 fields verified against source quotes.

computer vision, content generation, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cost reduction, cycle time reduction, employee productivity, response time reduction, throughput increase, vendor customer story, back office ops