HubX achieves 2.5x faster inference and 40% cost reduction with Google Kubernetes Engine and Trillium TPUs
HubX needed its AI-powered mobile apps to respond within 10 seconds to prevent user churn, but its previous infrastructure suffered from slow processing and latency issues, which also made rapid iteration and deployment difficult.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Serverless app deployment
HubX uses Cloud Run and Cloud Run functions for rapid, serverless deployment of AI/ML applications.
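As a rough illustration of what a serverless inference endpoint of this kind can look like, the sketch below shows a minimal HTTP service that follows the Cloud Run container contract of listening on the port supplied in the `PORT` environment variable (8080 by default). It is not HubX's code: the names `resolve_port`, `build_reply`, and `InferenceHandler` are hypothetical, and `build_reply` is a placeholder standing in for a real model call.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(env):
    """Return the port to listen on; Cloud Run supplies it via PORT (default 8080)."""
    return int(env.get("PORT", "8080"))


def build_reply(prompt):
    """Hypothetical placeholder for a model call; it only reports the prompt length."""
    return {"prompt": prompt, "characters": len(prompt)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, tolerating an empty body.
        length = int(self.headers.get("Content-Length", "0"))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(build_reply(payload.get("prompt", ""))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Bind to all interfaces so the platform can route requests to the container.
    server = HTTPServer(("0.0.0.0", resolve_port(os.environ)), InferenceHandler)
    server.serve_forever()
```

A container entrypoint would call `main()`. Keeping the port lookup and the reply logic in separate functions makes them easy to check without starting a server.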
Tools used
Google Kubernetes Engine, AI Hypercomputer, Trillium TPUs, A100 GPUs, L4 GPUs, Cloud Run, Cloud Run functions, Hyperdisk ML, Google Cloud
Outcome
After adopting GKE, HubX achieved 2.5x faster inference with Trillium TPUs, cut operating costs by 40%, delivered user query responses in under 10 seconds, and saw 20-30x faster model boot-up, with cold start times reduced by one full minute.
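To make the headline figures concrete, the short sketch below works through what a 2.5x speedup and a 40% cost reduction mean arithmetically. The source does not report baseline latency or cost, so the 20-second and 1,000-unit baselines are hypothetical values chosen only for illustration, and the function names are likewise assumptions.

```python
def latency_after_speedup(baseline_seconds, speedup):
    """A k-times speedup divides latency by k."""
    return baseline_seconds / speedup


def cost_after_reduction(baseline_cost, reduction_fraction):
    """A fractional reduction r leaves (1 - r) of the original cost."""
    return baseline_cost * (1.0 - reduction_fraction)


def within_budget(latency_seconds, budget_seconds=10.0):
    """Check a latency against the 10-second response target from the case study."""
    return latency_seconds < budget_seconds


# Hypothetical baselines: the case study does not publish these numbers.
new_latency = latency_after_speedup(20.0, 2.5)
new_cost = cost_after_reduction(1000.0, 0.40)
print(new_latency, within_budget(new_latency), round(new_cost, 2))
```

Note that a 2.5x speedup means each request takes 1/2.5 = 40% of its original time, which is a 60% reduction in latency. That figure is separate from the 40% reduction in operating cost.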
What failed first
HubX's previous infrastructure caused slow processing times and latency issues that drove high user churn and blocked rapid iteration.
Results
Time saved: less than 10 seconds
Volume: 2.5x faster
Cost replaced: 40%
Grounding & classification
Source type: vendor customer story
31 fields verified against source quotes.
computer vision, content generation, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cost reduction, cycle time reduction, employee productivity, response time reduction, throughput increase, vendor customer story, back office ops