Grab builds a user foundation model generating embeddings for personalisation across its superapp ecosystem
Grab's recommender systems relied on hundreds to thousands of manually engineered, task-specific, siloed features that required substantial effort and could not effectively capture sequential interaction data. General-purpose LLMs lacked the contextual understanding for Grab's domain-specific data, and off-the-shelf models could not jointly handle the superapp's mix of tabular, sequential, and multi-modal data.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in "Tools commonly seen" below.
Stage 1 · User interaction data collection
Every interaction on the Grab app — views, clicks, considerations, and transactions — is tracked to produce tabular and clickstream input data.
Tools used
Ray, Ray Data
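The Stage 1 collection step turns raw interaction logs into two kinds of model input. The sketch below shows that split under stated assumptions: the `Event` schema, the `event_type:vertical` token format, and the `build_inputs` helper are hypothetical illustrations, not Grab's actual logging format or pipeline. At Grab's scale this transformation would run on distributed infrastructure such as Ray Data rather than in a single Python process.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One logged interaction (hypothetical schema for illustration)."""
    user_id: str
    timestamp: int   # epoch seconds
    event_type: str  # e.g. "view", "click", "consider", "transact"
    vertical: str    # e.g. "food", "transport", "mart"


def build_inputs(events):
    """Split raw events into per-user clickstreams and tabular aggregates."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e.user_id].append(e)

    clickstreams, tabular = {}, {}
    for user, evs in by_user.items():
        evs.sort(key=lambda e: e.timestamp)  # sequences must be chronological
        # Sequential input: one "event_type:vertical" token per interaction.
        clickstreams[user] = [f"{e.event_type}:{e.vertical}" for e in evs]
        # Tabular input: order-free counts per event type plus distinct verticals.
        counts = Counter(e.event_type for e in evs)
        tabular[user] = {
            **{f"n_{t}": c for t, c in sorted(counts.items())},
            "n_verticals": len({e.vertical for e in evs}),
        }
    return clickstreams, tabular
```

For three out-of-order events from one user (a food view at t=10, a transport click at t=20, a food transaction at t=30), the clickstream is `["view:food", "click:transport", "transact:food"]` and the tabular row counts one of each event type across two verticals. The sequence keeps ordering information that the aggregated row discards, which is the gap the source says hand-built features could not fill.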
Outcome
Grab's foundation model now powers ad optimisation, dual app prediction, fraud detection, and churn probability; the distributed Ray infrastructure dramatically reduces costs and processing times; and teams building on pre-trained embeddings see significantly reduced development time and improved performance.
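Downstream teams reuse the pre-trained embeddings instead of engineering task-specific features. The sketch below illustrates that pattern under stated assumptions: each user already has a fixed embedding vector, and a nearest-centroid classifier on frozen embeddings stands in for whatever head a team such as churn prediction actually trains. The vectors, labels, and the helpers `cosine`, `fit_centroids`, and `predict` are hypothetical; the source does not describe Grab's downstream models.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(embeddings, labels):
    """Average the frozen embeddings of each label (e.g. churned vs retained)."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Toy 3-dimensional "embeddings"; real user embeddings are far larger.
train = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.3], [0.0, 0.8, 0.4]]
labels = ["retained", "retained", "churned", "churned"]
centroids = fit_centroids(train, labels)
print(predict(centroids, [0.05, 0.9, 0.3]))  # churned
```

Because the embeddings are never updated, a new use case only needs labels and a small head, which is one plausible source of the reduced development time the source reports.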
What failed first
General-purpose LLMs lacked the contextual understanding required for Grab's domain-specific data, and single-task supervised training would produce biased embeddings unsuitable for Grab's diverse verticals.
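Training on a single supervised label pulls embeddings toward that one task. A common task-agnostic alternative, assumed here rather than confirmed by the source, is self-supervised masked modelling: hide some events in a user's clickstream and train the model to recover them, so the embedding has to encode behaviour across all verticals. The sketch below builds one such training pair; the `MASK` token, the `mask_events` helper, and the 15% default rate are illustrative choices.

```python
import random

MASK = "[MASK]"


def mask_events(tokens, mask_prob=0.15, seed=0):
    """Build a masked-modelling training pair from one clickstream.

    Returns (inputs, targets): inputs has some tokens replaced by MASK;
    targets holds the original token at masked positions and None elsewhere,
    so the loss only covers positions the model cannot see.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    # Guarantee at least one prediction target per non-empty sequence.
    if tokens and all(t is None for t in targets):
        i = rng.randrange(len(tokens))
        inputs[i], targets[i] = MASK, tokens[i]
    return inputs, targets
```

No label from ads, fraud, or churn appears in these targets, so no single vertical's objective shapes the embedding; task-specific heads are added later on top of it.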