Metasense V2: Grab enhances LLM-powered data governance to classify an entire data lake
Grab's internal metadata generation service relied on a third-party data classification tool with limited ML customisation options, leaving the growing data lake dependent on manual tagging by Grabbers and making it impossible to scale data governance efficiently.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Data table submitted for classification
New or existing data lake tables are submitted to the model for automated tag generation instead of manual classification by Grabbers.
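As a rough illustration of this stage, the sketch below shows one way a table could be turned into a classification request and the model's reply validated. It is a minimal sketch under stated assumptions, not Grab's implementation: the names `TableSchema`, `build_prompt`, `classify_table`, and `fake_llm`, the two-tag set, and the `NEEDS_REVIEW` fallback are all hypothetical, and the model is represented by any callable that takes a prompt string and returns a JSON string.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical tag set for illustration; the source does not list the real tags.
ALLOWED_TAGS = ["PII", "NON_PII"]


@dataclass
class TableSchema:
    """Minimal description of a data lake table submitted for tagging."""
    name: str
    columns: List[str]


def build_prompt(table: TableSchema) -> str:
    """Render a short prompt asking for one tag per column, as JSON."""
    column_list = "\n".join(f"- {col}" for col in table.columns)
    return (
        f"Classify each column of table '{table.name}' with exactly one of "
        f"{ALLOWED_TAGS}. Reply with a JSON object mapping column to tag.\n"
        f"{column_list}"
    )


def classify_table(table: TableSchema, llm: Callable[[str], str]) -> Dict[str, str]:
    """Send the prompt to `llm` and keep only valid answers.

    Columns the model omits or tags with an unknown label are mapped to
    'NEEDS_REVIEW' (an assumed fallback) instead of being trusted.
    """
    raw = json.loads(llm(build_prompt(table)))
    return {
        col: raw[col] if raw.get(col) in ALLOWED_TAGS else "NEEDS_REVIEW"
        for col in table.columns
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return json.dumps({"email": "PII", "trip_count": "NON_PII", "notes": "MAYBE"})


if __name__ == "__main__":
    rides = TableSchema("rides", ["email", "trip_count", "notes", "created_at"])
    print(classify_table(rides, fake_llm))
```

Passing the model in as a plain callable keeps the sketch independent of any particular client library; in practice the callable would wrap whichever LLM client is in use.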
Tools used
LangChain, LangSmith, Large Language Models
Outcome
Metasense V2 now covers the vast majority of Grab's data lake tables, has significantly reduced the manual classification workload for Grabbers, and achieves exceptionally low misclassification rates supported by automated threshold alerts.
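The automated threshold alerts mentioned above could, in principle, work along the following lines. This is a hedged sketch only: the source does not describe how the rate is measured or what threshold is used, so the function names, the comparison against human-verified tags, and the 5% default threshold are all assumptions made for illustration.

```python
from typing import Dict


def misclassification_rate(predicted: Dict[str, str], verified: Dict[str, str]) -> float:
    """Share of human-verified columns whose predicted tag disagrees.

    Only columns present in both mappings are counted; returns 0.0 when
    there is nothing to compare.
    """
    checked = [col for col in verified if col in predicted]
    if not checked:
        return 0.0
    wrong = sum(1 for col in checked if predicted[col] != verified[col])
    return wrong / len(checked)


def should_alert(rate: float, threshold: float = 0.05) -> bool:
    """Return True when the rate exceeds the (assumed) alert threshold."""
    return rate > threshold


if __name__ == "__main__":
    predicted = {"email": "PII", "phone": "NON_PII", "city": "NON_PII"}
    verified = {"email": "PII", "phone": "PII"}  # spot-checked by a reviewer
    rate = misclassification_rate(predicted, verified)
    print(rate, should_alert(rate))
```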
What failed first
The first LLM-based model was overwhelmed by mixed PII and non-PII data: 13 of 21 tags targeted non-PII distinctions, consuming capacity needed for PII detection. Overly long prompts and large table inputs further degraded accuracy on edge cases such as business emails containing personal names and nested JSON with hidden PII.