compliance_monitoring · saas · workflow

Metasense V2: Grab enhances LLM-powered data governance to classify an entire data lake

Grab's internal metadata generation service relied on a third-party data classification tool with limited ML customisation options, leaving the growing data lake dependent on manual tagging by Grabbers and making it impossible to scale data governance efficiently.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Data table submitted for classification

New or existing data lake tables are submitted to the model for automated tag generation instead of manual classification by Grabbers.

Tools used

LangChain, LangSmith, Large Language Models
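The submission step in Stage 1 can be pictured with a minimal Python sketch. It only builds the text sent to a model and parses a reply; it does not call LangChain or any LLM. The `Column` type, the function names, the prompt wording, and the tag names (`PII_EMAIL`, `NON_PII`) are all hypothetical, chosen for illustration and not taken from Grab's implementation.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical description of one table column."""
    name: str
    sample_values: list[str]


def build_classification_prompt(table_name, columns, allowed_tags):
    """Build a short prompt asking for one tag per column (illustrative wording)."""
    lines = [
        f"Classify each column of table '{table_name}'.",
        "Allowed tags: " + ", ".join(allowed_tags),
        "Columns:",
    ]
    for col in columns:
        # Only a few sample values are included to keep the prompt short.
        samples = ", ".join(col.sample_values[:3])
        lines.append(f"- {col.name} (samples: {samples})")
    lines.append("Answer with one line per column in the form 'column: tag'.")
    return "\n".join(lines)


def parse_tags(response, allowed_tags):
    """Parse 'column: tag' lines, dropping any tag outside the allowed set."""
    tags = {}
    for line in response.splitlines():
        if ":" not in line:
            continue
        column, tag = line.split(":", 1)
        tag = tag.strip()
        if tag in allowed_tags:
            tags[column.strip()] = tag
    return tags
```

Rejecting tags outside the allowed set is one simple guard against a model inventing labels; whether the original system does this is not stated in the source.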

Outcome

Metasense V2 now covers the vast majority of Grab's data lake tables, has significantly reduced the manual classification workload for Grabbers, and achieves exceptionally low misclassification rates supported by automated threshold alerts.
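A threshold alert of the kind mentioned above could look roughly like the following sketch. It assumes that some sample of model-assigned tags is later checked by a human reviewer, which gives pairs of predicted and verified tags. The function names and the default threshold of 0.05 are hypothetical placeholders; the source does not give the actual threshold or alerting mechanism.

```python
def misclassification_rate(reviewed):
    """Fraction of reviewed columns where the predicted tag differs from the verified tag.

    `reviewed` is a list of (predicted_tag, verified_tag) pairs.
    """
    if not reviewed:
        return 0.0
    wrong = sum(1 for predicted, verified in reviewed if predicted != verified)
    return wrong / len(reviewed)


def should_alert(reviewed, threshold=0.05):
    """Return True when the observed error rate exceeds the (assumed) threshold."""
    return misclassification_rate(reviewed) > threshold
```

In practice the result of `should_alert` would feed whatever notification channel the team already uses; that wiring is left out here.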

What failed first

The first LLM-based classifier was overwhelmed by mixed PII and non-PII data: 13 of 21 tags targeted non-PII distinctions, consuming capacity needed for PII detection. Overly long prompts and large table inputs further degraded accuracy on edge cases such as business emails containing personal names and nested JSON with hidden PII.
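One generic way to keep table inputs small is to split a wide table's columns into fixed-size batches and classify each batch separately. The sketch below illustrates that general idea only. The source does not say this is the remedy Grab adopted, and `batch_columns` and `max_per_batch` are hypothetical names.

```python
def batch_columns(columns, max_per_batch):
    """Split a list of columns into consecutive batches of at most `max_per_batch` items."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    return [
        columns[start:start + max_per_batch]
        for start in range(0, len(columns), max_per_batch)
    ]
```

Each batch could then be turned into its own, shorter prompt, at the cost of more model calls per table.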

Results

Time saved: 300-400

Volume: more than 20,000

Running since: early 2024

Source

https://engineering.grab.com/metasense-v2

Grounding & classification

Source type: technical build writeup

30 fields verified against source quotes, 1 dropped as unverifiable.

data extraction, document classification, knowledge base, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, logistics, software, accuracy improvement, automation rate, employee productivity, technical build writeup, compliance monitoring, data entry ops, extract classify route, human review queue