compliance_monitoring · logistics · workflow

Grab automates petabyte-scale data classification with an LLM, scanning 20,000+ entities in its first month

Grab managed petabyte-scale data across countless schemas, but manual PII tagging campaigns were inconsistent because developers interpreted classification policies differently, and the data volume made manual table-level tagging infeasible. The existing third-party classification tool could not be customised, and its regex classifiers generated too many false positives.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Data platform scan request
Data platforms trigger scan requests to the Gemini service to initiate the tag classification process.
Tools used
Gemini, GPT-3.5, Azure OpenAI, Kafka
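
The Stage 1 handoff can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the `ScanRequest` fields, the JSON encoding, and the in-memory `scan_queue` that stands in for a message queue such as a Kafka topic. None of it is taken from Grab's actual schema or service API.

```python
import json
from collections import deque
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScanRequest:
    """Hypothetical scan request; field names are illustrative only."""
    platform: str   # data platform issuing the request
    entity: str     # table or topic to classify
    columns: tuple  # column names that need tags


def encode(request: ScanRequest) -> bytes:
    """Serialize a request to JSON bytes, as it might travel over a queue."""
    return json.dumps(asdict(request), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> ScanRequest:
    """Rebuild a request from its JSON bytes."""
    data = json.loads(payload.decode("utf-8"))
    return ScanRequest(data["platform"], data["entity"], tuple(data["columns"]))


# In-memory stand-in for a message queue (assumption for this sketch).
scan_queue = deque()


def submit(request: ScanRequest) -> None:
    """Called by a data platform to ask for a classification scan."""
    scan_queue.append(encode(request))


def next_request():
    """Called by the classification service; returns None when the queue is empty."""
    return decode(scan_queue.popleft()) if scan_queue else None
```

A platform would call `submit(ScanRequest("warehouse", "orders", ("email", "order_id")))`, and the classification service would pull requests with `next_request()` at its own pace. Decoupling the two sides through a queue is one common way to keep scan throughput independent of how quickly requests arrive.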
Outcome

Within a month of rollout, the LLM-powered system had scanned over 20,000 data entities, averaging 300–400 per day, and saved an estimated 360 man-days per year. Eighty percent of data owners reported that the new tagging process helped them, and acknowledged tables needed fewer than one tag change on average.

What failed first

The initial Gemini service, built around a third-party classification tool, had two blocking gaps: its ML classifiers could not be customised to Grab's internal taxonomy, and its regex patterns produced excessive false positives. Building a bespoke in-house model was equally impractical because of the labelling and data-science investment required.
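
The false-positive problem with regex classifiers is easy to reproduce in a toy setting. The sketch below is purely illustrative: the pattern, column names, and sample values are invented here and do not come from Grab's tooling or the third-party product.

```python
import re

# Naive rule (assumption): any standalone run of 8 to 12 digits is a phone number.
PHONE_PATTERN = re.compile(r"\b\d{8,12}\b")


def looks_like_phone(value: str) -> bool:
    """Return True when the naive pattern matches anywhere in the value."""
    return PHONE_PATTERN.search(value) is not None


# Made-up sample values, one per column.
samples = {
    "contact_number": "6591234567",  # the only value meant to be a phone number
    "order_id": "2023110945",        # numeric identifier, not PII
    "unix_timestamp": "1699488000",  # epoch seconds, not PII
}

# Columns the regex would tag as containing phone numbers.
flagged = [column for column, value in samples.items() if looks_like_phone(value)]
```

Here `flagged` contains all three columns even though only one holds a phone number. The pattern sees digits but not the column's meaning, which is the kind of context a classifier needs in order to avoid mislabelling identifiers and timestamps as PII.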

Results
Time saved: an estimated 360 man-days per year
Entities scanned: more than 20,000 in the first month
Volume: 300–400 entities per day
Source

https://engineering.grab.com/llm-powered-data-classification


Grounding & classification
Source type: technical build writeup
29 fields verified against source quotes.
data extraction, document classification, knowledge base, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, financial services, logistics, software, accuracy improvement, automation rate, throughput increase, time saved, technical build writeup, compliance monitoring, data entry ops, ai draft human approval, extract classify route