kyc_aml · saas · workflow

How Grab built a custom vision LLM to improve document processing for eKYC

Traditional OCR systems struggled with the diversity of Southeast Asian languages and document formats. Proprietary LLMs made extraction errors, hallucinated, and responded with high latency, while open-source Vision LLMs were not accurate enough for production use in eKYC workflows.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · User document submission
User-submitted documents such as ID cards, driver's licenses, and registration certificates initiate the eKYC process.
Tools used
Qwen2.5 0.5B · Documint · Common Crawl
Outcome

Grab's custom ~1B-parameter Vision LLM came within 3pp of the accuracy of the larger 2B model. Thai document accuracy improved by 70pp and Vietnamese document accuracy by 40pp over the baseline, and latency was far lower than that of traditional OCR models and external APIs.
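The figures above are given in percentage points (pp), which are absolute differences between two percentages and are easy to confuse with relative change. The sketch below shows the distinction. The baseline and improved accuracies in it are hypothetical, chosen only so the gap is 70pp; the source reports the deltas but not the underlying values, and the function names are illustrative.

```python
def percentage_point_delta(baseline_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - baseline_pct


def relative_change_pct(baseline_pct: float, new_pct: float) -> float:
    """Change relative to the baseline, expressed as a percent of the baseline."""
    if baseline_pct == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (new_pct - baseline_pct) / baseline_pct * 100.0


# Hypothetical values for illustration only; not taken from the source.
baseline_accuracy = 20.0   # percent of fields extracted correctly
improved_accuracy = 90.0   # percent of fields extracted correctly

pp = percentage_point_delta(baseline_accuracy, improved_accuracy)
rel = relative_change_pct(baseline_accuracy, improved_accuracy)

print(f"{pp:+.0f}pp absolute, {rel:+.0f}% relative")
```

With these made-up inputs the same improvement reads as +70pp in absolute terms but +350% in relative terms, which is why the unit matters when comparing reported gains.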

What failed first

LoRA fine-tuning of Qwen2VL showed promising results for Latin-script documents but still struggled with Thai and Vietnamese documents and with unstructured layouts containing small, dense text. The cause was that open-source Vision LLMs lacked visual text in Southeast Asian (SEA) languages during vision encoder training.
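For readers unfamiliar with LoRA, the following is a minimal, dependency-free sketch of the general LoRA update rule: the pretrained weight matrix W stays frozen, and only a low-rank pair of matrices A and B is trained, giving an effective weight of W + (alpha / r) · B·A. This is a generic illustration of the technique, not Grab's training code; the helper names, matrix sizes, and values are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def lora_merge(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    Shapes: W is (d_out, d_in), A is (r, d_in), B is (d_out, r).
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy example (hypothetical values): a 2x3 frozen weight and a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # shape (1, 3), trainable
B = [[0.5],
     [0.0]]             # shape (2, 1), trainable

merged = lora_merge(W, A, B, alpha=2.0)
print(merged)
```

Because only A and B are updated, the frozen parts of the model keep whatever they learned in pretraining. That is consistent with the failure described above: if the vision encoder never saw SEA-language text, a small adapter has limited room to compensate.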

Results
Accuracy (Thai documents): +70pp from baseline
Source

https://engineering.grab.com/custom-vision-llm-at-grab


Grounding & classification
Source type: technical build writeup
27 fields verified against source quotes, 3 dropped as unverifiable.
computer vision · data extraction · document ai · idp · ocr · id document · builder submitted · metric backed · named customer · production runtime claimed · workflow described · financial services · software · accuracy improvement · cycle time reduction · technical build writeup · data entry ops · kyc aml · document to record · extract classify route