quality_assurance · saas · workflow

Grammarly's mEdIT: fine-tuning multilingual LLMs to support cross-lingual text editing across seven languages

Popular foundational LLMs produce low-quality outputs for text editing tasks, and prior fine-tuning efforts addressed either multiple editing tasks for a single language or a single task across multiple languages, but never both at once.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under "Tools used" below.

Stage 1 · Curate multilingual training data

More than two hundred thousand instruction-rewrite pairs were curated from publicly available datasets across seven languages and three editing tasks.
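A minimal sketch of how the curated pairs from Stage 1 might be represented and turned into instruction-tuning records. The source does not describe the actual data schema, so everything here is an assumption for illustration: the `EditPair` fields, the task keys (`gec`, `simplify`, `paraphrase`), the instruction wording, and the language codes are all hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical instruction templates, one per editing task.
# The real task names and prompt wording are not given in the source.
INSTRUCTIONS = {
    "gec": "Fix the grammatical errors in this text:",
    "simplify": "Simplify this text:",
    "paraphrase": "Paraphrase this text:",
}


@dataclass(frozen=True)
class EditPair:
    """One curated example: a source text and its edited rewrite."""
    language: str  # assumed ISO-style code such as "en" or "de"
    task: str      # must be a key of INSTRUCTIONS
    source: str
    target: str


def to_training_record(pair: EditPair) -> dict:
    """Turn a curated pair into an instruction-style input/output record."""
    if pair.task not in INSTRUCTIONS:
        raise ValueError(f"unknown editing task: {pair.task!r}")
    return {
        "input": f"{INSTRUCTIONS[pair.task]} {pair.source}",
        "output": pair.target,
        "language": pair.language,
        "task": pair.task,
    }


def coverage(pairs: list[EditPair]) -> Counter:
    """Count examples per (language, task) cell to spot gaps in the mix."""
    return Counter((p.language, p.task) for p in pairs)


if __name__ == "__main__":
    pairs = [
        EditPair("en", "gec", "She go to school.", "She goes to school."),
        EditPair("de", "gec", "Ich habe gestern ins Kino gegangen.",
                 "Ich bin gestern ins Kino gegangen."),
    ]
    for record in map(to_training_record, pairs):
        print(record)
    print(coverage(pairs))
```

Counting examples per (language, task) cell is one simple way to check that no language or task dominates the mix before fine-tuning. Whether the original work balanced its data this way is not stated in the source.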

Tools used

mT5, mT0, BLOOMZ, PolyLM, Bactrian-X, GPT3.5, GPT4, GitHub, Hugging Face

Outcome

mEdIT fine-tuned models show a substantial improvement over untrained counterparts across multiple languages and editing tasks, generalize to unseen languages, and received high ratings from human evaluators across all languages.

What failed first

Used as zero-shot baselines, GPT3.5 and GPT4 performed poorly relative to the fine-tuned models on multilingual text editing tasks; GPT3.5 performed worst of all models considered.

Source

https://www.grammarly.com/blog/engineering/advancing-intelligent-writing/

Grounding & classification

Source type: technical build writeup

26 fields verified against source quotes.

content generation, translation, knowledge base, human review described, metric backed, peer confirmed, tools described, workflow described, software, accuracy improvement, technical build writeup, quality assurance