quality_assurance · saas · workflow
Grammarly's mEdIT: fine-tuning multilingual LLMs to support cross-lingual text editing across seven languages
Popular foundation LLMs produce low-quality outputs on text editing tasks, and prior fine-tuning efforts covered either multiple editing tasks in a single language or a single task across multiple languages, never both at once.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Curate multilingual training data
More than two hundred thousand instruction-rewrite pairs were curated from publicly available datasets across seven languages and three editing tasks.
Tools used
mT5, mT0, BLOOMZ, PolyLM, Bactrian-X, GPT3.5, GPT4, GitHub, Hugging Face
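The curation step in Stage 1 can be pictured as turning raw source/target examples from several datasets into uniform instruction-rewrite pairs. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the task names, the instruction templates, the EditPair record, and the build_pairs and count_by_language_and_task helpers are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical task names and instruction wording (assumptions, not from the source).
INSTRUCTION_TEMPLATES = {
    "gec": "Fix grammatical errors in this text",
    "simplification": "Simplify this text",
    "paraphrasing": "Paraphrase this text",
}


@dataclass(frozen=True)
class EditPair:
    """One instruction-rewrite training example (illustrative schema)."""
    language: str
    task: str
    instruction: str
    source: str
    target: str


def build_pairs(records):
    """Convert (language, task, source, target) tuples into deduplicated pairs.

    Records with an empty source or target, or with a task that has no
    instruction template, are skipped.
    """
    seen = set()
    pairs = []
    for language, task, source, target in records:
        source, target = source.strip(), target.strip()
        if not source or not target or task not in INSTRUCTION_TEMPLATES:
            continue
        key = (language, task, source, target)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        pairs.append(
            EditPair(language, task, INSTRUCTION_TEMPLATES[task], source, target)
        )
    return pairs


def count_by_language_and_task(pairs):
    """Count pairs per (language, task) to check coverage balance."""
    return Counter((p.language, p.task) for p in pairs)
```

Counting pairs per language and task is one simple way to check that no language or task dominates the curated set before fine-tuning.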
Outcome
mEdIT fine-tuned models show a substantial improvement over untrained counterparts across multiple languages and editing tasks, generalize to unseen languages, and received high ratings from human evaluators across all languages.
What failed first
GPT3.5 and GPT4, used as zero-shot baselines, performed poorly relative to the fine-tuned models on multilingual text editing tasks; GPT3.5 performed worst of all models considered.
Grounding & classification
Source type: technical build writeup
26 fields verified against source quotes.
content generation, translation, knowledge base, human review described, metric backed, peer confirmed, tools described, workflow described, software, accuracy improvement, technical build writeup, quality assurance