
Grammarly builds a compact on-device spelling and grammar correction model (~1B parameters)

Grammarly's writing assistance stops working when no internet connection is available, because its corrections rely on multiple large models that cannot run on a user's device, where memory and processing power are limited.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · User writes in real time

Users expect Grammarly to provide real-time suggestions as they write.

Tools used

Llama, T5, MLX, GQA

Outcome

A compact ~1B parameter model achieves ~210 tokens/second on an M2 Mac with a 70% reduced memory footprint, enabling real-time on-device spelling and grammar corrections without loss in quality.
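To make the throughput figure concrete, here is a minimal sketch of how tokens per second might be measured for any text-generation callable. It is not Grammarly's benchmark harness: the `generate` callable, the injectable `clock`, and the helper names are assumptions introduced for illustration only.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[str], Sequence[int]],
    prompt: str,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time one call to a hypothetical `generate` and return output tokens per second."""
    start = clock()
    tokens = generate(prompt)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return len(tokens) / elapsed


def ms_per_token(tps: float) -> float:
    """Convert a throughput in tokens/second to average milliseconds per token."""
    return 1000.0 / tps
```

At the reported ~210 tokens/second, `ms_per_token(210)` gives just under 5 ms per token on average. In practice a measurement like this would be repeated over many prompts and averaged, since a single run is noisy.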

What failed first

T5, an encoder-decoder model evaluated as a base model candidate, failed the tokenization requirement by converting nonstandard spaces to regular spaces, making it unsuitable.
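The tokenization requirement can be illustrated with a round-trip check: text that goes through encode and decode should come back character for character, including nonstandard spaces. The sketch below does not call the real T5 tokenizer. `lossy_normalize` is a hypothetical stand-in that mimics the reported behavior by mapping every Unicode space separator to a regular space, and the sample strings are made up for illustration.

```python
import unicodedata
from typing import Callable, Iterable

# A few nonstandard space characters: no-break space, thin space,
# narrow no-break space, ideographic space.
NONSTANDARD_SPACES = ["\u00a0", "\u2009", "\u202f", "\u3000"]


def lossy_normalize(text: str) -> str:
    """Hypothetical stand-in for a tokenizer round trip that replaces
    every Unicode space separator (category Zs) with U+0020."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


def identity_roundtrip(text: str) -> str:
    """Stand-in for a tokenizer whose encode/decode round trip is lossless."""
    return text


def preserves_whitespace(roundtrip: Callable[[str], str], samples: Iterable[str]) -> bool:
    """Return True only if every sample survives the round trip unchanged."""
    return all(roundtrip(s) == s for s in samples)


samples = [f"10{space}km" for space in NONSTANDARD_SPACES]
```

With these samples, `preserves_whitespace(lossy_normalize, samples)` is `False` while `preserves_whitespace(identity_roundtrip, samples)` is `True`. A base-model candidate could be screened the same way by passing its own encode-then-decode function as `roundtrip`.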

Results

Time saved: 50 tokens per second

Volume: ~210 tokens/second

Source

https://www.grammarly.com/blog/engineering/efficient-on-device-writing-assistance/


Grounding & classification

Source type: technical build writeup

21 fields verified against source quotes.

quality inspection, email, social media post, failure mode described, human review described, metric backed, production runtime claimed, tools described, vendor confirmed, workflow described, software, cost reduction, throughput increase, technical build writeup, back office ops