Moveworks Copilot achieves 2.3x+ throughput and 2.35x latency improvement with NVIDIA TensorRT-LLM

LLM processing delays created frustrating lags in the Moveworks Copilot's conversational flow, disrupting employee productivity and limiting the system's ability to scale efficiently on existing infrastructure.

How it works

Common implementation structure

This section describes how this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Employee submits question

An employee submits a question to the Moveworks Copilot, initiating the conversational AI interaction.

Tools used

Flash-Decoding, SmoothQuant

Outcome

With NVIDIA TensorRT-LLM, the Moveworks Copilot achieved 44 tokens per second (up from 19), average request latency of 1.5 seconds (down from 3.4 seconds), and first token latency of 0.3 seconds (down from 0.8 seconds), enabling smoother conversational flow and more efficient infrastructure utilization.
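
As a quick check on the figures above, the improvement factors can be recomputed from the before and after values. This is a minimal sketch that uses only the rounded numbers quoted in this write-up; the helper name `speedup` is illustrative and not from the source. Because the inputs are rounded, the results can differ slightly from the headline multipliers, which presumably come from unrounded measurements.

```python
def speedup(before: float, after: float, higher_is_better: bool) -> float:
    """Return the improvement factor between two measurements."""
    # For throughput a larger value is better; for latency a smaller one is.
    return after / before if higher_is_better else before / after


# Rounded figures quoted in the outcome above (hypothetical variable names).
throughput = speedup(19, 44, higher_is_better=True)      # tokens per second
avg_latency = speedup(3.4, 1.5, higher_is_better=False)  # seconds per request
first_token = speedup(0.8, 0.3, higher_is_better=False)  # seconds to first token

print(f"throughput: {throughput:.2f}x")
print(f"average request latency: {avg_latency:.2f}x")
print(f"first-token latency: {first_token:.2f}x")
```

On these rounded inputs, throughput improves by roughly 2.3x, average request latency by roughly 2.3x, and first-token latency by roughly 2.7x.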

Results

Time saved: from 3.4 seconds to 1.5 seconds

Volume: 44 tokens per second

Source

https://www.moveworks.com/us/en/resources/blog/moveworks-achieves-low-latency-with-nvidia-tensorrt-llm

Grounding & classification

Source type: technical build writeup

20 fields verified against source quotes, 2 dropped as unverifiable.

conversational ai · metric backed · source backed · tools described · vendor confirmed · workflow described · software · cost reduction · response time reduction · throughput increase · technical build writeup · it support · autonomous resolution