it_support · saas · workflow

Moveworks Copilot achieves 2.3x+ throughput and 2.35x latency improvement with NVIDIA TensorRT-LLM

LLM processing delays created frustrating lags in the Moveworks Copilot's conversational flow, disrupting employee productivity and limiting the system's ability to scale efficiently on existing infrastructure.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · Employee submits question
An employee submits a question to the Moveworks Copilot, initiating the conversational AI interaction.
Tools used
Flash-Decoding, SmoothQuant
Outcome

With NVIDIA TensorRT-LLM, the Moveworks Copilot achieved 44 tokens per second (up from 19), average request latency of 1.5 seconds (down from 3.4 seconds), and first token latency of 0.3 seconds (down from 0.8 seconds), enabling smoother conversational flow and more efficient infrastructure utilization.
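The improvement factors implied by these figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: the `speedup` helper and variable names are hypothetical, and the inputs are the rounded values reported above, so the results may differ slightly from headline multipliers that were presumably computed from unrounded measurements.

```python
def speedup(before: float, after: float, higher_is_better: bool) -> float:
    """Return the improvement factor between two measurements."""
    # For throughput, more is better, so divide after by before.
    # For latency, less is better, so divide before by after.
    return after / before if higher_is_better else before / after


# Rounded figures as reported in the outcome above.
throughput = speedup(19, 44, higher_is_better=True)       # tokens per second
avg_latency = speedup(3.4, 1.5, higher_is_better=False)   # seconds per request
first_token = speedup(0.8, 0.3, higher_is_better=False)   # seconds to first token

print(f"throughput: {throughput:.2f}x")                # 2.32x
print(f"average request latency: {avg_latency:.2f}x")  # 2.27x
print(f"first token latency: {first_token:.2f}x")      # 2.67x
```

On the rounded inputs, throughput and average request latency both improve by roughly 2.3x, while first token latency improves by roughly 2.7x.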

Results
Time saved: from 3.4 seconds to 1.5 seconds
Volume: 44 tokens per second
Source

https://www.moveworks.com/us/en/resources/blog/moveworks-achieves-low-latency-with-nvidia-tensorrt-llm

Grounding & classification
Source type: technical build writeup
20 fields verified against source quotes, 2 dropped as unverifiable.
conversational ai · metric backed · source backed · tools described · vendor confirmed · workflow described · software · cost reduction · response time reduction · throughput increase · technical build writeup · it support · autonomous resolution