recruiting · saas · workflow

Accelerating LLM inference with speculative decoding: Lessons from LinkedIn's Hiring Assistant

LinkedIn's Hiring Assistant required low-latency conversational responses for recruiters, but LLM generation was slow when processing thousands of tokens from long job descriptions and candidate profiles.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Recruiter submits request
A recruiter sends a request to Hiring Assistant and expects a conversational response within seconds.
Tools used
vLLM
Outcome

Applying n-gram speculative decoding achieved nearly 4× higher throughput and an average 66% reduction in P90 end-to-end latency without any quality degradation.
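The outcome above rests on n-gram speculative decoding. The following is a minimal sketch of the general technique, assuming greedy decoding. A cheap draft is proposed by finding the most recent earlier occurrence of the trailing n-gram in the context (often called prompt-lookup decoding) and copying the tokens that followed it. The target model then verifies the draft and keeps the longest matching prefix plus one token of its own. `target_next`, `propose_ngram_draft`, `speculative_step`, and `generate` are hypothetical names for illustration, not LinkedIn's code or vLLM's API. Because every kept token is one the target model would have produced anyway, greedy output is unchanged, which is consistent with the reported absence of quality degradation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the target LLM: maps a token context to its greedy next token.
NextTokenFn = Callable[[Sequence[int]], int]


def propose_ngram_draft(tokens: Sequence[int], max_ngram: int = 3, num_draft: int = 4) -> List[int]:
    """Draft tokens by matching the trailing n-gram against earlier context.

    Longer n-grams are tried first; the most recent earlier match wins, and the
    (up to) num_draft tokens that followed it become the draft.
    """
    for n in range(min(max_ngram, len(tokens) - 1), 0, -1):
        suffix = list(tokens[-n:])
        # Scan right to left, excluding the suffix's own position.
        for start in range(len(tokens) - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                return list(tokens[start + n:start + n + num_draft])
    return []  # no match: fall back to ordinary one-token decoding


def speculative_step(tokens: Sequence[int], target_next: NextTokenFn,
                     max_ngram: int = 3, num_draft: int = 4) -> List[int]:
    """Verify an n-gram draft greedily and return the tokens to append."""
    draft = propose_ngram_draft(tokens, max_ngram, num_draft)
    context = list(tokens)
    accepted: List[int] = []
    # A real server scores all draft positions in ONE batched forward pass;
    # this loop calls target_next per position only to keep the sketch simple.
    for proposed in draft:
        expected = target_next(context)
        if expected != proposed:
            accepted.append(expected)  # first mismatch: keep the target's token, drop the rest
            return accepted
        accepted.append(proposed)
        context.append(proposed)
    accepted.append(target_next(context))  # all draft tokens matched: one bonus token
    return accepted


def generate(prompt: Sequence[int], target_next: NextTokenFn,
             max_new_tokens: int) -> Tuple[List[int], int]:
    """Return (new tokens, number of verification steps)."""
    tokens = list(prompt)
    steps = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(tokens, target_next))
        steps += 1
    return tokens[len(prompt):len(prompt) + max_new_tokens], steps
```

With a toy target that keeps repeating `1 2 3 4 5`, `generate([1, 2, 3, 4, 5, 1, 2], lambda ctx: len(ctx) % 5 + 1, 20)` produces the same 20 tokens as plain greedy decoding in 4 verification steps instead of 20. The saving in a real deployment comes from fewer target-model forward passes, and it is largest when generated text repeats spans already present in the context.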

Results
Volume: nearly 4× higher throughput
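The latency result is a relative reduction in 90th-percentile (P90) end-to-end latency. The following small sketch shows that arithmetic, assuming the nearest-rank percentile definition. The source does not state which percentile method or averaging LinkedIn used, and `p90` and `percent_reduction` are hypothetical helpers.

```python
from typing import Sequence


def p90(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 90th percentile: smallest sample with at least 90% of samples <= it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before
```

Read the other way, a 66% reduction means P90 latency falls to 34% of its baseline, roughly 2.9× faster at the tail. For example, `percent_reduction(100.0, 34.0)` returns `66.0`.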
Source

https://www.linkedin.com/blog/engineering/ai/accelerating-llm-inference-with-speculative-decoding-lessons-from-linkedins-hiring-assistant


Grounding & classification
Source type: technical build writeup
18 fields verified against source quotes.
Tags: agentic workflow, content generation, resume, builder submitted, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cycle time reduction, throughput increase, technical build writeup, recruiting, agentic task execution