recruiting · saas · workflow

Accelerating LLM inference with speculative decoding: Lessons from LinkedIn's Hiring Assistant

LinkedIn's Hiring Assistant required low-latency conversational responses for recruiters, but LLM generation was slow when processing thousands of tokens from long job descriptions and candidate profiles.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under "Tools used" below.

Stage 1 · Recruiter submits request

Recruiters initiate requests to Hiring Assistant and expect conversational responses within seconds.

Tools used

vLLM

Outcome

Applying n-gram speculative decoding achieved nearly 4× higher throughput and an average 66% reduction in P90 end-to-end latency without any quality degradation.
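To make the technique named above concrete, here is a minimal sketch of n-gram speculative decoding in plain Python. It is an illustration under stated assumptions, not LinkedIn's or vLLM's implementation: the function names (`propose_ngram_draft`, `verify_draft`, `generate`), the parameters `n` and `k`, and the toy target model are all hypothetical. The idea is that the last few tokens are looked up earlier in the context, the tokens that followed that earlier match are proposed as a draft, and the target model then keeps only the draft tokens it would have produced itself. With greedy decoding the output is identical to ordinary decoding, which is consistent with the "no quality degradation" claim.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy "target model": deterministically continues a repeating pattern.
PATTERN = [1, 2, 3, 4, 5, 6]

def toy_target(context: Sequence[int]) -> int:
    """Return the greedy next token for the toy model (assumption for the demo)."""
    return PATTERN[len(context) % len(PATTERN)]

def propose_ngram_draft(tokens: Sequence[int], n: int = 3, k: int = 4) -> List[int]:
    """Draft up to k tokens by finding the last n tokens earlier in the context."""
    if len(tokens) <= n:
        return []
    suffix = list(tokens[-n:])
    # Scan backwards so the most recent earlier occurrence wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if list(tokens[start:start + n]) == suffix:
            return list(tokens[start + n:start + n + k])
    return []

def verify_draft(tokens: Sequence[int], draft: Sequence[int],
                 target_next: Callable[[Sequence[int]], int]) -> List[int]:
    """Accept draft tokens while they match the target's greedy choice.

    A real serving engine would score all draft positions in one forward
    pass; this sketch calls the target sequentially for clarity.
    """
    context = list(tokens)
    accepted: List[int] = []
    for d in draft:
        t = target_next(context)
        if t != d:
            accepted.append(t)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(d)
        context.append(d)
    accepted.append(target_next(context))  # all drafts accepted: one bonus token
    return accepted

def generate(prompt: Sequence[int], target_next: Callable[[Sequence[int]], int],
             max_new_tokens: int, n: int = 3, k: int = 4) -> Tuple[List[int], int]:
    """Generate tokens; return (new tokens, number of verification steps)."""
    tokens = list(prompt)
    steps = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        draft = propose_ngram_draft(tokens, n, k)
        tokens.extend(verify_draft(tokens, draft, target_next))
        steps += 1
    return tokens[len(prompt):len(prompt) + max_new_tokens], steps

if __name__ == "__main__":
    new_tokens, steps = generate(PATTERN, toy_target, max_new_tokens=12)
    print(new_tokens, "generated in", steps, "verification steps")
```

Each verification step can emit several tokens when the draft matches, which is where the speedup would come from on inputs that repeat earlier text, such as responses that quote a job description or candidate profile. When no earlier match exists, the loop falls back to one token per step, the same as ordinary decoding.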

Results

Volume: nearly 4×

Source

https://www.linkedin.com/blog/engineering/ai/accelerating-llm-inference-with-speculative-decoding-lessons-from-linkedins-hiring-assistant


Grounding & classification

Source type: technical build writeup

18 fields verified against source quotes.

agentic workflow · content generation · resume · builder submitted · metric backed · named customer · production runtime claimed · tools described · workflow described · software · cycle time reduction · throughput increase · technical build writeup · recruiting · agentic task execution