recruiting · saas · workflow

How LinkedIn leveraged vLLM to power GenAI applications at scale

LinkedIn needed to deploy LLMs at scale across diverse real-time and batch GenAI use cases while meeting strict latency requirements, managing GPU efficiency, and giving internal engineering teams control over performance tuning without modifying engine code.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Click any stage to read what happens there. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Hirer specifies qualifications

A hirer specifies a set of qualifications that the system uses to identify and evaluate potential candidates.

Tools used

vLLMAsyncLLMEnginePagedAttentionCUDA graphsgRPCOpenAI-compatible API

Outcome

vLLM now supports more than 50 GenAI use cases at LinkedIn and runs on thousands of hosts. The v1 engine upgrade saved over 60 GPUs for one workload. Open-source contributions yielded a 7% improvement in Time Per Output Token and an 8% improvement in decoding speed for smaller models.

What failed first

LinkedIn's initial serving stack tightly coupled the server and engine, limiting flexibility. The first offline deployment had limited concurrency suitable only for low-QPS workloads. Traditional NER models for job search were brittle, costly to maintain, and slow to adapt to evolving language and job taxonomies.

Results

Time saved7%

Volumemore than 50

Running sincelate 2023

Source

https://www.linkedin.com/blog/engineering/ai/how-we-leveraged-vllm-to-power-our-genai-applications?utm_source=substack&utm_medium=email

How we source this →

Grounding & classification

Source type: technical build writeup

37 fields verified against source quotes.

agentic workflowai agentcontent generationenterprise searchpersonalizationrecommendation systemmetric backednamed customerproduction runtime claimedtools describedworkflow describedsoftwarecost reductioncycle time reductionemployee productivitythroughput increasetechnical build writeuprecruitingagentic task executionextract classify route