NVIDIA automates GPU attention kernel generation with DeepSeek-R1 and inference-time scaling

Creating optimized GPU attention kernels takes significant skill and time, even for experienced engineers. LLMs struggle to generate correct, optimized kernel code on the first try because of hallucinations, syntax errors, and non-trivial GPU thread mapping.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Manual prompt initializes workflow

The workflow is first initialized by a manual prompt.
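The closed loop that follows the initial prompt can be sketched as below. This is a minimal illustration, not the implementation from the source: `closed_loop`, `generate`, `verify`, `budget_seconds`, and `max_rounds` are hypothetical names, and the way feedback is folded back into the prompt is an assumption. In a real setup, `generate` would call the model and `verify` would compile and run the candidate kernel on the GPU.

```python
import time
from typing import Callable, Optional, Tuple


def closed_loop(
    initial_prompt: str,
    generate: Callable[[str], str],              # hypothetical: prompt -> kernel source
    verify: Callable[[str], Tuple[bool, str]],   # hypothetical: source -> (ok, feedback)
    budget_seconds: float,
    max_rounds: int = 50,
) -> Optional[str]:
    """Return the first candidate that passes verification, or None."""
    prompt = initial_prompt
    deadline = time.monotonic() + budget_seconds
    for _ in range(max_rounds):
        if time.monotonic() >= deadline:
            break  # inference-time budget exhausted
        candidate = generate(prompt)
        ok, feedback = verify(candidate)
        if ok:
            return candidate
        # Assumed feedback format: restate the task, then show the failed
        # attempt and what the verifier reported about it.
        prompt = (
            f"{initial_prompt}\n\n"
            f"Previous attempt:\n{candidate}\n\n"
            f"Verifier feedback:\n{feedback}"
        )
    return None


if __name__ == "__main__":
    # Stub generator and verifier so the sketch runs without a model or GPU.
    attempts = iter(["kernel_v1", "kernel_v2"])
    result = closed_loop(
        "Write an attention kernel.",
        generate=lambda prompt: next(attempts),
        verify=lambda src: (src == "kernel_v2", "numerical mismatch"),
        budget_seconds=5.0,
    )
    print(result)
```

Spending more wall-clock time in this loop is what "inference-time scaling" refers to here: the model gets more attempts, each informed by the verifier's feedback on the last one.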

Tools used

DeepSeek-R1, H100, KernelBench

Outcome

The closed-loop workflow produced numerically correct kernels for 100% of Level-1 problems and 96% of Level-2 problems, with results in some cases better than kernels developed by skilled engineers.
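One plausible reading of "numerically correct" is that a candidate kernel's output matches a trusted reference within a tolerance. The sketch below illustrates that idea in pure Python, assuming standard scaled dot-product attention as the reference. The function names and the tolerance values are assumptions for illustration; the source does not state the exact check used.

```python
import math
from typing import List

Matrix = List[List[float]]


def reference_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention, softmax(q k^T / sqrt(d)) v, row by row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out: Matrix = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_row[col] for w, v_row in zip(weights, v)) / total
            for col in range(len(v[0]))
        ])
    return out


def is_numerically_correct(
    candidate: Matrix,
    reference: Matrix,
    rel_tol: float = 1e-5,   # assumed tolerance, not taken from the source
    abs_tol: float = 1e-6,   # assumed tolerance, not taken from the source
) -> bool:
    """True when shapes match and every element agrees within tolerance."""
    if len(candidate) != len(reference):
        return False
    for cand_row, ref_row in zip(candidate, reference):
        if len(cand_row) != len(ref_row):
            return False
        for c, r in zip(cand_row, ref_row):
            if not math.isclose(c, r, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True
```

In a verifier, `candidate` would be the output of the generated kernel on the same inputs; a mismatch would be reported back to the model as feedback.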

What failed first

LLMs used directly for kernel generation produce hallucinated code, mix syntax from different languages or frameworks, and require iterative refinement to achieve correct and efficient thread mapping.

Results

Time saved: 15 minutes

Volume: 100%

Source

https://developer.nvidia.com/blog/automating-gpu-kernel-generation-with-deepseek-r1-and-inference-time-scaling/

Grounding & classification

Source type: technical build writeup

20 fields verified against source quotes.

agentic workflow, code generation, code diff pr, metric backed, named customer, source backed, tools described, workflow described, software, accuracy improvement, automation rate, technical build writeup, quality assurance, agentic task execution