quality_assurance · saas · workflow

How GitHub is experimenting with LLMs to evolve GitHub Copilot

Developers spend significant time searching documentation for answers and struggle with pull request descriptions and CLI commands. GitHub identified these areas as underexplored opportunities for AI assistance.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Developer submits pull request

A developer submits a pull request, kicking off the Copilot for Pull Requests workflow.

Tools used

GPT-4, GitHub Copilot, GitHub Copilot Chat, Copilot for Pull Requests, Copilot for Docs, Copilot for CLI, LLMs, vector database

Outcome

After the team pivoted Copilot for Pull Requests to a suggestion-based UX, developer feedback turned positive. For Copilot for Docs, developers tolerated imperfect answers when references let them evaluate the output themselves. Copilot for CLI brought AI-powered command generation with structured explanations to the terminal.
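The reference-backed answering described for Copilot for Docs can be illustrated with a small retrieval sketch. This is not GitHub's implementation: it assumes a toy in-memory list in place of the vector database, and word counts with cosine similarity in place of a real embedding model. The names `Passage`, `embed`, `retrieve`, and `answer_with_references` are hypothetical. The point it shows is that each retrieved passage keeps its source label, so a reader can check an imperfect answer against the references.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical reference label, e.g. a documentation path
    text: str


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[Passage], k: int = 2) -> list[tuple[float, Passage]]:
    """Return up to k passages with a non-zero similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(p.text)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, p) for score, p in scored[:k] if score > 0]


def answer_with_references(query: str, passages: list[Passage], k: int = 2) -> dict:
    """Bundle the retrieved context with its sources.

    A real system would pass `context` to an LLM; here we stop before that
    step and only show that references travel with the retrieved text.
    """
    hits = retrieve(query, passages, k)
    return {
        "query": query,
        "context": [p.text for _, p in hits],
        "references": [p.source for _, p in hits],
    }
```

Keeping the source next to each passage is the design choice that matters here: the answer can be wrong, but the reader always has a trail to the material it was drawn from.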

What failed first

An internal study of the early Copilot for Pull Requests feature did not go well because presenting AI output as a comment rather than an editable suggestion caused developers to distrust and reject the outputs.

Results

Volume: developers didn't mind if the output wasn't always perfectly correct

Source

https://github.blog/ai-and-ml/llms/how-were-experimenting-with-llms-to-evolve-github-copilot/


Grounding & classification

Source type: technical build writeup

26 fields verified against source quotes, 1 dropped as unverifiable.

code generation · conversational ai · knowledge search · rag · summarization · code diff pr · knowledge base · failure mode described · human review described · tools described · vendor confirmed · workflow described · software · employee productivity · technical build writeup · quality assurance · ai draft human approval · rag answering