quality_assurance · saas · workflow

How GitHub Next is experimenting with LLMs to evolve GitHub Copilot

Developers spent an enormous amount of time searching through documentation and discovering CLI commands, and had no intelligent tooling to help them find answers or quickly understand the context of a pull request.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Developer submits pull request
A developer submits a pull request, triggering AI generation of a description and walkthrough.
Tools used
GPT-4, GitHub Copilot, Copilot for Pull Requests, Copilot for Docs, Copilot for CLI, vector database
Outcome

Pivoting to a suggestion-based UX for pull request descriptions transformed negative internal feedback into positive reception; adding reference links in Copilot for Docs made developers tolerant of imperfect AI outputs; and shipping early for real human feedback was established as a core development principle.
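
The reference-link idea can be illustrated with a minimal sketch. It assumes a toy in-memory index in place of a real vector database and a bag-of-words count in place of a real embedding model; DocChunk, embed, retrieve, answer_with_references and the example.invalid URLs are all hypothetical names chosen for illustration, not anything described in the source. The point is only that each retrieved passage keeps its URL, so the reader can check the answer against the documentation it came from.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DocChunk:
    url: str   # hypothetical documentation URL, returned as the reference link
    text: str  # passage of documentation text


def embed(text: str) -> Counter:
    # Assumption: a bag-of-words count stands in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[DocChunk], k: int = 2) -> List[DocChunk]:
    # Rank every chunk by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def answer_with_references(query: str, chunks: List[DocChunk], k: int = 2) -> Dict[str, object]:
    sources = retrieve(query, chunks, k)
    # A real system would pass the retrieved text to a language model here.
    # This sketch only assembles the context and the links shown to the user.
    return {
        "context": "\n".join(c.text for c in sources),
        "references": [c.url for c in sources],
    }
```

Returning the links alongside the generated text is the design choice that matters here: an imperfect answer is easier to accept when the reader can follow it back to its source.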

What failed first

An initial internal study of the pull request description feature did not go well because developers were concerned the AI would be wrong; serving AI content as a comment rather than a suggestion created a poor UX that undermined adoption even when the content itself was correct.
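
The difference between posting a comment and offering a suggestion can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: PullRequest, DescriptionSuggestion, draft_description and on_pull_request_opened are hypothetical names, and the model call is replaced by a placeholder string template. What it shows is that the generated text stays a pending draft, and nothing is written to the pull request until the author accepts it, optionally after editing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    DISMISSED = "dismissed"


@dataclass
class PullRequest:
    title: str
    description: str = ""


@dataclass
class DescriptionSuggestion:
    draft: str
    status: Status = Status.PENDING

    def accept(self, pr: PullRequest, edited: Optional[str] = None) -> None:
        # The author stays in control: the draft (or their edited version)
        # is applied only on explicit acceptance.
        pr.description = edited if edited is not None else self.draft
        self.status = Status.ACCEPTED

    def dismiss(self) -> None:
        # Dismissing leaves the pull request untouched.
        self.status = Status.DISMISSED


def draft_description(diff_summary: str) -> str:
    # Assumption: placeholder for a language model call that summarizes the diff.
    return f"This pull request {diff_summary}."


def on_pull_request_opened(pr: PullRequest, diff_summary: str) -> DescriptionSuggestion:
    # Hypothetical trigger: produce a suggestion instead of posting a comment.
    return DescriptionSuggestion(draft=draft_description(diff_summary))
```

A comment-based design would write the generated text to the pull request immediately; here the same text is held as a draft that the author can accept, edit, or dismiss.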

Results
Volume: feedback changed to 'wow, these are helpful suggestions'
Running since: March 22, 2023
Source

https://github.blog/2023-12-06-how-were-experimenting-with-llms-to-evolve-github-copilot/


Grounding & classification
Source type: technical build writeup
28 fields verified against source quotes.
code generation, conversational ai, knowledge search, rag, summarization, code diff pr, knowledge base, builder submitted, failure mode described, human review described, named customer, production runtime claimed, tools described, workflow described, software, employee productivity, technical build writeup, quality assurance, ai draft human approval, rag answering