
How GitHub built Copilot: a globally-distributed LLM code completion service serving 400M+ requests at under 200ms

GitHub needed to serve LLM-based code completions with latency competitive with locally run IDE autocomplete, despite network latency, shared server resources, and cloud outages. Authentication at scale and efficient request cancellation were also unsolved challenges.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · User pauses typing in IDE

Whenever the user stops typing, Copilot initiates a completion request.
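The "user stops typing" trigger can be pictured as a debounce: each keystroke restarts a quiet-period timer, and a request goes out only if the timer expires. The sketch below is illustrative and is not Copilot's client code. It is written in Go to match the service-side stack named under "Tools used", whereas a real editor extension would live in the IDE's own language. The names `Debouncer`, `Trigger`, and `simulateTyping`, and the quiet-period values, are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Debouncer (hypothetical) runs a callback only after a quiet period
// during which no further keystrokes arrive.
type Debouncer struct {
	mu    sync.Mutex
	quiet time.Duration
	timer *time.Timer
}

func NewDebouncer(quiet time.Duration) *Debouncer {
	return &Debouncer{quiet: quiet}
}

// Trigger is called on every keystroke. It stops any pending timer and
// starts a new one, so fire runs only once typing has paused for quiet.
func (d *Debouncer) Trigger(fire func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.quiet, fire)
}

// simulateTyping sends keystrokes spaced gapMs apart and returns how many
// completion requests would have been issued with a quietMs quiet period.
func simulateTyping(keystrokes, gapMs, quietMs int) int {
	var requests int32
	d := NewDebouncer(time.Duration(quietMs) * time.Millisecond)
	for i := 0; i < keystrokes; i++ {
		d.Trigger(func() { atomic.AddInt32(&requests, 1) })
		time.Sleep(time.Duration(gapMs) * time.Millisecond)
	}
	// Wait long enough for the final timer to expire.
	time.Sleep(3 * time.Duration(quietMs) * time.Millisecond)
	return int(atomic.LoadInt32(&requests))
}

func main() {
	// A fast burst of keystrokes, followed by a pause.
	fmt.Println("requests after a fast burst:", simulateTyping(5, 1, 100))
	// Slow typing, where every gap exceeds the quiet period.
	fmt.Println("requests with long gaps:", simulateTyping(3, 150, 30))
}
```

Under these assumptions, a burst of keystrokes collapses into one request, while typing with gaps longer than the quiet period produces one request per keystroke. That second case is why cheap cancellation of stale requests matters further down the pipeline.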

Tools used

GitHub Copilot, copilot-proxy, Go, HTTP/2, GLB, HAProxy, octoDNS, Azure (partner), OpenAI (partner), VS Code

Outcome

GitHub Copilot serves more than 400 million completion requests with a mean response time under 200 milliseconds and peaks at 8,000 requests per second, achieving global resilience through regional proxy colocation and self-healing DNS routing.

What failed first

The alpha required users to supply their own OpenAI API keys and scaled to only dozens of users. Standard HTTP/1-based cancellation forced costly TCP reconnections after every cancelled request. A point-of-presence model caused traffic tromboning and high operational burden. Most cloud load balancers downgraded HTTP/2 to HTTP/1 on the backend, undermining stream-level cancellation.
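The cancellation cost described above can be observed with Go's standard library. The sketch below is a local experiment, not GitHub's proxy code. It starts a test server, cancels one in-flight request through its context, and then uses `net/http/httptrace` to check whether the next request reused the same connection. The helper name `reusedAfterCancel` and the `/slow` and `/fast` paths are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

// reusedAfterCancel cancels one in-flight request, then reports whether the
// next request from the same client reused the existing connection.
func reusedAfterCancel(enableHTTP2 bool) (bool, error) {
	started := make(chan struct{}, 1)
	cancelled := make(chan struct{}, 1)

	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		started <- struct{}{}
		<-r.Context().Done() // fires once the client abandons the request
		cancelled <- struct{}{}
	})
	mux.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	})

	srv := httptest.NewUnstartedServer(mux)
	srv.EnableHTTP2 = enableHTTP2 // must be set before StartTLS
	srv.StartTLS()
	defer srv.Close()
	client := srv.Client()

	// First request: begin it, then cancel it while the handler is blocked.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	slow, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL+"/slow", nil)
	if err != nil {
		return false, err
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		if resp, err := client.Do(slow); err == nil {
			resp.Body.Close()
		}
	}()
	select {
	case <-started:
	case <-time.After(5 * time.Second):
		return false, errors.New("slow request never reached the server")
	}
	cancel()
	<-done
	select {
	case <-cancelled:
	case <-time.After(5 * time.Second):
		return false, errors.New("server never observed the cancellation")
	}

	// Second request: record whether the transport reused a connection.
	reused := false
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	fast, err := http.NewRequestWithContext(
		httptrace.WithClientTrace(context.Background(), trace),
		http.MethodGet, srv.URL+"/fast", nil)
	if err != nil {
		return false, err
	}
	resp, err := client.Do(fast)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return reused, nil
}

func main() {
	for _, h2 := range []bool{false, true} {
		reused, err := reusedAfterCancel(h2)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("HTTP/2 enabled=%v, connection reused after cancel=%v\n", h2, reused)
	}
}
```

With HTTP/1.1, the Go client can only abandon an in-flight request by closing its connection, so the follow-up request should need a fresh connection and TLS handshake. With HTTP/2, the client resets just the one stream, so the connection should stay available for reuse. The same reasoning explains why a load balancer that downgrades to HTTP/1 on the backend reintroduces the reconnection cost on that hop.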

Results

Time saved: less than 200 milliseconds

Volume: more than 400 million

Source

https://www.infoq.com/presentations/github-copilot/


Grounding & classification

Source type: technical build writeup

28 fields verified against source quotes.

code generation, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, response time reduction, throughput increase, technical build writeup, ai draft human approval