
How Ellipsis built a multi-agent AI code review and code search system

Building a reliable AI code review agent is hard: a single mega-agent with a large prompt produces high false positive rates — developers' most common complaint — and LLM performance degrades under large context. Traditional RAG cosine-similarity thresholds work poorly for code search because they fail to capture whether retrieved code is actually useful.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack.
Stage 1 · PR opened or tagged
When a user opens a PR, marks a PR as ready to review, or tags @ellipsis-dev, the workflow is triggered.
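The trigger conditions in this stage can be sketched as a small predicate over GitHub webhook deliveries. This is a minimal illustration, not Ellipsis's actual handler: the function name `should_trigger_review` is hypothetical, skipping draft PRs on open is an assumption, and only top-level PR comments (delivered by GitHub as `issue_comment` events) are checked for the mention.

```python
from typing import Any, Mapping

BOT_MENTION = "@ellipsis-dev"  # mention named in the stage description


def should_trigger_review(event_name: str, payload: Mapping[str, Any]) -> bool:
    """Return True when a GitHub webhook delivery should start a review.

    `event_name` is the value of the X-GitHub-Event header and
    `payload` is the parsed JSON body of the delivery.
    """
    action = payload.get("action")

    if event_name == "pull_request":
        pr = payload.get("pull_request", {})
        if action == "opened":
            # Assumption: a draft PR waits until it is marked ready.
            return not pr.get("draft", False)
        return action == "ready_for_review"

    if event_name == "issue_comment" and action == "created":
        # Comments on a PR arrive as issue comments; the "pull_request"
        # key on the issue separates them from plain issue comments.
        on_pr = "pull_request" in payload.get("issue", {})
        body = payload.get("comment", {}).get("body", "")
        return on_pr and BOT_MENTION in body

    return False
```

Keeping this decision in a pure function makes it easy to unit test separately from whatever web framework or queue receives the webhook.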
Tools used
GitHub App, Hookdeck, FastAPI, Hatchet, tree-sitter, Turbopuffer, Lsproxy, Modal, DynamoDB, GPT-4o, Sonnet-3.6
Outcome

The multistage filtering pipeline significantly reduces the false positive rate, incremental vector indexing syncs code changes in a couple of seconds, and user feedback is reflected almost immediately in agent behavior without requiring per-customer fine-tuning.
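A multistage filter of this kind can be sketched as a list of functions applied in order to candidate review comments. The stages shown here (deduplication, a confidence floor, and a check that the comment targets a changed line) are illustrative assumptions. The source does not enumerate the actual stages, and the `Comment` fields, including the `confidence` score, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Comment:
    path: str
    line: int
    body: str
    confidence: float  # hypothetical score in [0, 1]


Filter = Callable[[List[Comment]], List[Comment]]


def dedupe(comments: List[Comment]) -> List[Comment]:
    """Keep the first comment for each (path, line, normalized body)."""
    seen = set()
    kept = []
    for c in comments:
        key = (c.path, c.line, c.body.strip().lower())
        if key not in seen:
            seen.add(key)
            kept.append(c)
    return kept


def min_confidence(threshold: float) -> Filter:
    """Build a stage that drops comments scored below `threshold`."""
    def apply(comments: List[Comment]) -> List[Comment]:
        return [c for c in comments if c.confidence >= threshold]
    return apply


def on_changed_lines(changed: Dict[str, Set[int]]) -> Filter:
    """Build a stage that drops comments on lines the PR did not touch."""
    def apply(comments: List[Comment]) -> List[Comment]:
        return [c for c in comments if c.line in changed.get(c.path, set())]
    return apply


def run_pipeline(comments: List[Comment], stages: List[Filter]) -> List[Comment]:
    """Apply each stage in order; later stages see only the survivors."""
    for stage in stages:
        comments = stage(comments)
    return comments


candidates = [
    Comment("app.py", 10, "Possible None dereference.", 0.9),
    Comment("app.py", 10, "possible none dereference. ", 0.8),  # duplicate
    Comment("app.py", 11, "Consider renaming this.", 0.3),      # low confidence
    Comment("util.py", 5, "Unused import.", 0.95),              # line not in diff
]
stages = [dedupe, min_confidence(0.7), on_changed_lines({"app.py": {10, 11}})]
kept = run_pipeline(candidates, stages)
```

Because each stage only removes comments, adding a stage can lower the false positive rate but can never introduce a new comment, which keeps the pipeline easy to reason about.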

What failed first

The conventional RAG approach — search, rerank by cosine similarity, drop below a threshold — does not work well for code search because relative ranking matters less than whether the retrieved code is actually useful, and cosine similarity cannot reliably capture that distinction.
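The conventional approach described above can be sketched in a few lines, which also makes its weakness visible: nothing in the code asks whether a chunk is useful, only how close its embedding is to the query. The vectors and chunk names are toy values chosen for illustration, and `threshold_retrieve` is a hypothetical name, not an API from any library.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_retrieve(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Score every chunk, rank by similarity, drop anything below threshold."""
    scored = [(name, cosine(query, vec)) for name, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in scored if score >= threshold]


# Toy embeddings: the lookalike chunk shares wording with the query,
# while the chunk that actually answers it sits further away.
query_vec = [1.0, 0.0]
chunks = [
    ("lookalike_test_helper", [0.9, 0.1]),
    ("actual_definition", [0.5, 0.8]),
]
results = threshold_retrieve(query_vec, chunks, threshold=0.8)
```

With these toy values the lookalike chunk passes the threshold and the chunk holding the actual definition is dropped, which is the failure mode the text describes.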

Results
Time saved: a couple seconds
Volume: significantly reduces the false positive rate
Source

https://www.ellipsis.dev/blog/how-we-built-ellipsis


Grounding & classification
Source type: technical build writeup
33 fields verified against source quotes, 1 dropped as unverifiable.
code generation, knowledge search, multi agent workflow, rag, summarization, code diff pr, knowledge base, failure mode described, metric backed, production runtime claimed, tools described, workflow described, software, error reduction, technical build writeup, quality assurance, extract classify route, rag answering