How Ellipsis built a multi-agent AI code review and code search system
Building a reliable AI code review agent is hard: a single mega-agent with a large prompt produces high false positive rates — developers' most common complaint — and LLM performance degrades under large context. Traditional RAG cosine-similarity thresholds work poorly for code search because they fail to capture whether retrieved code is actually useful.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · PR opened or tagged
When a user opens a PR, marks a PR as ready to review, or tags @ellipsis-dev, the workflow is triggered.
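The trigger conditions above can be sketched as a small webhook filter. This is a minimal illustration, assuming the integration receives GitHub webhook deliveries (the `pull_request` and `issue_comment` events); the function name `should_trigger_review` is hypothetical and not taken from Ellipsis's implementation.

```python
from typing import Any, Mapping

BOT_MENTION = "@ellipsis-dev"  # handle named in the stage description


def should_trigger_review(event: str, payload: Mapping[str, Any]) -> bool:
    """Return True if a GitHub webhook delivery should start a review.

    `event` is the value of the X-GitHub-Event header and `payload`
    is the parsed JSON body of the delivery.
    """
    action = payload.get("action")

    if event == "pull_request":
        # "opened" fires when a PR is created; "ready_for_review"
        # fires when a draft PR is marked ready.
        return action in {"opened", "ready_for_review"}

    if event == "issue_comment" and action == "created":
        # PR comments arrive as issue_comment events; the issue object
        # carries a "pull_request" key only when the issue is a PR.
        is_pr = "pull_request" in payload.get("issue", {})
        body = payload.get("comment", {}).get("body") or ""
        return is_pr and BOT_MENTION in body

    return False
```

Any other event or action falls through to `False`, so the review workflow only starts for the three cases the stage names.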
The multistage filtering pipeline significantly reduces the false positive rate, incremental vector indexing syncs code changes in a couple of seconds, and user feedback is reflected almost immediately in agent behavior without requiring per-customer fine-tuning.
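One common way to keep a vector index in sync within seconds is to re-embed only the chunks whose content changed. The sketch below illustrates that general technique with content hashes; it is not a description of Ellipsis's indexer, and the `IncrementalIndex` class, the chunk IDs, and the `embed` callable are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class IncrementalIndex:
    """Toy in-memory index that re-embeds only changed chunks."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed  # assumed embedding function: text -> vector
        # chunk_id -> (content hash, embedding)
        self._entries: Dict[str, Tuple[str, Vector]] = {}

    def sync(self, chunks: Dict[str, str]) -> Dict[str, int]:
        """Bring the index in line with `chunks` (chunk_id -> source text)."""
        stats = {"embedded": 0, "unchanged": 0, "deleted": 0}

        # Drop entries for chunks that no longer exist in the repository.
        for stale_id in set(self._entries) - set(chunks):
            del self._entries[stale_id]
            stats["deleted"] += 1

        for chunk_id, text in chunks.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            cached = self._entries.get(chunk_id)
            if cached is not None and cached[0] == digest:
                stats["unchanged"] += 1  # same content, keep the old vector
                continue
            self._entries[chunk_id] = (digest, self._embed(text))
            stats["embedded"] += 1

        return stats
```

Because the embedding call is usually the slow step, the cost of a sync scales with the size of the change rather than the size of the repository.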
What failed first
The conventional RAG approach — search, rerank by cosine similarity, drop below a threshold — does not work well for code search because relative ranking matters less than whether the retrieved code is actually useful, and cosine similarity cannot reliably capture that distinction.
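A toy example makes the failure mode concrete. `threshold_retrieve` below implements the conventional search, rank by cosine similarity, and cut-off pattern. `usefulness_retrieve` is a hypothetical alternative in which similarity only nominates candidates and a separate `is_useful` judgement decides what to keep. The source does not say what replaced the threshold, so `is_useful` is an assumption standing in for whatever makes that call.

```python
import math
from typing import Callable, List, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (source text, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_retrieve(
    query_vec: Sequence[float], chunks: Sequence[Chunk], threshold: float
) -> List[str]:
    """Conventional RAG: rank by cosine similarity, drop below a threshold."""
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in chunks), reverse=True)
    return [text for score, text in scored if score >= threshold]


def usefulness_retrieve(
    query_vec: Sequence[float],
    chunks: Sequence[Chunk],
    is_useful: Callable[[str], bool],  # hypothetical judge, e.g. a model call
    top_k: int,
) -> List[str]:
    """Similarity picks top_k candidates; `is_useful` decides what survives."""
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in chunks), reverse=True)
    return [text for _, text in scored[:top_k] if is_useful(text)]
```

With a query vector of `[1.0, 0.0]`, a comment chunk embedded at `[0.9, 0.1]` and a function definition embedded at `[0.6, 0.8]`, a threshold of 0.8 keeps the comment and discards the definition. A judge that checks for actual definitions keeps the opposite one.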
Results
Time saved: a couple of seconds
Volume: significantly reduces the false positive rate