it_support · saas · workflow

Dosu uses LangSmith to scale evaluation-driven development for their AI GitHub assistant

As Dosu's installation base grew, manually reviewing logs with grep and print statements stopped scaling. Monitoring responses and identifying failure modes in production, a step critical to the team's evaluation-driven development (EDD) workflow, became nearly impossible. The broader problem Dosu was built to address is that up to 85% of developers' time is spent on non-coding tasks such as answering questions and triaging issues.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in "Tools used" below.

Stage 1 · User submits GitHub issue

Users submit requests to Dosu via GitHub issues, ranging from simple codebase questions to error traces.

Tools used

LangSmith, LangChain, GitHub · partner, OpenAI

Outcome

LangSmith gave Dosu out-of-the-box visibility into all their activity, enabling the team to identify unforeseen failure modes at scale and integrate production monitoring into their EDD workflow. The team is now building automated evaluation dataset collection from production traffic.
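The source does not describe how the automated dataset collection is implemented. As a rough illustration of the idea only, the sketch below filters production traces down to candidate evaluation examples. The trace fields (`input`, `output`, `error`, `feedback`) and the selection rule (keep errored or negatively rated traces, deduplicate by normalized input) are assumptions made for this example, not Dosu's or LangSmith's schema.

```python
import hashlib
from typing import Iterable, List


def collect_eval_examples(traces: Iterable[dict]) -> List[dict]:
    """Keep traces that errored or received negative feedback, deduplicated by input."""
    seen = set()
    examples = []
    for trace in traces:
        errored = trace.get("error") is not None
        disliked = trace.get("feedback") == "negative"
        if not (errored or disliked):
            continue
        # Normalize the input so trivially different copies collapse to one example.
        normalized = trace["input"].strip().lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        examples.append({
            "input": trace["input"],
            "output": trace.get("output"),
            "reason": "error" if errored else "negative_feedback",
        })
    return examples


# Hypothetical production traces, invented for illustration.
TRACES = [
    {"input": "Why does the build fail?", "output": "Try again.", "error": None, "feedback": "negative"},
    {"input": "why does the build fail?  ", "output": "Try again.", "error": None, "feedback": "negative"},
    {"input": "How do I run tests?", "output": "Use pytest.", "error": None, "feedback": "positive"},
    {"input": "Traceback: KeyError", "output": None, "error": "TimeoutError", "feedback": None},
]

EXAMPLES = collect_eval_examples(TRACES)
```

Examples gathered this way could then be reviewed and added to an evaluation dataset, so that failure modes first seen in production are covered by later evaluation runs.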

What failed first

Manual log review with grep and print statements could not scale with Dosu's growth, and changing LLM prompts frequently caused regressions in areas that had previously been working well.
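A minimal sketch of the kind of regression check an evaluation-driven workflow relies on: run a fixed set of cases against the current prompt and a changed prompt, then compare pass rates per category. Everything here is hypothetical: the two responder functions stand in for real LLM calls, and the substring check is a deliberately simple pass criterion, not the evaluation method Dosu uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    category: str      # hypothetical label, e.g. "codebase_question"
    issue_text: str    # text a user might submit in a GitHub issue
    must_contain: str  # simplistic pass criterion, for illustration only


def pass_rates(respond: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    """Return the fraction of passing cases for each category."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for case in cases:
        total[case.category] = total.get(case.category, 0) + 1
        if case.must_contain.lower() in respond(case.issue_text).lower():
            passed[case.category] = passed.get(case.category, 0) + 1
    return {cat: passed.get(cat, 0) / n for cat, n in total.items()}


def find_regressions(baseline: Dict[str, float], candidate: Dict[str, float]) -> List[str]:
    """List categories where the candidate scores lower than the baseline."""
    return sorted(cat for cat, rate in baseline.items() if candidate.get(cat, 0.0) < rate)


def baseline_respond(issue_text: str) -> str:
    # Stand-in for the current prompt: always points at the README.
    return "See the README for setup steps."


def candidate_respond(issue_text: str) -> str:
    # Stand-in for a changed prompt: handles error traces, but loses the README answer.
    if "traceback" in issue_text.lower():
        return "This error comes from a missing import."
    return "I am not sure."


CASES = [
    EvalCase("codebase_question", "How do I set up the project?", "README"),
    EvalCase("error_trace", "Traceback (most recent call last): ImportError", "import"),
]

REGRESSIONS = find_regressions(
    pass_rates(baseline_respond, CASES),
    pass_rates(candidate_respond, CASES),
)
```

In this toy setup the changed prompt improves the error-trace category but regresses on codebase questions, which is the pattern described above: a prompt change that helps one area can silently break another unless both are evaluated together.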

Results

Developer time spent on non-coding tasks (the problem Dosu targets, not a measured saving): up to 85%

Running since: late June 2023

Source

https://blog.dosu.dev/iterating-towards-llm-reliability-with-evaluation-driven-development/


Grounding & classification

Source type: technical build writeup

25 fields verified against source quotes.

agentic workflow · ai agent · code diff pr · support ticket · builder submitted · failure mode described · named customer · production runtime claimed · tools described · workflow described · software · accuracy improvement · employee productivity · technical build writeup · it support · quality assurance · ticket triage · monitor detect alert