
Dosu uses LangSmith to scale evaluation-driven development for their AI GitHub assistant

As Dosu's installation base grew, manually reviewing logs with grep and print statements stopped scaling. It became nearly impossible to monitor responses and identify failure modes in production, a step critical to the team's evaluation-driven development (EDD) workflow. The broader problem Dosu was built to address is that up to 85% of developers' time is spent on non-coding tasks such as answering questions and triaging issues.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User submits GitHub issue
Users submit requests to Dosu via GitHub issues, ranging from simple codebase questions to error traces.
Tools used
LangSmith, LangChain, GitHub · partner, OpenAI
Outcome

LangSmith gave Dosu out-of-the-box visibility into all their activity, enabling the team to identify unforeseen failure modes at scale and integrate production monitoring into their EDD workflow. The team is now building automated evaluation dataset collection from production traffic.
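One way to picture collecting an evaluation dataset from production traffic is the minimal sketch below. It is an illustration under stated assumptions, not Dosu's implementation and not the LangSmith API: the `Trace` record, its `error` and `user_feedback` fields, and the `collect_eval_candidates` helper are all hypothetical names introduced here. The idea is simply to keep traces that look like failures and deduplicate them by request so they can later be reviewed and turned into evaluation cases.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Trace:
    """Hypothetical production trace record (not a LangSmith type)."""
    request: str
    response: str
    error: Optional[str] = None          # assumed: set when the run raised an error
    user_feedback: Optional[int] = None  # assumed: +1 / -1 from the user, if any


def collect_eval_candidates(traces: Iterable[Trace]) -> List[Dict[str, str]]:
    """Keep traces that look like failures, one entry per distinct request."""
    seen = set()
    dataset: List[Dict[str, str]] = []
    for trace in traces:
        failed = trace.error is not None or trace.user_feedback == -1
        if not failed or trace.request in seen:
            continue
        seen.add(trace.request)
        dataset.append({
            "input": trace.request,
            "observed_output": trace.response,
            "reason": "error" if trace.error is not None else "negative_feedback",
        })
    return dataset


# Made-up traces for illustration only.
traces = [
    Trace("How do I configure the retriever?", "See docs/retriever.md"),
    Trace("KeyError in loader.py", "", error="timeout"),
    Trace("Why does CI fail?", "Try reinstalling.", user_feedback=-1),
    Trace("Why does CI fail?", "Try reinstalling.", user_feedback=-1),  # duplicate
]
dataset = collect_eval_candidates(traces)
```

In this sketch the candidates still need human review before they become evaluation cases; the filter only narrows what a person has to look at.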

What failed first

Manual log review with grep and print statements could not scale with Dosu's growth, and changing LLM prompts frequently caused regressions in areas that had previously been working well.
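The regression problem described here is the kind of thing an evaluation suite is meant to catch. The sketch below is a minimal illustration, assuming a fixed set of evaluation cases that is run against both the current prompt and a candidate prompt. All names (`EvalCase`, `run_suite`, `find_regressions`) and the two stand-in responder functions are hypothetical; in a real setup the responders would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One evaluation example: a request and a pass/fail check on the response."""
    name: str
    request: str
    check: Callable[[str], bool]  # returns True when the response is acceptable


def run_suite(respond: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case through a responder and record whether it passed."""
    return {case.name: case.check(respond(case.request)) for case in cases}


def find_regressions(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> List[str]:
    """Names of cases that passed on the baseline but fail on the candidate."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )


def baseline_respond(request: str) -> str:
    # Stand-in for an LLM call using the current prompt.
    return "label: bug" if "error" in request.lower() else "label: question"


def candidate_respond(request: str) -> str:
    # Stand-in for an LLM call using a changed prompt that mislabels error traces.
    return "label: question"


CASES = [
    EvalCase("codebase_question", "How do I configure the retriever?",
             lambda response: "question" in response),
    EvalCase("error_trace", "Error: KeyError in loader.py",
             lambda response: "bug" in response),
]

baseline = run_suite(baseline_respond, CASES)
candidate = run_suite(candidate_respond, CASES)
regressions = find_regressions(baseline, candidate)
print(regressions)
```

Comparing against a baseline, rather than only looking at the candidate's pass rate, is what surfaces a change that fixes one area while breaking another.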

Results
Developer time spent on non-coding tasks (the problem addressed): up to 85%
Running since: late June 2023
Source

https://blog.dosu.dev/iterating-towards-llm-reliability-with-evaluation-driven-development/


Grounding & classification
Source type: technical build writeup
25 fields verified against source quotes.
agentic workflow · ai agent · code diff pr · support ticket · builder submitted · failure mode described · named customer · production runtime claimed · tools described · workflow described · software · accuracy improvement · employee productivity · technical build writeup · it support · quality assurance · ticket triage · monitor detect alert