Dataherald cuts LLM costs by 83% by detecting uncapped token growth with LangSmith
Dataherald's NL-to-SQL engine required GPT-4–32K for accuracy, making token costs a central concern. Their manual tracking approach via MongoDB could not identify which agent tools were driving costs, and a bug in the few-shot retriever caused token usage to grow practically uncapped over time.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Plain English query via API
Plain English questions about a relational database are submitted to the NL-to-SQL engine via API.
Within hours of setting up LangSmith, Dataherald identified a bug causing the few-shot retriever to consume about 150,000 tokens per query. After a hotfix and additional fixes, token usage dropped to approximately 25,500 tokens per query, slashing costs by 83%. The team now shares LangSmith execution run links via Slack and Jira for rapid production debugging.
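The 83% figure follows directly from the two per-query token counts quoted above. The short check below reproduces it. It assumes cost scales linearly with token count, which is a simplification because prompt and completion tokens are usually priced differently. The function name `percent_reduction` is illustrative and not taken from Dataherald's code.

```python
# Per-query token counts quoted in the case study.
TOKENS_BEFORE = 150_000  # with the few-shot retriever bug
TOKENS_AFTER = 25_500    # after the hotfix and follow-up fixes


def percent_reduction(before: float, after: float) -> float:
    """Return the relative drop from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


reduction = percent_reduction(TOKENS_BEFORE, TOKENS_AFTER)
print(f"{reduction:.0f}% fewer tokens per query")
```

Under the linear-pricing assumption, the same percentage applies to cost per query.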
What failed first
Their prior monitoring stack (the tiktoken library plus LangChain callback handlers that wrote to MongoDB) required manual queries in MongoDB Compass to aggregate costs, gave no per-tool visibility, and let a growing bug in the few-shot retriever go unnoticed.
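The per-tool visibility that the older setup lacked amounts to grouping token counts by the agent tool that made each LLM call. The sketch below is a minimal illustration of that idea, not Dataherald's implementation or LangSmith's API. The record layout, the tool names, and every number in `records` are hypothetical and chosen only to show the shape of the aggregation.

```python
from collections import defaultdict

# Hypothetical per-call records (one per LLM call inside an agent run).
# Field names, tool names, and token counts are made up for illustration.
records = [
    {"run_id": "r1", "tool": "few_shot_retriever", "prompt_tokens": 9000, "completion_tokens": 0},
    {"run_id": "r1", "tool": "schema_lookup", "prompt_tokens": 1500, "completion_tokens": 200},
    {"run_id": "r1", "tool": "sql_generator", "prompt_tokens": 2000, "completion_tokens": 300},
    {"run_id": "r2", "tool": "few_shot_retriever", "prompt_tokens": 12000, "completion_tokens": 0},
    {"run_id": "r2", "tool": "sql_generator", "prompt_tokens": 2100, "completion_tokens": 250},
]


def tokens_by_tool(records):
    """Sum prompt and completion tokens per tool, largest consumer first."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["tool"]] += rec["prompt_tokens"] + rec["completion_tokens"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for tool, total in tokens_by_tool(records):
    print(f"{tool}: {total} tokens")
```

With a breakdown like this, a single tool whose share keeps growing from run to run stands out immediately, which is the kind of signal that manual aggregate queries tend to hide.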