Dataherald cuts LLM costs by 83% by detecting uncapped token growth with LangSmith

Dataherald's NL-to-SQL engine required GPT-4–32K for accuracy, making token costs a central concern. Their manual tracking approach via MongoDB could not identify which agent tools were driving costs, and a bug in the few-shot retriever caused token usage to grow practically uncapped over time.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases, not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Plain English query via API

Plain English questions about a relational database are submitted to the NL-to-SQL engine via API.

Tools used

LangSmith · LangChain · RAG · GPT-4–32K · vector DB · TikToken · MongoDB Compass

Outcome

Within hours of setting up LangSmith, Dataherald identified a bug causing the few-shot retriever to consume about 150,000 tokens per query. After a hotfix and additional fixes, token usage dropped to approximately 25,500 tokens per query, slashing costs by 83%. The team now shares LangSmith execution run links via Slack and Jira for rapid production debugging.
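The 83% figure follows directly from the two per-query token counts quoted above. A minimal sketch of that arithmetic, assuming cost scales linearly with tokens consumed (the helper name `percent_reduction` is illustrative, not from the source):

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Per-query token counts reported in the writeup.
tokens_before = 150_000  # with the few-shot retriever bug
tokens_after = 25_500    # after the hotfix and additional fixes

# Assumption: cost is proportional to tokens, so the token drop equals the cost drop.
reduction = percent_reduction(tokens_before, tokens_after)
print(f"{reduction:.0f}% fewer tokens per query")
```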

What failed first

Their prior monitoring stack (the TikToken library plus LangChain callback handlers writing to MongoDB) required manual queries in MongoDB Compass to aggregate costs and gave no per-tool visibility, so a worsening bug in the few-shot retriever went unnoticed.
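To illustrate what per-tool visibility adds, here is a rough sketch that sums tokens by tool and flags any tool whose usage keeps growing across queries. It is not LangSmith's API or Dataherald's code: the record layout, the tool names, the sample numbers, and the `ratio` threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical usage records, one per tool call (illustrative data only).
records = [
    {"query": 1, "tool": "few_shot_retriever", "tokens": 4_000},
    {"query": 1, "tool": "sql_executor", "tokens": 1_200},
    {"query": 2, "tool": "few_shot_retriever", "tokens": 9_000},
    {"query": 2, "tool": "sql_executor", "tokens": 1_100},
    {"query": 3, "tool": "few_shot_retriever", "tokens": 20_000},
    {"query": 3, "tool": "sql_executor", "tokens": 1_300},
    {"query": 4, "tool": "few_shot_retriever", "tokens": 45_000},
    {"query": 4, "tool": "sql_executor", "tokens": 1_250},
]


def tokens_by_tool(records):
    """Sum token usage per tool."""
    totals = defaultdict(int)
    for r in records:
        totals[r["tool"]] += r["tokens"]
    return dict(totals)


def growing_tools(records, ratio=2.0):
    """Flag tools whose mean usage in the later half of their calls
    is at least `ratio` times the mean of the earlier half."""
    series = defaultdict(list)
    for r in sorted(records, key=lambda r: r["query"]):
        series[r["tool"]].append(r["tokens"])
    flagged = []
    for tool, values in series.items():
        if len(values) < 4:
            continue  # too few calls to compare halves
        half = len(values) // 2
        early, late = mean(values[:half]), mean(values[half:])
        if early > 0 and late >= ratio * early:
            flagged.append(tool)
    return flagged


print(tokens_by_tool(records))
print(growing_tools(records))
```

A single total per query would hide which tool is responsible; splitting by tool makes a runaway component stand out even when the other tools are stable.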

Results

Time saved: fraction of the time

Volume: 150,000 tokens per query (before the fix)

Cost replaced: 83%

Source

https://medium.com/dataherald/cutting-llm-costs-by-83-with-langsmith-e44bb63af2a8

Grounding & classification

Source type: technical build writeup

28 fields verified against source quotes.

data extraction · rag · knowledge base · failure mode described · metric backed · named customer · production runtime claimed · source backed · tools described · workflow described · software · cost reduction · cycle time reduction · employee productivity · technical build writeup · back office ops · monitor detect alert