All the Hard Stuff Nobody Talks About when Building Products with LLMs

Building a production AI feature backed by LLMs is far harder than demos suggest: context windows may be too small for large customer schemas, LLM latency is high, prompt engineering has few established best practices, correctness and broad input acceptance are in tension, prompt injection has no complete solution, and compliance obligations add significant overhead.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under "Tools used" below.

Stage 1 · User submits NL query

A user expresses a desired Honeycomb query in natural language.

Tools used

gpt-3.5-turbo, Claude, gpt-4, LangChain

Outcome

Honeycomb shipped Query Assistant to all users within a one-month timeline; the feature is live and actively used, a full security and compliance audit of LLM providers was completed, new legal terms were drafted, and prompt injection mitigation measures were put in place.
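The source does not detail which prompt injection mitigations were used. One common measure in this kind of workflow is to treat model output as untrusted and validate it before it reaches any downstream system. The sketch below assumes the model is asked to return a JSON query; the query shape, `ALLOWED_KEYS`, `ALLOWED_OPS`, and `validate_query` are hypothetical names for illustration and are not Honeycomb's actual format or code.

```python
import json

# Hypothetical query shape, for illustration only.
ALLOWED_KEYS = {"calculations", "filters", "breakdowns"}
ALLOWED_OPS = {"COUNT", "AVG", "MAX", "P95"}


def validate_query(raw: str, known_columns: set[str]) -> dict:
    """Parse model output and reject anything outside the expected shape."""
    try:
        query = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(query, dict):
        raise ValueError("model output is not a JSON object")

    # Reject keys the application does not know how to handle.
    unknown = set(query) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")

    # Every calculation must use an allowed operator and a real column.
    for calc in query.get("calculations", []):
        if not isinstance(calc, dict) or calc.get("op") not in ALLOWED_OPS:
            raise ValueError(f"disallowed calculation: {calc!r}")
        column = calc.get("column")
        if column is not None and column not in known_columns:
            raise ValueError(f"unknown column: {column!r}")

    # Breakdown columns must exist in the customer's schema.
    for column in query.get("breakdowns", []):
        if column not in known_columns:
            raise ValueError(f"unknown column: {column!r}")

    return query
```

A check like this does not stop injection itself. It only limits what an injected instruction can make the application do, which fits the point that prompt injection has no complete solution.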

What failed first

Zero-shot prompting produced no usable output, and single-shot prompting worked but poorly. Chaining LLM calls compounded both latency and accuracy problems. Claude 100k given a full schema dump was slower and hallucinated more than a targeted embedding approach, and LangChain provided no tangible improvement in query generation.
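To illustrate the difference between a full schema dump and a targeted approach, here is a minimal sketch of selecting only the schema columns most relevant to the user's question before building the prompt. It makes several assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `select_fields`, `build_prompt`, and the column names are hypothetical. It is not the implementation described in the source.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_fields(question: str, fields: list[str], k: int = 3) -> list[str]:
    """Return the k schema columns most similar to the question."""
    q = embed(question)
    ranked = sorted(fields, key=lambda f: cosine(q, embed(f)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, fields: list[str], k: int = 3) -> str:
    """Build a prompt that carries only the selected columns, not the full schema."""
    chosen = select_fields(question, fields, k)
    columns = "\n".join(f"- {c}" for c in chosen)
    return f"Columns:\n{columns}\n\nQuestion: {question}\nQuery:"


if __name__ == "__main__":
    schema = ["http.status_code", "duration_ms", "user.email", "db.query", "service.name"]
    print(build_prompt("slow requests by duration per service", schema, k=2))
```

Because only the top `k` columns enter the prompt, prompt size stays roughly constant however large the customer schema grows. That addresses the context-window limit and avoids handing the model thousands of irrelevant column names.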

Results

Time saved: two to 15+ seconds

Running since: earlier this month

Source

https://www.honeycomb.io/blog/hard-stuff-nobody-talks-about-llm

Grounding & classification

Source type: technical build writeup

17 fields verified against source quotes, 1 dropped as unverifiable.

code generation, knowledge base, builder submitted, failure mode described, named customer, production runtime claimed, tools described, workflow described, software, technical build writeup