All the Hard Stuff Nobody Talks About when Building Products with LLMs
Building a production AI feature backed by LLMs is far harder than demos suggest: context windows may be too small for large customer schemas, LLM latency is high, prompt engineering has few established best practices, correctness and broad input acceptance are in tension, prompt injection has no complete solution, and compliance obligations add significant overhead.
Honeycomb shipped Query Assistant to all users within one month. The feature is live and actively used; the team completed a full security and compliance audit of LLM providers, drafted new legal terms, and put prompt injection mitigations in place.
Zero-shot prompting produced no usable output; single-shot worked but poorly; chaining LLM calls compounded latency and accuracy problems; Claude 100k with a full schema dump was slower and hallucinated more than a targeted embedding approach; LangChain provided no tangible improvement in query generation.
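To make the "targeted embedding approach" concrete, here is a minimal sketch of the general idea: rank schema columns by similarity to the user's question and include only the top few in the prompt, rather than dumping the full schema. This is an illustration under stated assumptions, not Honeycomb's implementation. The function names (`embed`, `select_columns`, `build_prompt`), the sample column names, and the prompt layout are all hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_columns(question: str, columns: list[str], k: int = 3) -> list[str]:
    """Return the k columns most similar to the question (ties keep schema order)."""
    q = embed(question)
    ranked = sorted(columns, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, columns: list[str], k: int = 3) -> str:
    """Assemble a prompt containing only the selected columns (hypothetical layout)."""
    relevant = select_columns(question, columns, k)
    column_lines = "\n".join(f"- {c}" for c in relevant)
    return f"Columns:\n{column_lines}\n\nQuestion: {question}\n"


# Hypothetical schema and question, for illustration only.
SCHEMA = ["http.status_code", "duration_ms", "user.email", "db.query", "http.route"]
QUESTION = "slow requests by http status code"

if __name__ == "__main__":
    print(build_prompt(QUESTION, SCHEMA, k=2))
```

In a real system the toy `embed` would be replaced by calls to an actual embedding model, and the column vectors would typically be computed once and cached. The trade-off stays the same: the prompt is smaller and more focused, but a column the ranking misses is invisible to the LLM.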