Elastic builds observability for its GenAI Support Assistant chatbot
Elastic's Field Engineering team needed comprehensive observability infrastructure for the newly launched GenAI Support Assistant to detect runtime bugs, monitor latency and usage, and prevent abuse of the LLM service.
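The source does not say how abuse of the LLM service was prevented. One common approach is a per-user rate limit. Below is a minimal sketch of a sliding-window limiter, assuming in-memory state on a single server. The class `SlidingWindowRateLimiter`, its parameters, and the limit of 3 requests per 60 seconds are hypothetical illustrations, not Elastic's implementation.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most max_requests per user within any window_seconds span (hypothetical sketch)."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the example can use a fake clock
        # Timestamps of accepted requests, tracked separately for each user.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits[user_id]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # a rate limit violation; a real service would reject and log it
        hits.append(now)
        return True


# Usage with a fake clock: 3 requests per 60 s per user (hypothetical numbers).
fake_now = [0.0]
limiter = SlidingWindowRateLimiter(3, 60.0, clock=lambda: fake_now[0])
print([limiter.allow("alice") for _ in range(4)])  # [True, True, True, False]
print(limiter.allow("bob"))  # True: limits are tracked per user
fake_now[0] = 61.0
print(limiter.allow("alice"))  # True: the earlier requests have left the window
```

Rejected calls are the natural place to count rate limit violations. A deployment with several server instances would need shared state instead of the in-process dictionary used here.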
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User submits support question
A user submits a technical support question to the Support Assistant via the chat interface.
Observability confirmed the fixes for runtime bugs: endpoint throughput dropped to 1 TPM (transactions per minute) after the data-loading fix. The Support Assistant served its 100th chat completion 21 hours after launch, with no rate limit violations in the first weeks after launch.
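Throughput in TPM is the number of requests that land in each one-minute bucket. The sketch below, assuming raw request timestamps are available, computes TPM and flags minutes above an alert threshold. In practice an APM tool would compute this metric; the function names, the synthetic data, and the threshold of 100 are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def transactions_per_minute(timestamps: Iterable[datetime]) -> Dict[datetime, int]:
    """Count requests per one-minute bucket by truncating each timestamp to its minute."""
    counts: Counter = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return dict(sorted(counts.items()))


def spike_minutes(tpm: Dict[datetime, int], threshold: int) -> List[datetime]:
    """Minute buckets whose throughput exceeds a (hypothetical) alerting threshold."""
    return [minute for minute, count in tpm.items() if count > threshold]


# Synthetic data: a burst of 120 calls within one minute, similar in scale to
# what a data-loading bug can produce, then a single call five minutes later.
base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
request_times = [base + timedelta(seconds=i * 0.5) for i in range(120)]
request_times.append(base + timedelta(minutes=5))

tpm = transactions_per_minute(request_times)
print({m.strftime("%H:%M"): c for m, c in tpm.items()})  # {'12:00': 120, '12:05': 1}
print([m.strftime("%H:%M") for m in spike_minutes(tpm, threshold=100)])  # ['12:00']
```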
What failed first
The first-generation timeout was initially configured on the client side, but the server was never notified when the client aborted the request, so the timeout had to be redesigned in the server-side API layer. A data-loading bug drove endpoint throughput well over 100 transactions per minute, and HTTP 413 (Payload Too Large) errors appeared when the RAG context combined with the user input exceeded the server's configured payload size limit.
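A minimal sketch of both server-side concerns, assuming an asyncio-based Python server (the source does not name the stack). The request is rejected with 413 when user input plus RAG context exceeds a size limit, and the time to the first generated chunk is bounded on the server with `asyncio.wait_for`, which cancels the upstream call when the deadline passes instead of relying on the client to report an abort. `handle_chat`, `fake_llm`, the 504 status for timeouts, and both limit values are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# Hypothetical limits; the source does not state the actual values.
MAX_PAYLOAD_BYTES = 1_000_000
FIRST_GENERATION_TIMEOUT_S = 30.0


def payload_size(user_input: str, rag_context: str) -> int:
    """Bytes the server receives once RAG context is combined with user input."""
    return len(user_input.encode("utf-8")) + len(rag_context.encode("utf-8"))


async def handle_chat(
    user_input: str,
    rag_context: str,
    llm_first_chunk: Callable[[], Awaitable[str]],
    timeout_s: float = FIRST_GENERATION_TIMEOUT_S,
    max_payload_bytes: int = MAX_PAYLOAD_BYTES,
) -> Tuple[int, str]:
    # Check the combined size before calling the model.
    size = payload_size(user_input, rag_context)
    if size > max_payload_bytes:
        return 413, f"payload of {size} bytes exceeds {max_payload_bytes}"
    try:
        # The deadline lives on the server: on expiry, wait_for cancels the
        # upstream LLM call, whether or not the client's abort ever arrives.
        first = await asyncio.wait_for(llm_first_chunk(), timeout_s)
    except asyncio.TimeoutError:
        return 504, "first generation timed out"
    return 200, first


async def fake_llm(delay_s: float, state: Dict[str, bool]) -> str:
    """Stand-in for an LLM call that records whether it was cancelled."""
    try:
        await asyncio.sleep(delay_s)
        return "first chunk"
    except asyncio.CancelledError:
        state["cancelled"] = True
        raise


state = {"cancelled": False}
print(asyncio.run(handle_chat("How do I reindex?", "docs", lambda: fake_llm(1.0, state), timeout_s=0.05)))
# (504, 'first generation timed out')
print(state["cancelled"])  # True: the slow upstream call was cancelled server-side
```

Because the deadline is enforced where the LLM call runs, a slow generation is stopped on the server even if the client's abort never reaches it.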