How Cisco Built End-to-End LLM Observability for Its Splunk AI Assistant Using RAG
Running LLM-powered applications at scale brings challenges around accuracy, reliability, cost control, and user trust. The underlying problem is the lack of unified visibility into the full lifecycle of a RAG system's answers across retrieval, generation, and output quality.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User query submitted
A user question initiates the RAG pipeline, starting the full LLM observability lifecycle.
Cisco deployed a Splunk-based observability system for its RAG pipeline that achieves a 99.982% success rate and provides end-to-end traceability from user query through retrieval, generation, and output quality, enabling rapid root-cause analysis of AI failures.
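To make "end-to-end traceability" concrete, here is a minimal sketch of how each pipeline stage could be recorded under a single trace ID so that a failure can be attributed to a specific stage. It uses only the Python standard library and does not reflect Cisco's or Splunk's actual implementation. The names Span, Trace, success_rate, retrieve, generate, and run are all hypothetical.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    stage: str            # e.g. "retrieval" or "generation"
    ok: bool              # whether the stage finished without raising
    duration_ms: float
    detail: dict = field(default_factory=dict)


@dataclass
class Trace:
    query: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)

    def record(self, stage, fn, *args):
        """Run one pipeline stage and log its outcome under this trace."""
        start = time.perf_counter()
        try:
            result = fn(*args)
            ok, detail = True, {}
        except Exception as exc:  # a failed stage is recorded, not hidden
            result, ok, detail = None, False, {"error": repr(exc)}
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.spans.append(Span(stage, ok, elapsed_ms, detail))
        return result

    @property
    def succeeded(self):
        return all(span.ok for span in self.spans)

    def first_failure(self):
        """Return the name of the earliest failing stage, or None."""
        return next((s.stage for s in self.spans if not s.ok), None)


def success_rate(traces):
    """Percentage of traces in which every recorded stage succeeded."""
    return 100.0 * sum(t.succeeded for t in traces) / len(traces)


# Hypothetical stand-ins for a real retriever and a real model call.
def retrieve(query):
    return ["doc about " + query]


def generate(query, docs):
    return f"Answer to {query!r} based on {len(docs)} document(s)"


def run(query):
    trace = Trace(query)
    docs = trace.record("retrieval", retrieve, query)
    if docs is not None:  # skip generation when retrieval failed
        trace.record("generation", generate, query, docs)
    return trace
```

In a sketch like this, root-cause analysis starts from first_failure() on a failed trace, and a figure such as the reported success rate would be computed over all traces in a time window. A production system would typically export these spans to an observability backend rather than keep them in memory.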
What failed first
Without explicit prompt guidance, the RAG system failed to prioritize the most relevant document for a user query. The result was an incomplete or potentially misleading answer, a mild hallucination, which required observability tooling to detect and diagnose.
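As an illustration of what explicit prompt guidance might look like, the sketch below orders retrieved documents by relevance and tells the model to rely primarily on the top-ranked one. The source does not describe the actual prompt used, so the function name build_prompt, the (score, text) input shape, and the guidance wording are all assumptions.

```python
def build_prompt(question, docs):
    """Build a prompt from (score, text) pairs; higher score = more relevant.

    Hypothetical helper: the scoring scheme and wording are illustrative only.
    """
    # Put the most relevant document first so rank [1] is meaningful.
    ranked = sorted(docs, key=lambda d: d[0], reverse=True)
    context = "\n".join(
        f"[{rank}] (score={score:.2f}) {text}"
        for rank, (score, text) in enumerate(ranked, start=1)
    )
    guidance = (
        "Documents are listed from most to least relevant. "
        "Base the answer primarily on document [1]; use the others only "
        "to fill gaps, and say so if the documents do not answer the question."
    )
    return f"{guidance}\n\nDocuments:\n{context}\n\nQuestion: {question}"
```

Logging the ranked list alongside the generated answer would let an observability pipeline check after the fact whether the answer drew on the top-ranked document.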