incident_management · saas · workflow

Behind the scenes of Elastic Security's generative AI features: a quantitative approach to prompt tuning and LLM evaluation

Elastic's Security GenAI team needed a robust, reproducible way to evaluate LLM prompt quality and compare model providers at scale. The initial approach relied on manual, spreadsheet-based testing, which was effective but time-intensive and did not scale as more features were added.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · User submits request
The workflow begins when the system receives a user request.
Tools used
Elastic AI Assistant, Attack Discovery, Automatic Import, LangSmith, LangGraph, Elasticsearch, ES|QL
Outcome

Elastic transitioned from manual to automated LLM evaluations using LangSmith and LangGraph, building a framework that enables quantitative comparison of prompts and models, with a real-time rubric check in production that regenerates responses if they fall below quality standards.
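The production rubric check described above can be pictured as a small generate, grade, and retry loop. The sketch below is illustrative only and is not Elastic's implementation: `RubricItem`, `rubric_score`, `generate_with_rubric_check`, the 0.8 threshold, and the three-attempt limit are all hypothetical names and values, and the rubric criteria are plain Python predicates standing in for whatever grader (for example, an LLM-based judge) a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RubricItem:
    """One rubric criterion (hypothetical structure, for illustration only)."""
    name: str
    check: Callable[[str], bool]  # returns True if the response meets the criterion


def rubric_score(response: str, rubric: List[RubricItem]) -> float:
    """Return the fraction of rubric criteria the response satisfies (0.0 to 1.0)."""
    if not rubric:
        raise ValueError("rubric must contain at least one criterion")
    passed = sum(1 for item in rubric if item.check(response))
    return passed / len(rubric)


def generate_with_rubric_check(
    generate: Callable[[str, int], str],  # assumed signature: (prompt, attempt index) -> response
    prompt: str,
    rubric: List[RubricItem],
    threshold: float = 0.8,   # assumed quality bar, not a value from the source
    max_attempts: int = 3,    # assumed retry limit, not a value from the source
) -> Tuple[str, float]:
    """Regenerate until a response meets the threshold or attempts run out.

    Returns the best-scoring response seen and its score.
    """
    best_response, best_score = "", -1.0
    for attempt in range(max_attempts):
        response = generate(prompt, attempt)
        score = rubric_score(response, rubric)
        if score > best_score:
            best_response, best_score = response, score
        if score >= threshold:
            break  # good enough, stop regenerating
    return best_response, best_score
```

Because the same `rubric_score` function yields a number per response, it could also be averaged over a fixed set of test inputs to compare two prompts or two model providers, which is the kind of quantitative comparison the framework is said to enable. Capping the number of attempts keeps latency bounded when no response clears the bar.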

Results
Volume: 85%
Running since: June 2023
Source

https://www.elastic.co/blog/elastic-security-generative-ai-features


Grounding & classification
Source type: technical build writeup
27 fields verified against source quotes.
Tags: agentic workflow, anomaly detection, conversational ai, rag, summarization, knowledge base, builder submitted, human review described, production runtime claimed, tools described, workflow described, software, accuracy improvement, employee productivity, technical build writeup, incident management, monitor detect alert, rag answering