
Behind the scenes of Elastic Security's generative AI features: a quantitative approach to prompt tuning and LLM evaluation

Elastic's Security GenAI team needed a robust, reproducible way to evaluate LLM prompt quality and compare model providers at scale. The initial approach, manual spreadsheet-based testing, was effective but time-intensive and did not scale as more features were added.

How it works

Common implementation structure

This section describes how this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.

Stage 1 · User submits request

The workflow begins when the system receives a user request.

Tools used

Elastic AI Assistant, Attack Discovery, Automatic Import, LangSmith, LangGraph, Elasticsearch, ES|QL

Outcome

Elastic transitioned from manual to automated LLM evaluations using LangSmith and LangGraph, building a framework that enables quantitative comparison of prompts and models, with a real-time rubric check in production that regenerates responses if they fall below quality standards.
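The real-time rubric check described above can be pictured as a generate, score, regenerate loop. The following is a minimal sketch under stated assumptions, not Elastic's implementation: the names `score_against_rubric` and `generate_with_rubric_check`, the 0.8 threshold, and the three-attempt limit are all hypothetical, and the rubric criteria are plain Python predicates standing in for whatever evaluator (for example, an LLM-based grader) a production system would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RubricResult:
    score: float        # fraction of rubric criteria satisfied, from 0.0 to 1.0
    failed: List[str]   # names of the criteria that were not met


def score_against_rubric(
    response: str, rubric: Dict[str, Callable[[str], bool]]
) -> RubricResult:
    """Score a response as the share of rubric criteria it passes."""
    if not rubric:
        raise ValueError("rubric must contain at least one criterion")
    failed = [name for name, check in rubric.items() if not check(response)]
    return RubricResult(score=1.0 - len(failed) / len(rubric), failed=failed)


def generate_with_rubric_check(
    generate: Callable[[List[str]], str],
    rubric: Dict[str, Callable[[str], bool]],
    threshold: float = 0.8,   # hypothetical quality bar
    max_attempts: int = 3,    # hypothetical retry limit
) -> Tuple[str, RubricResult]:
    """Regenerate until the response meets the threshold or attempts run out.

    `generate` receives the criteria that failed on the previous attempt
    (empty on the first call) so it can adjust its prompt. The best-scoring
    response seen so far is returned.
    """
    best_response, best_result = "", None
    feedback: List[str] = []
    for _ in range(max_attempts):
        response = generate(feedback)
        result = score_against_rubric(response, rubric)
        if best_result is None or result.score > best_result.score:
            best_response, best_result = response, result
        if result.score >= threshold:
            break
        feedback = result.failed
    return best_response, best_result
```

In this sketch `generate` would wrap the model call, and the same scoring function could be reused offline to compare prompts or models on a fixed dataset, since it returns a numeric score per response.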

Results

Volume: 85%

Running since: June 2023

Source

https://www.elastic.co/blog/elastic-security-generative-ai-features


Grounding & classification

Source type: technical build writeup

27 fields verified against source quotes.

agentic workflow, anomaly detection, conversational ai, rag, summarization, knowledge base, builder submitted, human review described, production runtime claimed, tools described, workflow described, software, accuracy improvement, employee productivity, technical build writeup, incident management, monitor detect alert, rag answering