Semantic multi-component embeddings with Redis reduce LLM token consumption by 91% in enterprise tool selection
An enterprise AI platform with 70+ automated tools was sending every tool definition to the LLM on each query. This consumed over 8,000 tokens per request and led to spiralling costs, slower responses, and irrelevant tool suggestions.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · User query arrives
A user submits a natural language query to the enterprise AI platform.
Tools used
Redis Stack, text-embedding-3-small
Outcome
Semantic tool selection with multi-component embeddings and Redis vector search reduced token consumption by 91.5% and cost per query by 49%, while improving Precision@3 to 95% and delivering 31% faster response times.
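The selection step described above can be sketched roughly as follows. This is a minimal illustration and not the platform's actual code. Several things here are assumptions: toy_embed is a hashed bag-of-words placeholder standing in for a real embedding model such as text-embedding-3-small, the in-memory scan stands in for a Redis vector search, and the three components (name, description, example queries) and their weights are hypothetical, since the source does not say which components were embedded.

```python
import hashlib
import math
import re
from dataclasses import dataclass

DIM = 256  # toy vector size, chosen only for this illustration


def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised. Placeholder for a real model."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


@dataclass
class Tool:
    name: str
    description: str
    example_queries: str  # hypothetical component, not confirmed by the source


# Hypothetical weights for combining the per-component vectors.
WEIGHTS = {"name": 0.2, "description": 0.5, "example_queries": 0.3}


def embed_tool(tool: Tool) -> list[float]:
    """Embed each component separately, then take a normalised weighted sum."""
    combined = [0.0] * DIM
    for field, weight in WEIGHTS.items():
        for i, x in enumerate(toy_embed(getattr(tool, field))):
            combined[i] += weight * x
    norm = math.sqrt(sum(x * x for x in combined)) or 1.0
    return [x / norm for x in combined]


def select_tools(query: str, tools: list[Tool], k: int = 3) -> list[str]:
    """Return the names of the k tools most similar to the query (cosine)."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed_tool(t))), t.name) for t in tools]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

In a real deployment the tool vectors would be computed once and stored in a vector index, and only the definitions of the top-k tools would be placed in the LLM prompt instead of all 70+.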
What failed first
Keyword matching and category filtering failed to bridge the vocabulary gap between natural language queries and technical tool names, and could not disambiguate context-dependent queries.
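A small sketch of the vocabulary gap, using hypothetical tool names that do not come from the source: a naive keyword matcher finds a tool only when the query happens to share a word with the tool's technical name.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match(query: str, tool_names: list[str]) -> list[str]:
    """Return tools whose name shares at least one token with the query."""
    q = tokens(query)
    return [name for name in tool_names if q & tokens(name)]


# Hypothetical tool names, for illustration only.
TOOLS = ["reset_credentials", "generate_invoice", "provision_vm"]

print(keyword_match("I forgot my login", TOOLS))     # prints []
print(keyword_match("reset my credentials", TOOLS))  # prints ['reset_credentials']
```

The first query expresses the same intent as the second but shares no words with any tool name, so keyword matching returns nothing.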
Results
Time saved: 31% faster responses
Volume: 91.5% lower token consumption
Cost replaced: 49% lower cost per query
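As a rough worked example of what the token figure implies, the snippet below applies the reported 91.5% reduction to the 8,000-token figure. The source says "over 8,000" tokens per request, so treating the baseline as exactly 8,000 is an assumption and the result is only approximate.

```python
# Baseline taken as exactly 8,000 tokens (the source says "over 8,000").
tokens_before = 8_000
reduction = 0.915  # reported reduction in token consumption

tokens_after = round(tokens_before * (1 - reduction))
print(tokens_after)  # prints 680
```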
Grounding & classification
Source type: technical build writeup
33 fields verified against source quotes, 1 dropped as unverifiable.
enterprise search, recommendation system, knowledge base, builder submitted, failure mode described, metric backed, production runtime claimed, workflow described, software, accuracy improvement, cost reduction, throughput increase, time saved, technical build writeup, back office ops, extract classify route