back_office_ops · workflow

Semantic multi-component embeddings with Redis reduce LLM token consumption by 91% in enterprise tool selection

An enterprise AI platform with 70+ automated tools sent every tool definition to the LLM on each query. This consumed over 8,000 tokens per request and led to spiralling costs, slower responses, and irrelevant tool suggestions.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · User query arrives

A user submits a natural language query to the enterprise AI platform.

Tools used

Redis Stack, text-embedding-3-small

Outcome

Semantic tool selection with multi-component embeddings and Redis vector search reduced token consumption by 91.5% and cost per query by 49%, while improving Precision@3 to 95% and delivering 31% faster response times.
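The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the source's implementation: `toy_embed` is a hashed bag-of-words stand-in for an embedding model such as text-embedding-3-small, the in-memory `INDEX` stands in for a Redis vector index, and the tool definitions and component weights are hypothetical. Treating "multi-component" as a weighted sum of per-field vectors is one plausible reading; the source may combine components differently.

```python
import hashlib
import math
import re

DIM = 1024  # toy vector size, chosen only to keep hash collisions rare


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding model call: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Hypothetical component weights; the source does not state the real ones.
WEIGHTS = {"name": 0.2, "description": 0.5, "examples": 0.3}


def embed_tool(tool: dict) -> list[float]:
    """Multi-component embedding: weighted sum of per-field vectors, re-normalised."""
    combined = [0.0] * DIM
    for field, weight in WEIGHTS.items():
        for i, v in enumerate(toy_embed(tool[field])):
            combined[i] += weight * v
    norm = math.sqrt(sum(v * v for v in combined)) or 1.0
    return [v / norm for v in combined]


def select_tools(query: str, index: dict, k: int = 3) -> list[str]:
    """Return the k tool names whose vectors are most similar to the query."""
    q = toy_embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scores = {name: sum(a * b for a, b in zip(q, vec)) for name, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical tool catalogue, for illustration only.
TOOLS = [
    {"name": "create_invoice",
     "description": "Create a customer invoice for an order",
     "examples": "bill the customer; send an invoice"},
    {"name": "reset_password",
     "description": "Reset a user account password",
     "examples": "I forgot my login; unlock my account"},
    {"name": "book_meeting_room",
     "description": "Reserve a meeting room for a given time slot",
     "examples": "find a room for tomorrow; schedule a conference space"},
    {"name": "lookup_order_status",
     "description": "Check shipping status of order",
     "examples": "where is my package; track my delivery"},
]

# Built once, offline; only the selected tools' definitions go to the LLM.
INDEX = {tool["name"]: embed_tool(tool) for tool in TOOLS}
```

Calling `select_tools("send an invoice to the customer", INDEX)` ranks `create_invoice` first, and only the top three definitions would then be placed in the prompt. The toy embedding matches on shared words only, so it does not bridge vocabulary gaps; that property comes from a real embedding model.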

What failed first

Keyword matching and category filtering failed to bridge the vocabulary gap between natural language queries and technical tool names, and could not disambiguate context-dependent queries.
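The vocabulary gap can be illustrated with a small sketch. The tool names and the matching rule below are hypothetical, chosen only to show how a query phrased in everyday language can share no words with a technical tool name.

```python
import re

# Hypothetical tool names, for illustration only.
TOOL_NAMES = ["reset_password", "create_invoice", "lookup_order_status"]


def tokens(text: str) -> set[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match(query: str, tool_names: list[str]) -> list[str]:
    """Return the tools whose name shares at least one word with the query."""
    query_words = tokens(query)
    return [name for name in tool_names if query_words & tokens(name)]
```

Here `keyword_match("reset my password", TOOL_NAMES)` finds `reset_password`, but `keyword_match("I can't get into my account", TOOL_NAMES)` returns an empty list even though the intent is the same.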

Results

Time saved: 31% faster response times

Volume: 91.5% lower token consumption

Cost replaced: 49% lower cost per query
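As a rough check on these figures, the arithmetic can be worked through in Python. The 8,000-token baseline is a lower bound (the source says "over 8,000"), so the result is approximate. The `precision_at_k` helper uses the standard information-retrieval definition; the source does not say exactly how its Precision@3 was computed, so that part is an assumption.

```python
BASELINE_TOKENS = 8_000   # lower bound: the source reports "over 8,000" per request
TOKEN_REDUCTION = 0.915   # 91.5% reduction reported in the source

# Approximate tokens per request after semantic tool selection.
remaining_tokens = round(BASELINE_TOKENS * (1 - TOKEN_REDUCTION))


def precision_at_k(ranked: list[str], relevant: set[str], k: int = 3) -> float:
    """Fraction of the top-k ranked tools that are relevant to the query."""
    top_k = ranked[:k]
    return sum(1 for tool in top_k if tool in relevant) / k
```

With these inputs `remaining_tokens` comes to 680, meaning a request that previously carried at least 8,000 tokens of tool definitions would carry roughly 680 or slightly more.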

Source

https://mlops.community/blog/how-i-reduced-ai-token-costs-by-91percent-with-semantic-tool-selection-and-redis


Grounding & classification

Source type: technical build writeup

33 fields verified against source quotes, 1 dropped as unverifiable.

enterprise search, recommendation system, knowledge base, builder submitted, failure mode described, metric backed, production runtime claimed, workflow described, software, accuracy improvement, cost reduction, throughput increase, time saved, technical build writeup, back office ops, extract classify route