feat(ai-cache): add semantic (L2) cache layer #13632
…ivers, guard encode+write)
Add TEST 16/17/18 to t/plugin/ai-cache-semantic.t:
- TEST 16: the embedding provider returns 500, so the semantic layer fails open and is skipped while the L1 exact cache is unaffected. A MISS followed by a HIT on an identical request proves that a broken embeddings endpoint never breaks L1 or upstream traffic.
- TEST 17: include_vars=["http_x_tenant"] partitions L2. The same prompt with the same fixed vector under tenant "globex" gets a MISS after tenant "acme" populated L2, so there is no cross-tenant semantic leakage. A positive control (acme again) confirms the acme entry is intact (L1 HIT).
- TEST 18: azure_openai driver unit test mirroring the TEST 7/8 style. The mock server at 7745 rejects requests without the api-key header (proving the driver does NOT send Authorization), accepts requests that carry it, and returns data[1].embedding; the mock at 7746 returns 500, and the driver returns nil+err.
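TEST 17 relies on L2 entries being partitioned by the configured request variables. The PR does not show how the partition id is computed, so the following is a minimal sketch assuming a hash over the `include_vars` values; the function name `partition_id` and the hashing scheme are hypothetical, chosen only to illustrate why "acme" and "globex" can never share a semantic entry.

```python
import hashlib


def partition_id(include_vars, request_vars):
    """Hypothetical: derive an L2 partition id from the configured request variables.

    Two requests can share semantic cache entries only if every variable listed
    in include_vars has the same value for both of them.
    """
    h = hashlib.sha256()
    for name in include_vars:
        value = request_vars.get(name, "")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") hash differently.
        h.update(f"{name}={len(value)}:{value};".encode("utf-8"))
    return h.hexdigest()[:16]


acme = partition_id(["http_x_tenant"], {"http_x_tenant": "acme"})
globex = partition_id(["http_x_tenant"], {"http_x_tenant": "globex"})
print(acme != globex)  # same prompt, different tenant, different partition
```

Folding the partition id into both the key prefix and the KNN filter (an assumption here) means a lookup can only ever see vectors written under the same tenant.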
- Add `layers` attribute row (array, default ["exact"], values exact/semantic)
- Add a Semantic (L2) Attributes subsection with a full schema table for all semantic.* fields, matching schema.lua verbatim (thresholds, top_k, distance_metric cosine-only, ttl 86400, match.*, embedding oneOf, vector_search)
- Add a Redis Stack (RediSearch) caution callout: required when the semantic layer is enabled; L1 and L2 share the same connection
- Update the cache_headers description to include the X-AI-Cache-Similarity header (emitted on a semantic hit; value = cosine similarity = 1 − distance)
- Add an exact+semantic worked example (Admin API JSON and ADC YAML forms)
- Add a security note on multi-tenant deployments (include_consumer/include_vars)
- Update the description to reflect that both layers are now available
- Apply identical changes to the zh doc with translated prose
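The header value "cosine similarity = 1 − distance" follows from RediSearch's COSINE metric, which reports a distance of 1 − cos(a, b). A small sketch of that arithmetic, plus the 4 d.p. formatting the polish commit applies; `semantic_hit` is a hypothetical helper, and treating a similarity equal to the threshold as a hit is an assumption.

```python
import math


def cosine_distance(a, b):
    """Cosine distance as reported for DISTANCE_METRIC COSINE: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def semantic_hit(distance, similarity_threshold):
    """Return the X-AI-Cache-Similarity header value on a hit, else None.

    Assumption: a match counts when similarity >= similarity_threshold.
    """
    similarity = 1.0 - distance
    if similarity >= similarity_threshold:
        # Four decimal places, like string.format("%.4f", ...) in the plugin.
        return "%.4f" % similarity
    return None


d = cosine_distance([1.0, 0.0], [1.0, 1.0])  # vectors 45 degrees apart
print(semantic_hit(d, 0.7))  # 0.7071
print(semantic_hit(d, 0.9))  # None
```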
…polish
- schema: wrap similarity_threshold line to ≤100 chars (luacheck)
- schema: add contains={const="exact"} to layers — ["semantic"]-only is now invalid
- embeddings/openai + azure_openai: drop unused err assignment from json.decode (luacheck);
add local HTTP_OK; guard not res.body before decode (mirrors ai-rag pattern)
- vector-search/redis: ensure_index gains a prefix param; FT.CREATE PREFIX uses it
instead of the hardcoded "ai-cache:l2:" so the caller controls the key namespace
- semantic: add l2_base(conf) helper; index_name, lookup, and write all read
conf.semantic.vector_search.redis.index (default "ai-cache") — wires the config knob
- key: fix wrong context_fingerprint doc comment; restore PR-1 "why" comments in
build_repr's effective block (model precedence + endpoint significance)
- ai-cache: format X-AI-Cache-Similarity to 4 d.p. via string.format("%.4f", ...)
- docs (en + zh): note that "exact" is always active and must be in layers
- tests: update TEST 9+10 for new ensure_index signature; add TEST 19 (schema rejects
["semantic"]-only); add TEST 20 (custom index="myidx" MISS→HIT + L2 key prefix check)
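Two of the polish items above are easy to state as code: the `layers` constraint (`contains={const="exact"}`, so `["semantic"]` alone is rejected) and the `l2_base(conf)` helper that reads `semantic.vector_search.redis.index`. This is a Python sketch under stated assumptions, not the Lua implementation: the error strings are invented, and the `<index>:l2:` prefix is a guess generalized from the previously hardcoded `"ai-cache:l2:"`.

```python
ALLOWED_LAYERS = {"exact", "semantic"}


def validate_layers(layers):
    """Mirror of the documented rule: known values only, and "exact" always present.

    An empty list also fails, because it cannot contain "exact".
    """
    if not isinstance(layers, list):
        return False, "layers must be an array"
    unknown = [layer for layer in layers if layer not in ALLOWED_LAYERS]
    if unknown:
        return False, f"unknown layer(s): {unknown}"
    if "exact" not in layers:
        return False, 'layers must contain "exact"'
    return True, None


def l2_base(conf):
    """Hypothetical: index name and L2 key prefix from semantic.vector_search.redis.index."""
    redis_conf = conf.get("semantic", {}).get("vector_search", {}).get("redis", {})
    index = redis_conf.get("index") or "ai-cache"
    return index, f"{index}:l2:"


print(validate_layers(["semantic"]))  # rejected: "exact" is missing
print(l2_base({"semantic": {"vector_search": {"redis": {"index": "myidx"}}}}))
```

Deriving both the index name and the key prefix from one knob keeps `FT.CREATE ... PREFIX` and the keys written by `write` in the same namespace, which is what TEST 20 checks.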
Cache hits now report which layer served them via X-AI-Cache-Status: HIT-L1 (exact) or HIT-L2 (semantic). The en/zh docs and the exact-cache suite (t/plugin/ai-cache.t) are updated to match.

Rework t/plugin/ai-cache-semantic.t into the codebase's declarative test-nginx idiom: granular schema/unit blocks, plus end-to-end flows split into route-setup and request blocks (--- request / --- response_headers / --- wait) over the shared :1980 X-AI-Fixture upstream. Drive the embedding mock from real text-embedding-3-small responses (captured at dimensions=64) stored as t/fixtures/openai/embeddings-*.json; the mock logic lives in t/lib/ai_cache_mock.lua. Schema-rejection tests assert the exact validation error via --- response_body eval.
…ding drivers, add ssl_verify and timeout to schema
…re embedding configurations
…sage handling and protocol checks
Pull request overview
Adds an opt-in semantic (L2) caching layer to the existing ai-cache plugin. On an L1 miss, the plugin can embed the prompt and query a Redis Stack (RediSearch) vector index for a sufficiently similar prior response, serving it as an L2 hit and backfilling L1; defaults remain exact-only.
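The request path described above (L1 first, L2 on a miss, backfill on an L2 hit, fail open on any embedding or vector-store error) can be sketched as a single function. Everything here is a simplified stand-in: `embed` and `knn` are hypothetical callables in place of the embedding driver and the RediSearch query, and an in-memory dict stands in for the L1 store.

```python
from typing import Callable, Dict, List, Optional, Tuple


def lookup(
    key: str,
    prompt: str,
    l1: Dict[str, str],
    embed: Callable[[str], List[float]],
    knn: Callable[[List[float]], Optional[Tuple[str, float]]],
    threshold: float,
) -> Tuple[str, Optional[str]]:
    """Return (X-AI-Cache-Status value, cached body or None)."""
    if key in l1:
        return "HIT-L1", l1[key]
    try:
        vector = embed(prompt)
        nearest = knn(vector)  # (cached body, cosine distance) or None
    except Exception:
        # Fail open: a broken embeddings endpoint or vector store only disables L2;
        # the request continues to the upstream as an ordinary miss.
        return "MISS", None
    if nearest is not None and 1.0 - nearest[1] >= threshold:
        # Backfill L1 so the next identical request is a cheap exact hit.
        l1[key] = nearest[0]
        return "HIT-L2", nearest[0]
    return "MISS", None
```

The ordering reflects the cost argument from the review thread: the deterministic, cheap L1 check always runs first, and the embedding plus vector search is paid only on an L1 miss.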
Changes:
- Implement semantic (L2) cache lookup/write path backed by Redis Stack RediSearch and embedding providers (OpenAI, Azure OpenAI).
- Update `ai-cache` runtime behavior and headers to report `HIT-L1`/`HIT-L2` (plus a similarity header on L2 hits).
- Add comprehensive unit/e2e tests, fixtures, and docs for semantic caching configuration and behavior.
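The two embedding drivers differ mainly in authentication: OpenAI uses `Authorization: Bearer`, while Azure OpenAI uses an `api-key` header and requires an explicit endpoint (which TEST 18 checks). Below is a minimal Python sketch of request construction and OpenAI-style response parsing. The function names are hypothetical, the drivers' default model and endpoint values are not shown in the PR, so unset fields are simply omitted, and Python's `data[0]` corresponds to Lua's `data[1]`.

```python
import json


def build_request(provider, api_key, text, model=None, dimensions=None, endpoint=None):
    """Return (url, headers, body) for an embeddings call (hypothetical helper)."""
    body = {"input": text}
    if model:
        body["model"] = model
    if dimensions:
        body["dimensions"] = dimensions
    headers = {"Content-Type": "application/json"}
    if provider == "openai":
        url = endpoint or "https://api.openai.com/v1/embeddings"
        headers["Authorization"] = "Bearer " + api_key
    elif provider == "azure_openai":
        if not endpoint:
            raise ValueError("azure_openai requires an endpoint")
        url = endpoint
        headers["api-key"] = api_key  # key auth only; no Authorization header
    else:
        raise ValueError("unsupported provider: " + provider)
    return url, headers, json.dumps(body)


def parse_embedding(status, raw):
    """First embedding of an OpenAI-style response, or (None, err)."""
    if status != 200 or not raw:
        # Guard a missing body before decoding, as the polish commit does.
        return None, f"embeddings request failed with status {status}"
    try:
        doc = json.loads(raw)
        return doc["data"][0]["embedding"], None
    except (ValueError, KeyError, IndexError, TypeError) as exc:
        return None, f"malformed embeddings response: {exc}"
```

Returning `(None, err)` instead of raising mirrors the Lua `nil, err` convention, which is what lets the caller fail open.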
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| t/plugin/ai-cache.t | Updates existing tests to expect HIT-L1 status on exact cache hits. |
| t/plugin/ai-cache-semantic.t | Adds new test suite covering schema validation, embedding drivers, vector search, isolation behavior, and fail-open semantics. |
| t/lib/ai_cache_mock.lua | Adds a mock embeddings upstream that replays captured embedding fixtures and validates auth behavior. |
| t/fixtures/openai/embeddings-capital.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-capital-city.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-largest-city.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-tire.json | Adds captured embedding fixture used by semantic cache tests. |
| Makefile | Installs new ai-cache submodules for embeddings and vector-search at build/install time. |
| docs/en/latest/plugins/ai-cache.md | Documents semantic caching (Redis Stack requirement, config fields, headers, and examples). |
| docs/zh/latest/plugins/ai-cache.md | Chinese documentation updates for semantic caching configuration and examples. |
| apisix/plugins/ai-cache/vector-search/redis.lua | Adds Redis Stack (RediSearch) helpers: index creation, upsert, and KNN search with float32 packing. |
| apisix/plugins/ai-cache/semantic.lua | Implements L2 semantic cache: prompt extraction, embedding, partitioning, lookup, L1 backfill, and L2 write. |
| apisix/plugins/ai-cache/schema.lua | Extends schema with layers and semantic configuration; adds encryption paths for embedding API keys. |
| apisix/plugins/ai-cache/key.lua | Extends key logic to support semantic partitioning and context fingerprinting utilities. |
| apisix/plugins/ai-cache/embeddings/base.lua | Shared HTTP fetch + response parsing for embeddings providers. |
| apisix/plugins/ai-cache/embeddings/openai.lua | OpenAI embeddings driver (Bearer auth, optional endpoint/model/dimensions). |
| apisix/plugins/ai-cache/embeddings/azure_openai.lua | Azure OpenAI embeddings driver (api-key header, endpoint required). |
| apisix/plugins/ai-cache.lua | Integrates L2 lookup on L1 miss, adds new headers (HIT-L1/HIT-L2, similarity), and schedules L2 writes. |
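`vector-search/redis.lua` is summarized as index creation plus KNN search with float32 packing. Here is a sketch of what those commands look like as argument lists, for example for redis-py's `execute_command(*args)`. The vector field name `embedding`, the FLAT algorithm, and the `distance` alias are assumptions; only the COSINE metric, the configurable PREFIX, and float32 vectors come from the PR. Little-endian packing matches the native byte order of typical x86-64/ARM64 Redis hosts.

```python
import struct


def pack_float32(vector):
    """Pack a vector as a contiguous little-endian float32 blob."""
    return struct.pack(f"<{len(vector)}f", *vector)


def ft_create_args(index, prefix, dim, field="embedding"):
    """FT.CREATE over hashes under `prefix`, with one cosine FLOAT32 vector field."""
    return [
        "FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
        "SCHEMA", field, "VECTOR", "FLAT", "6",  # 6 = number of attribute tokens below
        "TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", "COSINE",
    ]


def ft_knn_args(index, vector, k=1, field="embedding"):
    """FT.SEARCH returning the k nearest neighbours, closest first."""
    query = f"*=>[KNN {k} @{field} $vec AS distance]"
    return [
        "FT.SEARCH", index, query,
        "PARAMS", "2", "vec", pack_float32(vector),
        "SORTBY", "distance", "DIALECT", "2",  # KNN query syntax needs dialect 2+
    ]
```

A 64-dimensional vector, as in the captured fixtures, packs to 256 bytes, and the DIM passed to FT.CREATE must match the embedding `dimensions` setting.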
…ntent and update error handling in Redis upsert
…ng and Redis index management
One compatibility concern: this PR changes the existing exact-cache hit header from … The …
@membphis The initial plan was to keep using …
Thanks, that makes sense if … One extra question: what is the expected user-facing distinction between L1 and L2 here? My understanding is that the two-layer design is mainly an internal optimization/correctness model: L1 is deterministic and cheap, while L2 is approximate and more expensive because it requires embedding plus vector search. Since an L2 hit already has …
I believe it is a matter of design; both of your points are legitimate. I simply reverted to the design choice shown in the picture (ref #13290):
membphis left a comment:
Reviewed, no blocking issues found.

Description
Adds the optional semantic (L2) cache layer to `ai-cache`. On an exact (L1) miss, the request prompt is embedded and matched against a RediSearch cosine vector index; a match above `similarity_threshold` is served and backfilled into L1. The layer is strictly opt-in via `layers: ["exact", "semantic"]`, so the default exact-only path is unchanged.

- … `openai` and `azure_openai`.
- … `cache_key`) partition isolation of the vector index.
- … `HIT-L1`/`HIT-L2` (with `X-AI-Cache-Similarity`).

Which issue(s) this PR fixes:
Part of #13290
Checklist