feat(ai-cache): add semantic (L2) cache layer#13632

Merged
nic-6443 merged 22 commits into apache:master from janiussyafiq:feat/ai-cache-semantic
Jul 2, 2026

Conversation

@janiussyafiq

@janiussyafiq janiussyafiq commented Jun 30, 2026

Contributor

Description

Adds the optional semantic (L2) cache layer to ai-cache. On an exact (L1) miss, the request prompt is embedded and matched against a RediSearch cosine vector index; a hit above similarity_threshold is served and backfilled into L1. The layer is strictly opt-in via layers: ["exact", "semantic"], so the default exact-only path is unchanged.

  • Embedding providers: openai and azure_openai.
  • Per effective-model and per-tenant (cache_key) partition isolation of the vector index.
  • Fail-open: any embedding or vector-store error degrades to a MISS, never a 5xx.
  • Response status reported as HIT-L1 / HIT-L2 (with X-AI-Cache-Similarity).
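
The lookup order described above can be sketched in a few lines. This is a minimal in-memory illustration in Python, not the plugin's Lua code: `TwoLayerCache`, the `partition` string, and the inclusive threshold comparison are all assumptions made for the example, and a plain list scan stands in for the RediSearch cosine index.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoLayerCache:
    """Hypothetical stand-in for the exact (L1) + semantic (L2) lookup flow."""

    def __init__(self, embed: Callable[[str], Vector], similarity_threshold: float):
        self.embed = embed
        self.similarity_threshold = similarity_threshold
        # L1: exact match on (partition, prompt).
        self.l1: Dict[Tuple[str, str], str] = {}
        # L2: per-partition list of (vector, response); a partition stands for
        # one effective model plus one tenant, so entries never cross tenants.
        self.l2: Dict[str, List[Tuple[Vector, str]]] = {}

    def lookup(self, partition: str, prompt: str) -> Tuple[str, Optional[str], Optional[float]]:
        """Return (status, response, similarity)."""
        cached = self.l1.get((partition, prompt))
        if cached is not None:
            return "HIT-L1", cached, None
        try:
            query = self.embed(prompt)
        except Exception:
            # Fail-open: an embedding error degrades to a MISS, never an error response.
            return "MISS", None, None
        best, best_sim = None, -1.0
        for vector, response in self.l2.get(partition, []):
            sim = cosine_similarity(query, vector)
            if sim > best_sim:
                best, best_sim = response, sim
        # Assumption: the threshold is treated as inclusive here.
        if best is not None and best_sim >= self.similarity_threshold:
            self.l1[(partition, prompt)] = best  # backfill L1 for the next identical request
            return "HIT-L2", best, best_sim
        return "MISS", None, None

    def store(self, partition: str, prompt: str, response: str) -> None:
        """Write the upstream response to L1, and to L2 if embedding succeeds."""
        self.l1[(partition, prompt)] = response
        try:
            self.l2.setdefault(partition, []).append((self.embed(prompt), response))
        except Exception:
            pass  # fail-open on the write path as well
```

After an L2 hit the same prompt is answered from L1, and a prompt stored under one partition is invisible to every other partition.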

Which issue(s) this PR fixes:

Part of #13290

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

Add TEST 16/17/18 to t/plugin/ai-cache-semantic.t:

- TEST 16: embedding-provider returns 500 → semantic skips fail-open,
  L1 exact cache unaffected; MISS then HIT on identical request proves
  the broken embeddings endpoint never breaks L1 or upstream traffic.

- TEST 17: include_vars=["http_x_tenant"] partitions L2; same prompt
  with same fixed vector under tenant "globex" gets a MISS after tenant
  "acme" populated L2 — no cross-tenant semantic leakage. Positive
  control (acme again) confirms the acme entry is intact (L1 HIT).

- TEST 18: azure_openai driver unit test mirroring TEST 7/8 style.
  Mock server at 7745 rejects requests without the api-key header
  (proving the driver does NOT send Authorization), accepts with it,
  and returns data[1].embedding; mock at 7746 returns 500 → nil+err.
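
The driver behaviour that TEST 18 checks can be illustrated roughly as follows. This is a Python sketch, not the Lua drivers: `embedding_auth_headers` and `parse_embedding` are hypothetical names, and the error strings are invented. Note that the Lua expression `data[1].embedding` is 1-indexed, so it corresponds to `data[0]` here.

```python
import json
from typing import Dict, List, Optional, Tuple


def embedding_auth_headers(provider: str, api_key: str) -> Dict[str, str]:
    """Build auth headers: Bearer token for openai, api-key header for azure_openai."""
    headers = {"Content-Type": "application/json"}
    if provider == "openai":
        headers["Authorization"] = "Bearer " + api_key
    elif provider == "azure_openai":
        # Azure OpenAI authenticates with an api-key header, not Authorization.
        headers["api-key"] = api_key
    else:
        raise ValueError("unsupported embedding provider: " + provider)
    return headers


def parse_embedding(status: int, body: Optional[str]) -> Tuple[Optional[List[float]], Optional[str]]:
    """Return (embedding, None) on success or (None, error) on any failure."""
    if status != 200:
        return None, "unexpected status %d" % status
    if not body:
        return None, "empty response body"
    try:
        decoded = json.loads(body)
        return decoded["data"][0]["embedding"], None
    except (ValueError, KeyError, IndexError, TypeError):
        return None, "malformed embeddings response"
```

Returning an error value instead of raising mirrors the nil+err convention named in the test description, which lets the caller fall back to a MISS.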
- Add `layers` attribute row (array, default ["exact"], values exact/semantic)
- Add Semantic (L2) Attributes subsection with full schema table for all
  semantic.* fields matching schema.lua verbatim (thresholds, top_k,
  distance_metric cosine-only, ttl 86400, match.*, embedding oneOf, vector_search)
- Add Redis Stack (RediSearch) caution callout: required when semantic layer
  is enabled; L1 and L2 share the same connection
- Update cache_headers description to include X-AI-Cache-Similarity header
  (emitted on semantic hit; value = cosine similarity = 1 − distance)
- Add exact+semantic worked example (Admin API JSON + ADC YAML forms)
- Add security note on multi-tenant deployments (include_consumer/include_vars)
- Update description to reflect both layers are now available
- Apply identical changes to zh doc with translated prose
…polish

- schema: wrap similarity_threshold line to ≤100 chars (luacheck)
- schema: add contains={const="exact"} to layers — ["semantic"]-only is now invalid
- embeddings/openai + azure_openai: drop unused err assignment from json.decode (luacheck);
  add local HTTP_OK; guard not res.body before decode (mirrors ai-rag pattern)
- vector-search/redis: ensure_index gains a prefix param; FT.CREATE PREFIX uses it
  instead of the hardcoded "ai-cache:l2:" so the caller controls the key namespace
- semantic: add l2_base(conf) helper; index_name, lookup, and write all read
  conf.semantic.vector_search.redis.index (default "ai-cache") — wires the config knob
- key: fix wrong context_fingerprint doc comment; restore PR-1 "why" comments in
  build_repr's effective block (model precedence + endpoint significance)
- ai-cache: format X-AI-Cache-Similarity to 4 d.p. via string.format("%.4f", ...)
- docs (en + zh): note that "exact" is always active and must be in layers
- tests: update TEST 9+10 for new ensure_index signature; add TEST 19 (schema rejects
  ["semantic"]-only); add TEST 20 (custom index="myidx" MISS→HIT + L2 key prefix check)
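
The effect of the `contains={const="exact"}` constraint on `layers` can be sketched as a hand-written check. This Python illustration only approximates the JSON Schema rule; `validate_layers` and its error messages are hypothetical and do not match the plugin's actual validation output.

```python
from typing import List, Optional

ALLOWED_LAYERS = ("exact", "semantic")


def validate_layers(layers: Optional[List[str]] = None) -> List[str]:
    """Return the effective layers list, or raise ValueError if it is invalid."""
    if layers is None:
        return ["exact"]  # documented default
    if not isinstance(layers, list):
        raise ValueError("layers must be an array")
    for layer in layers:
        if layer not in ALLOWED_LAYERS:
            raise ValueError("unknown layer: %r" % (layer,))
    if "exact" not in layers:
        # Equivalent of `contains = {const = "exact"}`: a semantic-only
        # (or empty) list is rejected.
        raise ValueError('layers must contain "exact"')
    return layers
```

With this rule, `["exact"]` and `["exact", "semantic"]` pass, while `["semantic"]` is rejected.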
Cache hits now report which layer served them via X-AI-Cache-Status:
HIT-L1 (exact) or HIT-L2 (semantic). The en/zh docs and the exact-cache
suite (t/plugin/ai-cache.t) are updated to match.
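
As a rough illustration of the header values described in these commits, the sketch below derives the similarity from a cosine distance (similarity = 1 − distance) and formats it to four decimal places. It is written in Python for brevity; `cache_headers` is a hypothetical helper, and the status strings are the ones named in this PR's commit messages.

```python
from typing import Dict, Optional


def cache_headers(status: str, distance: Optional[float] = None) -> Dict[str, str]:
    """Build response headers for a cache result.

    `distance` is the cosine distance reported by the vector search; it is
    only used for semantic (L2) hits.
    """
    headers = {"X-AI-Cache-Status": status}
    if status == "HIT-L2" and distance is not None:
        # Cosine similarity = 1 - cosine distance, formatted to 4 decimal places.
        headers["X-AI-Cache-Similarity"] = "%.4f" % (1.0 - distance)
    return headers
```

For example, a distance of 0.25 yields `X-AI-Cache-Similarity: 0.7500`, while exact hits and misses carry only the status header.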

Rework t/plugin/ai-cache-semantic.t into the codebase's declarative
test-nginx idiom: granular schema/unit blocks, and end-to-end flows split
into route-setup + request blocks (--- request / --- response_headers /
--- wait) over the shared :1980 X-AI-Fixture upstream.

Drive the embedding mock from real text-embedding-3-small responses
(captured at dimensions=64) stored as t/fixtures/openai/embeddings-*.json;
the mock logic lives in t/lib/ai_cache_mock.lua. Schema-rejection tests
assert the exact validation error via --- response_body eval.
@janiussyafiq janiussyafiq marked this pull request as ready for review June 30, 2026 07:21
@dosubot dosubot Bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and enhancement (New feature or request) labels Jun 30, 2026

Copilot AI left a comment

Pull request overview

Adds an opt-in semantic (L2) caching layer to the existing ai-cache plugin. On an L1 miss, the plugin can embed the prompt and query a Redis Stack (RediSearch) vector index for a sufficiently similar prior response, serving it as an L2 hit and backfilling L1; defaults remain exact-only.

Changes:

  • Implement semantic (L2) cache lookup/write path backed by Redis Stack RediSearch and embedding providers (OpenAI, Azure OpenAI).
  • Update ai-cache runtime behavior and headers to report HIT-L1 / HIT-L2 (plus similarity header on L2 hits).
  • Add comprehensive unit/e2e tests, fixtures, and docs for semantic caching configuration and behavior.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| t/plugin/ai-cache.t | Updates existing tests to expect HIT-L1 status on exact cache hits. |
| t/plugin/ai-cache-semantic.t | Adds new test suite covering schema validation, embedding drivers, vector search, isolation behavior, and fail-open semantics. |
| t/lib/ai_cache_mock.lua | Adds a mock embeddings upstream that replays captured embedding fixtures and validates auth behavior. |
| t/fixtures/openai/embeddings-capital.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-capital-city.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-largest-city.json | Adds captured embedding fixture used by semantic cache tests. |
| t/fixtures/openai/embeddings-tire.json | Adds captured embedding fixture used by semantic cache tests. |
| Makefile | Installs new ai-cache submodules for embeddings and vector-search at build/install time. |
| docs/en/latest/plugins/ai-cache.md | Documents semantic caching (Redis Stack requirement, config fields, headers, and examples). |
| docs/zh/latest/plugins/ai-cache.md | Chinese documentation updates for semantic caching configuration and examples. |
| apisix/plugins/ai-cache/vector-search/redis.lua | Adds Redis Stack (RediSearch) helpers: index creation, upsert, and KNN search with float32 packing. |
| apisix/plugins/ai-cache/semantic.lua | Implements L2 semantic cache: prompt extraction, embedding, partitioning, lookup, L1 backfill, and L2 write. |
| apisix/plugins/ai-cache/schema.lua | Extends schema with layers and semantic configuration; adds encryption paths for embedding API keys. |
| apisix/plugins/ai-cache/key.lua | Extends key logic to support semantic partitioning and context fingerprinting utilities. |
| apisix/plugins/ai-cache/embeddings/base.lua | Shared HTTP fetch + response parsing for embeddings providers. |
| apisix/plugins/ai-cache/embeddings/openai.lua | OpenAI embeddings driver (Bearer auth, optional endpoint/model/dimensions). |
| apisix/plugins/ai-cache/embeddings/azure_openai.lua | Azure OpenAI embeddings driver (api-key header, endpoint required). |
| apisix/plugins/ai-cache.lua | Integrates L2 lookup on L1 miss, adds new headers (HIT-L1/HIT-L2, similarity), and schedules L2 writes. |
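
The "float32 packing" mentioned for vector-search/redis.lua refers to the raw byte layout that RediSearch expects for FLOAT32 vector fields: each component is a 4-byte IEEE 754 float, concatenated in order. A minimal Python sketch of that encoding follows; `pack_float32` and `unpack_float32` are hypothetical names, and little-endian byte order is assumed, as on typical x86 and ARM hosts.

```python
import struct
from typing import List


def pack_float32(vector: List[float]) -> bytes:
    """Encode a vector as consecutive little-endian 4-byte floats."""
    return struct.pack("<%df" % len(vector), *vector)


def unpack_float32(blob: bytes) -> List[float]:
    """Decode a blob produced by pack_float32 back into a list of floats."""
    count, remainder = divmod(len(blob), 4)
    if remainder:
        raise ValueError("blob length is not a multiple of 4 bytes")
    return list(struct.unpack("<%df" % count, blob))
```

A 64-dimensional embedding, the size used for the captured fixtures, packs to 256 bytes. Values that are not exactly representable in float32 lose precision on the round trip.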

Comment thread apisix/plugins/ai-cache/semantic.lua Outdated
Comment thread apisix/plugins/ai-cache/semantic.lua Outdated
Comment thread apisix/plugins/ai-cache.lua Outdated
Comment thread apisix/plugins/ai-cache/vector-search/redis.lua Outdated
@membphis

membphis commented Jul 1, 2026

Member

One compatibility concern: this PR changes the existing exact-cache hit header from X-AI-Cache-Status: HIT to X-AI-Cache-Status: HIT-L1, even when the semantic layer is not enabled.

The -L1 suffix also feels a bit odd as a user-facing status, and this may break existing clients, tests, dashboards, or log parsing that already check for HIT. Could we keep the exact-cache status as HIT for backward compatibility, and only use a distinct value such as HIT-L2 when the semantic layer serves the response? Another option would be to add a separate cache-layer header if we need to expose the serving layer explicitly.

nic-6443
nic-6443 previously approved these changes Jul 1, 2026
@janiussyafiq

Contributor Author

> One compatibility concern: this PR changes the existing exact-cache hit header from X-AI-Cache-Status: HIT to X-AI-Cache-Status: HIT-L1, even when the semantic layer is not enabled.
>
> The -L1 suffix also feels a bit odd as a user-facing status, and this may break existing clients, tests, dashboards, or log parsing that already check for HIT. Could we keep the exact-cache status as HIT for backward compatibility, and only use a distinct value such as HIT-L2 when the semantic layer serves the response? Another option would be to add a separate cache-layer header if we need to expose the serving layer explicitly.

@membphis The initial plan was to keep using HIT for both cases (L1 and L2). However, I realized that a distinct status makes debugging and testing easier and improves readability, because we can tell exactly which layer was hit. One thing to note is that an L2 hit also carries a header that an L1 hit does not: X-AI-Cache-Similarity. As for your concern about breaking existing clients, tests, etc.: the current ai-cache version hasn't been released yet, so I think the change is safe. WDYT?

@membphis

membphis commented Jul 1, 2026

Member

Thanks, that makes sense if ai-cache has not been released yet.

One extra question: what is the expected user-facing distinction between L1 and L2 here? My understanding is that the two-layer design is mainly an internal optimization/correctness model: L1 is deterministic and cheap, while L2 is approximate and more expensive because it requires embedding plus vector search. Since an L2 hit already has X-AI-Cache-Similarity, do we need to expose HIT-L1 in X-AI-Cache-Status, or could we keep the simpler HIT for exact hits and use either HIT-L2 or a separate layer/similarity header only for semantic hits?

@janiussyafiq

Contributor Author

> Thanks, that makes sense if ai-cache has not been released yet.
>
> One extra question: what is the expected user-facing distinction between L1 and L2 here? My understanding is that the two-layer design is mainly an internal optimization/correctness model: L1 is deterministic and cheap, while L2 is approximate and more expensive because it requires embedding plus vector search. Since an L2 hit already has X-AI-Cache-Similarity, do we need to expose HIT-L1 in X-AI-Cache-Status, or could we keep the simpler HIT for exact hits and use either HIT-L2 or a separate layer/similarity header only for semantic hits?

I believe it is a matter of design; both of your points are legitimate. I simply went back to the design choice shown in this image (ref #13290):
[image]
I guess we can keep HIT for both to remain consistent with the previous test cases.

@membphis membphis left a comment

Member

Reviewed, no blocking issues found.

@nic-6443 nic-6443 merged commit 7fd4515 into apache:master Jul 2, 2026
25 of 26 checks passed
@janiussyafiq janiussyafiq deleted the feat/ai-cache-semantic branch July 2, 2026 04:29

5 participants