feat(consul): filter nodes in upstream with metadata #12448
base: master
Conversation
Hi @jizhuozhi, thanks for your contribution. I think this is useful for node filtering with Consul discovery, but I don't understand static upstream filtering. It seems I would need to attach metadata to each node in the upstream object and then use metadata_match to configure filtering? Since each node is defined manually, if a node is not needed, can't I just add or delete it?
This is a unified approach: it only checks whether a filtering rule exists, without distinguishing between service discovery and a static list. However, from my previous experience as a gateway administrator, business developers will sometimes temporarily add rules for debugging purposes without wanting to change the original instance list (so they can adjust quickly).
Thanks for your reply. Could you please describe the scenario in detail? Why can't the existing methods solve this problem?
It is not a production environment, but it is common in the testing and verification phase. We frequently need to target specific instances (for example, to capture flame graphs for performance analysis), and we need to add the other instances back after deleting them, so we want to select instances by filtering instead. In fact, we also match on dynamically colored metadata at load-balancing time rather than predefining routes, similar to https://github.com/kitex-contrib/loadbalance-tagging (I am also using Lua to implement the same capability, but that is outside the scope of this discussion).
I still have doubts about the Example Usage. Do you mean that if I need to adjust which nodes are used, I don't change the content of the nodes list, but adjust metadata_match instead?
Yes, just adjust metadata_match (though the discussion of this use case has drifted beyond this PR). At runtime it is a unified filtering rule over the service list that does not need to distinguish the source.
We are currently using Consul on Kubernetes. When I worked at another company a few years ago, we used cloud virtual machines (EC2). The cloud platform did not provide an API, so we used scripts to synchronize a static instance list at regular intervals; that static list was effectively a kind of dynamic discovery. (Why not filter in the script? Because we were lazy. :)
Hi @jizhuozhi, there is currently no modification to the upstream schema, which means the current changes to upstream only serve Consul. Would it be more appropriate to put all of this logic into the Consul module?
Hello, @Baoyuantop, thanks for your reply.
Not only Consul: Eureka (which already supports metadata in APISIX) will inherit this function, and my forked dashboard already supports configuring metadata_match for Consul and Eureka. (The examples in the PR description are just examples, since they allow testing without a registry; we don't need to care whether the nodes come from service discovery or a static list.) The current discovery package is responsible for pulling all instances, and filtering inside discovery applies to all service names and upstreams, which means I could only configure general filtering rules, not differentiated matching for different routes and upstreams. This is our current online effect (screenshot not shown).
Hello @Baoyuantop, I see. Currently, upstream already passes the discovery args to `nodes`, so the loop can be closed inside discovery. I will modify it.

```lua
local new_nodes, err = dis.nodes(up_conf.service_name, up_conf.discovery_args)
if not new_nodes then
    return HTTP_CODE_UPSTREAM_UNAVAILABLE, "no valid upstream node: " .. (err or "nil")
end
```
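To make the discussed filtering concrete, here is a minimal sketch of how a `metadata_match` filter over discovered nodes might look. This is illustrative only: the function name and the exact shape of `metadata_match` (a map from metadata key to a list of accepted values, following the PR's Example Usage) are assumptions, not the actual `apisix/utils/discovery.lua` API.

```lua
-- Hypothetical sketch: keep only nodes whose metadata matches every rule.
-- metadata_match maps a key to a list of accepted values, e.g.
-- { lane = { "prod", "canary" } }. Names are illustrative, not the merged API.
local function filter_nodes_by_metadata(nodes, metadata_match)
    if not metadata_match or not next(metadata_match) then
        return nodes  -- no rules configured: pass the list through unchanged
    end
    local filtered = {}
    for _, node in ipairs(nodes) do
        local meta = node.metadata or {}
        local ok = true
        for key, allowed in pairs(metadata_match) do
            local matched = false
            for _, value in ipairs(allowed) do
                if meta[key] == value then
                    matched = true
                    break
                end
            end
            if not matched then
                ok = false
                break
            end
        end
        if ok then
            filtered[#filtered + 1] = node
        end
    end
    return filtered
end
```

Because the filter only inspects `node.metadata`, it works the same whether the nodes came from Consul, Eureka, or a static list, which is the "unified filtering rule" discussed above.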
Hello @Baoyuantop, PTAL, thanks :) We also have Spring Cloud applications on Eureka, but I have no time to write test cases now, so I will create a new PR for Eureka later.
Hi @jizhuozhi, we are still discussing whether to accept the feature in this PR, and we need to reach a consensus before the review can start. Since there is no separate issue discussing this feature, you need to clearly tell the maintainers in the PR description what this feature does and why it is needed (building on the existing description).
The current example is inappropriate; please replace it with one from a real scenario, as it will confuse other maintainers. In the latest changes I see you have removed the upstream-related code, so the PR now seems to focus on filtering of Consul services. Please update the PR description to reflect this. Thanks again for your contribution.
Thank you for the reminder; the PR description has been updated.
Please fix the failed CI. |
t/discovery/consul_dump.t (outdated)

```diff
@@ -95,7 +95,7 @@ discovery:
 --- request
 GET /t
 --- response_body
-{"service_a":[{"host":"127.0.0.1","port":30511,"weight":1}],"service_b":[{"host":"127.0.0.1","port":8002,"weight":1}]}
+{"service_a":[{"host":"127.0.0.1","metadata":{"service_a_version":"4.0"},"port":30511,"weight":1}],"service_b":[{"host":"127.0.0.1","metadata":{"service_b_version":"4.1"},"port":8002,"weight":1}]}
```
Why does it affect this place? I don't think this test should be changed.
Since we now rely on metadata, we need to persist it when dumping the service list.
@jizhuozhi OK. Another question: now that this is added, are we missing a test case for the "no metadata" situation?
I will add it :)
Thanks for your reminder, I will pay attention to it. The earlier problem here was caused by cross-environment commits (I was testing on EC2 and coding on my office computer), which produced multiple different, invalid committers. This will not happen again.
Others LGTM.
Please complete these two things, and then we can ask other maintainers to review:
- add a no-metadata test case
- fix CI
LGTM
Pull Request Overview
This PR introduces metadata-based node filtering for Consul service discovery in APISIX, enabling users to route traffic to specific service instance subsets based on metadata criteria. This supports use cases like canary releases and swimlane routing.
Key changes include:

- Adding metadata support to Consul discovery by including `Service.Meta` in node definitions
- Implementing a metadata filtering mechanism through `discovery_args.metadata_match` configuration
- Creating shared utility functions for metadata matching across discovery types
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| apisix/discovery/consul/init.lua | Core implementation adding metadata support and filtering to Consul discovery |
| apisix/utils/discovery.lua | New utility module providing metadata matching functions |
| apisix/schema_def.lua | Schema definition for `metadata_match` configuration in `discovery_args` |
| docs/en/latest/discovery/consul.md | Documentation for the new metadata filtering feature |
| t/discovery/consul.t | Test case verifying metadata filtering functionality |
| t/discovery/consul_dump.t | Updated test expectations to include metadata in responses |
| t/discovery/consul2.t | Updated test expectations to include metadata in responses |
Hello @SkyeYoung. Please help retry the CI; the failure seems unrelated to this change.
apisix/discovery/consul/init.lua (outdated)

```lua
local metadata = service.Meta
-- ensure that metadata is an accessible table,
-- avoid userdata like `null` returned by cjson
if type(metadata) ~= "table" then
```
Please take a look: skip the current service if the metadata is invalid.

```lua
if type(metadata) == "cdata" then
    metadata = nil
elseif type(metadata) ~= "table" then
    core.log.error("wrong meta data, ...", ...)
    goto CONTINUE
end
```
Hello @membphis. I will add the cdata check and error log, but I am not sure it is appropriate to skip the node.
If this is a container-template release, skipping would drop every container released in that batch, while users only need to care whether the metadata is valid when they actually use it. Skipping these nodes directly would hurt service quality, so I tend to keep them, log the error, and leave their metadata empty.
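The handling preferred above can be sketched as follows. This is illustrative, not the merged code: `log_error` stands in for `core.log.error` in APISIX, and the function name is hypothetical.

```lua
-- Sketch of the behavior discussed above: keep the node when its metadata is
-- invalid, log the problem, and treat the metadata as absent, so a bad
-- release template does not remove a whole batch from load balancing.
local function log_error(msg)
    io.stderr:write(msg .. "\n")  -- placeholder for core.log.error
end

local function normalize_metadata(metadata)
    if type(metadata) == "cdata" then
        -- under LuaJIT, cjson decodes a JSON null into a cdata value
        return nil
    end
    if type(metadata) ~= "table" then
        log_error("invalid service metadata, type: " .. type(metadata))
        return nil  -- keep the node in rotation, just without metadata
    end
    return metadata
end
```

The contrast with the earlier suggestion is the return instead of `goto CONTINUE`: the node stays in the service list either way.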
@jizhuozhi Please fix the errors reported in the CI.
LGTM
Description

This PR introduces metadata-based node filtering for Consul discovery, supporting upstreams based on Consul service discovery.

Motivation

Currently, APISIX selects upstream nodes based on service name from discovery, without additional filtering logic. In real-world scenarios like canary release or swimlane routing, users often tag backend instances with custom metadata (e.g., `version`, `env`, `lane`, `dc`, etc.) and expect the gateway to route only to specific subsets.

This change allows users to define a `metadata_match` field in the `discovery_args` configuration, which filters nodes by their metadata values before load balancing.

Changes

- Include `Service.Meta` in the node definition and respect its `weight` if available (aligned with Eureka).
- Apply `discovery_args` when fetching upstream nodes.

Tests: add test cases covering nodes both with and without `metadata_match`.
Example Usage

Only nodes with `metadata.lane` in `[prod, canary]` and `metadata.dc` in `[us-east-1, us-east-2]` will be used for load balancing.
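As a sketch, such a filter might be expressed in an upstream definition along these lines (hypothetical field values; the map-of-lists shape of `metadata_match` follows the semantics described in this PR):

```json
{
  "service_name": "service_a",
  "discovery_type": "consul",
  "type": "roundrobin",
  "discovery_args": {
    "metadata_match": {
      "lane": ["prod", "canary"],
      "dc": ["us-east-1", "us-east-2"]
    }
  }
}
```

A node is eligible only if, for every key in `metadata_match`, its metadata value appears in that key's list.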
Fixes #12464
Checklist