feat(consul): filter nodes in upstream with metadata #12448

Open. jizhuozhi wants to merge 10 commits into base: master.


@jizhuozhi jizhuozhi commented Jul 20, 2025

Description

This PR adds metadata-based node filtering to Consul service discovery, so that upstreams backed by Consul can select a subset of the discovered nodes.

Motivation

Currently, APISIX selects upstream nodes from discovery by service name only, with no additional filtering logic. In real-world scenarios such as canary releases or swimlane routing, users often tag backend instances with custom metadata (e.g., version, env, lane, dc) and expect the gateway to route only to specific subsets.

This change allows users to define a metadata_match field in discovery_args configuration, which filters nodes before load balancing based on their metadata values.

Changes

  • Consul discovery:
    • Include Service.Meta in the node definition and respect its weight if available (aligned with Eureka).
    • Filter with discovery_args when fetching upstream nodes.
  • Tests: add test cases covering discovery-based upstreams (Consul) with metadata_match.
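As a rough illustration of the first change, here is a minimal sketch of building a node from a Consul service entry. `node_from_service` and the default weight argument are hypothetical names, not the PR's actual code; `Address`, `Port`, and `Meta` are the fields Consul returns for a service, and `Meta` values are strings, so the weight is converted with `tonumber`.

```lua
-- Hypothetical helper, not taken from the PR: builds an APISIX-style node
-- from a Consul service entry, reading an optional weight from Service.Meta.
local function node_from_service(service, default_weight)
    local meta = service.Meta
    if type(meta) ~= "table" then
        meta = nil  -- treat missing or non-table metadata as absent
    end
    -- Consul stores Meta values as strings; fall back when weight is absent
    -- or not numeric.
    local weight = meta and tonumber(meta.weight) or default_weight
    return {
        host = service.Address,
        port = service.Port,
        weight = weight,
        metadata = meta,
    }
end

local weighted = node_from_service(
    {Address = "127.0.0.1", Port = 8002, Meta = {weight = "5", service_b_version = "4.1"}}, 1)
local plain = node_from_service({Address = "127.0.0.1", Port = 30511}, 1)
print(weighted.weight, plain.weight)
```

A node without metadata simply keeps the default weight, which is one way to stay backward compatible with services that never set Service.Meta.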

Example Usage

upstream:
  type: roundrobin
  scheme: http
  discovery_type: consul
  discovery_args:
    metadata_match:
      lane:
      - prod
      - canary
      dc:
      - us-east-1
      - us-east-2

Only nodes with metadata.lane in [prod, canary] and metadata.dc in [us-east-1, us-east-2] will be used for load balancing.
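The matching rule described above (every configured key must match, and a key matches when the node's value is one of the listed values) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `match_metadata` and `filter_nodes` are hypothetical names, and treating a node without metadata as non-matching is an assumption of this sketch.

```lua
-- Hypothetical matcher: true when `metadata` satisfies every key in `match`.
local function match_metadata(metadata, match)
    if match == nil then
        return true  -- no rule configured: keep every node
    end
    if type(metadata) ~= "table" then
        metadata = {}  -- assumption: missing metadata matches nothing
    end
    for key, allowed in pairs(match) do
        local found = false
        for _, candidate in ipairs(allowed) do
            if metadata[key] == candidate then
                found = true
                break
            end
        end
        if not found then
            return false  -- keys are combined with AND
        end
    end
    return true
end

-- Hypothetical filter applied before load balancing.
local function filter_nodes(nodes, match)
    local kept = {}
    for _, node in ipairs(nodes) do
        if match_metadata(node.metadata, match) then
            kept[#kept + 1] = node
        end
    end
    return kept
end

local nodes = {
    {host = "10.0.0.1", port = 8080, metadata = {lane = "prod", dc = "us-east-1"}},
    {host = "10.0.0.2", port = 8080, metadata = {lane = "canary", dc = "eu-west-1"}},
    {host = "10.0.0.3", port = 8080, metadata = {lane = "dev", dc = "us-east-2"}},
    {host = "10.0.0.4", port = 8080},  -- no metadata at all
}
local match = {lane = {"prod", "canary"}, dc = {"us-east-1", "us-east-2"}}
local kept = filter_nodes(nodes, match)
for _, node in ipairs(kept) do
    print(node.host)
end
```

With the sample data, only the first node passes both the lane and the dc condition; the second fails on dc, the third on lane, and the fourth has no metadata to match.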


Fixes

Fixes #12464


Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (no breaking changes)

@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and enhancement (New feature or request) labels Jul 20, 2025
@Baoyuantop
Contributor

Hi @jizhuozhi, thanks for your contribution.

I think it is useful for node filtering of Consul discovery. But I don't understand static upstream filtering. It seems that I need to mark metadata for each node in the upstream object, and then use metadata_match to configure filtering? Because each node is manually defined, if it is not needed, can I just add or delete the node?

@jizhuozhi
Author

> Hi @jizhuozhi, thanks for your contribution.
>
> I think it is useful for node filtering of Consul discovery. But I don't understand static upstream filtering. It seems that I need to mark metadata for each node in the upstream object, and then use metadata_match to configure filtering? Because each node is manually defined, if it is not needed, can I just add or delete the node?

This is a unified approach: I only check whether a filtering rule exists, without distinguishing between service discovery and a static list.

However, in my previous experience as a gateway administrator, business developers sometimes want to add rules temporarily for debugging without changing the original instance list (for quick adjustment).

@Baoyuantop
Contributor

> there will be corresponding business developers who temporarily add rules for some debugging considerations but do not want to change the original instance list

Thanks for your reply. Could you please describe the scenario in detail? Why can't the existing methods solve this problem?

@moonming moonming requested a review from Copilot July 21, 2025 08:31
Copilot

This comment was marked as outdated.

@jizhuozhi
Author

jizhuozhi commented Jul 21, 2025

> Thanks for your reply. Could you please describe the scenario in detail? Why can't the existing methods solve this problem?

This is not for a production environment, but it is common in the testing and verification phase. We frequently need to target specific instances (for example, to capture flame graphs for performance analysis). Deleting the other instances means adding them back afterwards, so we prefer to select instances by filtering.

In fact, we also match on dynamically colored metadata during load balancing instead of predefining routes, similar to https://github.com/kitex-contrib/loadbalance-tagging (I am also using Lua to implement the same capability, but that is outside the scope of this discussion).

@Baoyuantop
Contributor

Baoyuantop commented Jul 22, 2025

> > Thanks for your reply. Could you please describe the scenario in detail? Why can't the existing methods solve this problem?
>
> It is not a production environment, but it is common in the testing and verification phase. We need to specify specific instances frequently (for example, to capture flame graphs for performance analysis), but we need to add other instances back after deleting them, so we need to specify instances by filtering.
>
> In fact, we also matched according to the dynamic colored metadata when loading balancing but not predefine the routes, similar to https://github.com/kitex-contrib/loadbalance-tagging (I am also using lua to implement the same capabilities, but this is not within the scope of this discussion).

I still have doubts about what is in Example Usage. Do you mean that if I need to adjust the nodes used, I don't need to change the content of the nodes list, but adjust metadata_match?
By the way, are you using static nodes or service discovery?

@jizhuozhi
Author

jizhuozhi commented Jul 22, 2025

> I still have doubts about what is in Example Usage. Do you mean that if I need to adjust the nodes used, I don't need to change the content of the nodes list, but adjust metadata_match?

Yes, just adjust metadata_match (but the discussion of this use case has been separated from this PR). At runtime, it is a unified filtering rule for the node list that does not need to distinguish the source.

> By the way, are you using static nodes or service discovery?

We are currently using Consul on Kubernetes. When I was working at another company a few years ago, we were using cloud virtual machines (or EC2). The cloud platform did not provide an API, so we used scripts to synchronize static instance lists at regular intervals. In that setup, the static list was also a kind of dynamic discovery. (Why not filter in the script? Because we were lazy :)

@Baoyuantop
Contributor

Hi @jizhuozhi, there is currently no modification to the upstream schema, which means that the current modifications in the upstream only serve Consul. Would it be more appropriate to put all of this logic into the consul module?

@jizhuozhi
Author

jizhuozhi commented Jul 25, 2025

Hello, @Baoyuantop, thanks for your reply.

> Hi @jizhuozhi, There is currently no modification to the upstream schema, which means that the current modifications in the upstream only serve consul. Is it more appropriate to put all these logics into the consul module?

Not only Consul, but also Eureka (which already supports metadata in APISIX) will inherit this function. My forked dashboard already supports configuring metadata_match for Consul and Eureka:
https://github.com/jizhuozhi/apisix-dashboard/blob/9bd72c82e4fcbfa0d2bf34420280028c2ca853c8/web/src/components/Upstream/components/ServiceDiscovery.tsx#L30-L37

(The examples in the PR description are just examples, because this allows testing without the registry, and we don't need to care about service discovery or static nodes.)

Also, the current discovery package is responsible for pulling all instances, so filtering configured in discovery applies to every service name and upstream. That means I can only configure general filtering rules and cannot configure different matching for different routes and upstreams. This is how it currently looks in our online environment:

[image]

@jizhuozhi
Author

Hello @Baoyuantop, I see. The upstream already passes the discovery args to nodes, so the filtering can be handled entirely inside the discovery module. I will modify it.

local new_nodes, err = dis.nodes(up_conf.service_name, up_conf.discovery_args)
if not new_nodes then
    return HTTP_CODE_UPSTREAM_UNAVAILABLE, "no valid upstream node: " .. (err or "nil")
end
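To illustrate why passing discovery_args per call matters, here is a minimal sketch in which one shared node cache serves different upstreams with different filters. Everything here is hypothetical (the `cached` table, this `nodes` function, the sample hosts); it only mirrors the `nodes(service_name, discovery_args)` call shape shown above and is not the Consul module's code.

```lua
-- Stand-in for the node list a discovery module has already pulled.
local cached = {
    service_a = {
        {host = "10.0.0.1", port = 8080, weight = 1, metadata = {lane = "prod"}},
        {host = "10.0.0.2", port = 8080, weight = 1, metadata = {lane = "canary"}},
    },
}

-- Hypothetical nodes(): the cache is shared, the filter comes from the caller.
local function nodes(service_name, discovery_args)
    local all = cached[service_name]
    if not all then
        return nil, "service not found: " .. tostring(service_name)
    end
    local match = discovery_args and discovery_args.metadata_match
    if not match then
        return all  -- no rule: behave exactly as before
    end
    local kept = {}
    for _, node in ipairs(all) do
        local meta = node.metadata or {}
        local ok = true
        for key, allowed in pairs(match) do
            local hit = false
            for _, value in ipairs(allowed) do
                if meta[key] == value then
                    hit = true
                end
            end
            ok = ok and hit
        end
        if ok then
            kept[#kept + 1] = node
        end
    end
    return kept
end

-- Two upstreams share the service but ask for different subsets.
local prod = nodes("service_a", {metadata_match = {lane = {"prod"}}})
local canary = nodes("service_a", {metadata_match = {lane = {"canary"}}})
local everything = nodes("service_a", nil)
local missing, err = nodes("service_x", nil)
print(#prod, #canary, #everything, err)
```

Because the rule travels with each call rather than living in the global discovery configuration, two routes pointing at the same service name can see different node subsets.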

@jizhuozhi jizhuozhi changed the title from "feat(upstream): filter nodes in upstream with metadata" to "feat(consul): filter nodes in upstream with metadata" Jul 25, 2025
@jizhuozhi
Author

Hello @Baoyuantop, PTAL, thanks :)

We also have Spring Cloud applications that use Eureka, but I have no time to write test cases right now, so I will open a separate PR for Eureka later.

@Baoyuantop
Contributor

Hi @jizhuozhi, we are still discussing whether to accept the feature in this PR, and we need to reach a consensus before we can start the review. Since there is no separate issue for this discussion, the PR description needs to tell the maintainers clearly what this feature does and why it is needed (the current description already exists).

> The examples in the PR description are just examples, because this allows testing without the registry, and we don't need to care about service discovery or static nodes.

This is inappropriate and you need to replace it with an example from a real scenario. The current example will confuse other maintainers. In the latest changes, I see that you have removed the upstream-related code. The current PR seems to focus on filtering Consul services. Please update the PR description to reflect this. Thanks again for your contribution.

@jizhuozhi
Author

> This is inappropriate and you need to replace it with an example from a real scenario. The current example will confuse other maintainers.

Thank you for the reminder; the PR description has been updated.

@dosubot dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) label and removed the size:L (This PR changes 100-499 lines, ignoring generated files.) label Jul 27, 2025
@Baoyuantop
Contributor

Please fix the failed CI.

@@ -95,7 +95,7 @@ discovery:
 --- request
 GET /t
 --- response_body
-{"service_a":[{"host":"127.0.0.1","port":30511,"weight":1}],"service_b":[{"host":"127.0.0.1","port":8002,"weight":1}]}
+{"service_a":[{"host":"127.0.0.1","metadata":{"service_a_version":"4.0"},"port":30511,"weight":1}],"service_b":[{"host":"127.0.0.1","metadata":{"service_b_version":"4.1"},"port":8002,"weight":1}]}
Member

Why does it affect this place? I don't think this test should be changed

Author

Since we now rely on metadata, it has to be persisted together with the nodes when they are dumped.

Member

@SkyeYoung SkyeYoung Aug 4, 2025

@jizhuozhi Ok. Another question is, if this is added, do we now lack a test case for the situation where there is "no metadata"?

Author

I will add it :)

@Baoyuantop Baoyuantop added the wait for update (wait for the author's response in this issue/PR) label Aug 1, 2025
@jizhuozhi
Author

> @jizhuozhi
>
> Please try not to use force-push. It forces me to review the entire PR from the beginning, rather than just the parts you've changed since my last review.
>
> Thanks 😸

Thanks for the reminder, I will pay attention to it. The earlier problem was caused by committing from different environments (I test on EC2 and code on my office computer), which produced multiple different and invalid committers. This will not happen again.

Member

@SkyeYoung SkyeYoung left a comment

Others LGTM

Pls complete these two things, and then we can ask other maintainers to review it:

  1. add no metadata test case
  2. fix ci

@SkyeYoung SkyeYoung added the user responded label and removed the wait for update (wait for the author's response in this issue/PR) label Aug 5, 2025
@Baoyuantop Baoyuantop requested a review from SkyeYoung August 5, 2025 10:17
SkyeYoung previously approved these changes Aug 6, 2025
Member

@SkyeYoung SkyeYoung left a comment

LGTM

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces metadata-based node filtering for Consul service discovery in APISIX, enabling users to route traffic to specific service instance subsets based on metadata criteria. This supports use cases like canary releases and swimlane routing.

Key changes include:

  • Adding metadata support to Consul discovery by including Service.Meta in node definitions
  • Implementing a metadata filtering mechanism through discovery_args.metadata_match configuration
  • Creating shared utility functions for metadata matching across discovery types

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| apisix/discovery/consul/init.lua | Core implementation adding metadata support and filtering to Consul discovery |
| apisix/utils/discovery.lua | New utility module providing metadata matching functions |
| apisix/schema_def.lua | Schema definition for metadata_match configuration in discovery_args |
| docs/en/latest/discovery/consul.md | Documentation for the new metadata filtering feature |
| t/discovery/consul.t | Test case verifying metadata filtering functionality |
| t/discovery/consul_dump.t | Updated test expectations to include metadata in responses |
| t/discovery/consul2.t | Updated test expectations to include metadata in responses |

@jizhuozhi
Author

Hello @SkyeYoung. Please help rerun the CI; the failure seems to be unrelated to this change.

SkyeYoung previously approved these changes Aug 8, 2025
@SkyeYoung SkyeYoung requested a review from AlinsRan August 8, 2025 01:20
local metadata = service.Meta
-- ensure that metadata is an accessible table,
-- avoid userdata likes `null` returned by cjson
if type(metadata) ~= "table" then
Member

Please take a look: skip the current service if the metadata is invalid.

if type(metadata) == "cdata" then
   metadata = nil
elseif type(metadata) ~= "table" then
   core.log.error("wrong meta data, ...", ...)
   goto CONTINUE
end

Author

Hello @membphis. I will add the cdata check and an error log, but I am not sure whether it is appropriate to skip this node.

With a container template release, skipping would drop every container released in that batch. Users only need to care whether the metadata is valid when they actually use it, and skipping the nodes outright would hurt service quality. So I lean towards keeping these nodes, logging the error, and leaving their metadata empty.
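The "keep the node, log, and leave metadata empty" option could look roughly like the sketch below. `normalize_metadata` and the injected `log` callback are hypothetical names. The sketch checks for "userdata" because lua-cjson represents JSON null as a lightuserdata, which is what the code comment under review refers to; whether an additional "cdata" check is needed depends on the JSON decoder in use and is left open here.

```lua
-- Hypothetical normaliser: returns a usable metadata table or nil, and never
-- asks the caller to drop the node.
local function normalize_metadata(raw, log)
    if raw == nil or type(raw) == "userdata" then
        return nil  -- absent or JSON null: nothing to report
    end
    if type(raw) ~= "table" then
        log("invalid Service.Meta of type " .. type(raw)
            .. ", keeping node without metadata")
        return nil
    end
    return raw
end

local messages = {}
local function log(msg)
    messages[#messages + 1] = msg
end

-- io.stdout is a userdata value; it stands in for cjson.null in this sketch.
local null_like = io.stdout

local valid = normalize_metadata({lane = "prod"}, log)
local from_null = normalize_metadata(null_like, log)
local from_string = normalize_metadata("oops", log)
print(valid.lane, from_null, from_string, #messages)
```

A node whose metadata normalises to nil still serves traffic for upstreams without metadata_match and is only excluded where a rule actually requires a value.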

@SkyeYoung
Member

@jizhuozhi Please fix the errors reported in the CI.

Member

@SkyeYoung SkyeYoung left a comment

LGTM

@SkyeYoung SkyeYoung requested a review from membphis August 14, 2025 06:30
Labels: enhancement (New feature or request), size:L (This PR changes 100-499 lines, ignoring generated files.), user responded
Projects: Status: 👀 In review
Development: successfully merging this pull request may close this issue: feat: I hope we could filter nodes via metadata when config upstream
4 participants