
[Bug][Naming] Parameter validation errors incorrectly return 500 (Internal Server Error) in gRPC API #14094

@xiaomudk


Describe the bug
When a gRPC subscribe service request contains invalid parameters (e.g., empty serviceName), the server returns error code 500 (FAIL) instead of a distinguishable client error code. This causes serious issues in SDK clients:

  1. The SDK cannot distinguish client parameter errors (which should not be retried) from server errors (which can be retried)
  2. Retry loop: the SDK keeps retrying the invalid request until all retry attempts are exhausted
  3. Client becomes UNHEALTHY: after retry exhaustion, the status of the entire RPC client changes to UNHEALTHY
  4. Cascading failures: other, valid service subscriptions fail because the client is UNHEALTHY
  5. Frequent reconnections: the unhealthy client triggers frequent reconnect attempts

Related Issue: This is causing production issues as reported in nacos-sdk-go: #868

Expected behavior
The gRPC API should have at least 3 distinguishable response codes:

  • 200 - Success (current: SUCCESS)
  • 400 - Client error / Invalid parameter (currently missing, returns 500)
  • 500 - Server internal error (current: FAIL)
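A minimal sketch of what the extended enum could look like. The constant name `PARAMETER_ERROR`, its description, and the `isClientError` helper are hypothetical additions for illustration; they are not part of the current Nacos API.

```java
/**
 * Sketch of a ResponseCode enum with a distinguishable client-error code.
 * PARAMETER_ERROR and isClientError() are hypothetical additions.
 */
enum ResponseCode {
    SUCCESS(200, "Response ok"),
    PARAMETER_ERROR(400, "Invalid request parameter"), // hypothetical
    FAIL(500, "Response fail");

    private final int code;
    private final String desc;

    ResponseCode(int code, String desc) {
        this.code = code;
        this.desc = desc;
    }

    public int getCode() {
        return code;
    }

    public String getDesc() {
        return desc;
    }

    /** Client errors (4xx) mean the request itself is bad, so retrying cannot fix it. */
    public static boolean isClientError(int code) {
        return code >= 400 && code < 500;
    }
}
```

Keeping the numeric values aligned with HTTP semantics (2xx/4xx/5xx) lets SDKs in any language classify errors with a simple range check instead of matching message strings.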

When serviceName is blank:

  • Server should return a client error code (e.g., 400) instead of 500
  • The SDK should NOT retry client parameter errors
  • The SDK should return the error to the caller immediately, without affecting the client's health status
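The expected client behavior can be sketched as a retry loop that only retries server errors. This is a minimal Java model, not the nacos-client or nacos-sdk-go implementation; `RetryingClient`, `send`, and the `Status` enum are hypothetical, and the 4xx/5xx split assumes the server adopts a 400-style client error code.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical client-side retry policy: retry only server errors (5xx) and
 * return client errors (4xx) at once, without touching the health status.
 */
class RetryingClient {
    enum Status { RUNNING, UNHEALTHY }

    private final int maxAttempts;
    private Status status = Status.RUNNING;
    private int attempts; // total attempts made, useful for logging and tests

    RetryingClient(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    Status status() { return status; }

    int attempts() { return attempts; }

    /** Sends a request that yields a response code and returns the final code. */
    int send(IntSupplier request) {
        int code = 0;
        for (int i = 0; i < maxAttempts; i++) {
            attempts++;
            code = request.getAsInt();
            if (code == 200) {
                return code; // success
            }
            if (code >= 400 && code < 500) {
                return code; // client error: retrying cannot help, health unchanged
            }
            // any other code is treated as a retryable server error
        }
        status = Status.UNHEALTHY; // only exhausted server-error retries degrade health
        return code;
    }
}
```

With this policy, `new RetryingClient(3).send(() -> 400)` returns after a single attempt and leaves the client RUNNING, while a persistent 500 still exhausts all three attempts and marks the client UNHEALTHY.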

Actual behavior
The current ResponseCode enum has only two values:

public enum ResponseCode {
       SUCCESS(200, "Response ok"),
       FAIL(500, "Response fail"); // used for ALL failures
       // fields, constructor and getters omitted
}

When parameter validation fails:

  1. Server catches IllegalArgumentException in GrpcRequestAcceptor
  2. ErrorResponse.build(Throwable) returns ResponseCode.FAIL (500) for all non-Nacos exceptions
  3. SDK receives error code 500
  4. SDK treats it as a server error and retries (3 times by default)
  5. After retries are exhausted, the client logs: Request retry exhausted, status changed: RUNNING -> UNHEALTHY
  6. Client becomes UNHEALTHY, affecting all other operations
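The six steps above can be condensed into a small, self-contained model of the current failure path. Every name here (`CurrentFailurePath`, `handle`, `serve`, `subscribe`) is hypothetical; the model only mirrors the flow described in this issue and is not the actual Nacos server or SDK code.

```java
/**
 * Minimal model of the current failure path (all names hypothetical):
 * the handler throws IllegalArgumentException, the error mapping turns it
 * into 500, and the client retries until it marks itself UNHEALTHY.
 */
class CurrentFailurePath {
    static final int SUCCESS = 200;
    static final int FAIL = 500;

    /** Stands in for the subscribe handler rejecting a blank service name. */
    static void handle(String serviceName) {
        if (serviceName == null || serviceName.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Param 'serviceName' is illegal, serviceName is blank");
        }
    }

    /**
     * Stands in for the acceptor plus ErrorResponse.build(): this model has no
     * Nacos exceptions, so every failure falls into the generic 500 branch.
     */
    static int serve(String serviceName) {
        try {
            handle(serviceName);
            return SUCCESS;
        } catch (RuntimeException e) {
            return FAIL;
        }
    }

    /** Client side: every non-200 code is retried; exhaustion flips the status. */
    static String subscribe(String serviceName, int maxRetries) {
        for (int retryTimes = 1; retryTimes <= maxRetries; retryTimes++) {
            if (serve(serviceName) == SUCCESS) {
                return "RUNNING";
            }
        }
        return "UNHEALTHY";
    }
}
```

Because `serve("")` can only ever return 500, no number of retries changes the outcome; the retries simply delay the transition to UNHEALTHY.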

How to Reproduce

  1. Use nacos-sdk-go v2.3.1 or nacos-client (Java) 2.4.3
  2. Subscribe to a service with empty serviceName:
   namingClient.Subscribe(vo.SubscribeParam{
       ServiceName: "",           // Empty serviceName
       GroupName:   "group-a",
       Clusters:    []string{"cluster-a"},
       SubscribeCallback: callback,
   })
  3. Observe server logs:
   java.lang.IllegalArgumentException: Param 'serviceName' is illegal, serviceName is blank

  4. Observe client logs showing the repeated retries and the status change to UNHEALTHY:

   Send request fail, retryTimes=1, error=Param 'serviceName' is illegal, serviceName is blank
   Send request fail, retryTimes=2, error=Param 'serviceName' is illegal, serviceName is blank
   Send request fail, retryTimes=3, error=Param 'serviceName' is illegal, serviceName is blank
   Request retry exhausted, status changed: RUNNING -> UNHEALTHY

Stack Trace:

java.lang.IllegalArgumentException: Param 'serviceName' is illegal, serviceName is blank
    at com.alibaba.nacos.api.naming.utils.NamingUtils.getGroupedName(NamingUtils.java:61)
    at com.alibaba.nacos.naming.remote.rpc.handler.SubscribeServiceRequestHandler.handle(SubscribeServiceRequestHandler.java:76)
    at com.alibaba.nacos.core.remote.RequestHandler.handleRequest(RequestHandler.java:58)
    at com.alibaba.nacos.core.remote.grpc.GrpcRequestAcceptor.request(GrpcRequestAcceptor.java:195)

Environment:

  • OS: Linux
  • Version:
    • nacos-server: 2.4.3
    • nacos-client (Java): 2.4.3
    • nacos-sdk-go: v2.3.1
  • Module: naming (gRPC)
  • Impact: Production critical; causes client instability and cascading failures

Root Cause Analysis

  1. Limited ResponseCode enum - Only SUCCESS(200) and FAIL(500), no client error code
  2. ErrorResponse.build() treats all non-Nacos exceptions as 500:
   public static Response build(Throwable exception) {
       int errorCode;
       if (exception instanceof NacosException) {
           errorCode = ((NacosException) exception).getErrCode();
       } else if (exception instanceof NacosRuntimeException) {
           errorCode = ((NacosRuntimeException) exception).getErrCode();
       } else {
           errorCode = ResponseCode.FAIL.getCode(); // Always 500!
       }
       // ...
   }
  3. No parameter validation before calling NamingUtils.getGroupedName()
  4. SDK retry logic cannot distinguish retryable from non-retryable errors
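Root causes 2 and 3 suggest two complementary server-side changes, sketched below under stated assumptions: an extra branch in the error-code mapping so IllegalArgumentException yields a 400-style code, and an up-front check on serviceName. The class `ErrorCodeMapping`, the `CodedException` type (standing in for NacosException), and the constant `PARAMETER_ERROR` are all hypothetical; this is not a patch against the actual Nacos sources.

```java
/**
 * Sketch of a server-side fix (hypothetical names): map IllegalArgumentException
 * to a 400 client-error code instead of the blanket 500, and validate
 * serviceName up front so the handler never reaches getGroupedName().
 */
class ErrorCodeMapping {
    static final int PARAMETER_ERROR = 400; // hypothetical client-error code
    static final int FAIL = 500;

    /** Hypothetical checked exception carrying its own code, like NacosException. */
    static class CodedException extends Exception {
        final int errCode;

        CodedException(int errCode, String message) {
            super(message);
            this.errCode = errCode;
        }
    }

    /** Variant of the build(Throwable) logic with an extra branch for bad parameters. */
    static int errorCodeFor(Throwable exception) {
        if (exception instanceof CodedException) {
            return ((CodedException) exception).errCode;
        } else if (exception instanceof IllegalArgumentException) {
            return PARAMETER_ERROR; // invalid input: the client should not retry
        }
        return FAIL; // unexpected server-side failure
    }

    /** Up-front check a handler could run before building the grouped name. */
    static void validateServiceName(String serviceName) throws CodedException {
        if (serviceName == null || serviceName.trim().isEmpty()) {
            throw new CodedException(PARAMETER_ERROR,
                "Param 'serviceName' is illegal, serviceName is blank");
        }
    }
}
```

One caveat on the design: IllegalArgumentException (and subclasses such as NumberFormatException) can also be thrown by genuine server bugs, so the blanket mapping may occasionally mislabel a server fault as a client error. Explicit validation with a dedicated coded exception is the more precise of the two changes; the mapping branch acts as a safety net.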
