Status: Open
Labels: contribution welcome, kind/bug
Description
Describe the bug
When a gRPC subscribe service request contains invalid parameters (e.g., empty serviceName), the server returns error code 500 (FAIL) instead of a distinguishable client error code. This causes serious issues in SDK clients:
- The SDK cannot distinguish client parameter errors (which should not be retried) from server errors (which can be retried)
- Futile retries: the SDK keeps retrying a request that can never succeed until all retry attempts are exhausted
- Client becomes UNHEALTHY: after retries are exhausted, the status of the entire RPC client changes to UNHEALTHY
- Cascading failures: subscriptions to other, valid services fail because the client is UNHEALTHY
- Frequent reconnections: the unhealthy client repeatedly triggers reconnect attempts
Related issue: nacos-sdk-go #868 reports this causing production issues.
Expected behavior
The gRPC API should have at least 3 distinguishable response codes:
- 200 - Success (current: SUCCESS)
- 400 - Client error / Invalid parameter (currently missing, returns 500)
- 500 - Server internal error (current: FAIL)
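A minimal sketch of what a three-code enum could look like. The constant name PARAMETER_ERROR, its description string, and the helper isClientError are hypothetical (the issue only asks for some 400-style code); SUCCESS and FAIL keep their existing values.

```java
// Hypothetical sketch: PARAMETER_ERROR and isClientError are assumed names, not existing Nacos API.
enum ResponseCode {
    SUCCESS(200, "Response ok"),
    PARAMETER_ERROR(400, "Invalid parameter"), // hypothetical client-error code
    FAIL(500, "Response fail");

    private final int code;
    private final String desc;

    ResponseCode(int code, String desc) {
        this.code = code;
        this.desc = desc;
    }

    public int getCode() {
        return code;
    }

    public String getDesc() {
        return desc;
    }

    /** True when the code denotes a client-side error that retrying cannot fix. */
    public static boolean isClientError(int code) {
        return code >= 400 && code < 500;
    }
}
```

Keeping the HTTP-style numeric ranges means SDKs in any language can classify a response by its code alone, without depending on the enum type.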
When serviceName is blank:
- The server should return a client error code (e.g., 400) instead of 500
- The SDK should NOT retry client parameter errors
- The SDK should return the error to the caller immediately, without affecting the client's health status
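The expected SDK behavior can be modeled with a small retry loop. This is a hypothetical model, not nacos-client or nacos-sdk-go code: RetryingClient, MAX_ATTEMPTS, isRetryable and the IntSupplier standing in for the network call are all illustrative assumptions. Only 5xx codes are retried, and only retries exhausted on server errors mark the client UNHEALTHY.

```java
import java.util.function.IntSupplier;

/** Hypothetical model of an SDK request loop; names do not mirror real SDK internals. */
final class RetryingClient {
    enum Status { RUNNING, UNHEALTHY }

    static final int MAX_ATTEMPTS = 3; // assumed to match the default of 3 described in this issue

    Status status = Status.RUNNING;
    int attempts = 0;

    /** Only server errors (5xx) are worth retrying; a 4xx request would fail the same way again. */
    static boolean isRetryable(int code) {
        return code >= 500;
    }

    /** Sends via {@code send} (a stand-in for the gRPC call) and returns the final response code. */
    int request(IntSupplier send) {
        int code = 0;
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            attempts++;
            code = send.getAsInt();
            if (code == 200) {
                return code; // success
            }
            if (!isRetryable(code)) {
                return code; // parameter error: report to the caller, health untouched
            }
        }
        // Only server failures that persist through every attempt mark the client unhealthy.
        status = Status.UNHEALTHY;
        return code;
    }
}
```

With the current server, a blank serviceName comes back as 500, so this loop takes the retry path three times and ends UNHEALTHY; if the server returned 400 instead, the loop would return after a single attempt and the client would stay RUNNING.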
Actual behavior
The current ResponseCode enum has only two values:
public enum ResponseCode {
    SUCCESS(200, "Response ok"),
    FAIL(500, "Response fail"); // used for ALL failures
    // fields, constructor and getters omitted
}
When parameter validation fails:
- The server catches the IllegalArgumentException in GrpcRequestAcceptor
- ErrorResponse.build(Throwable) returns ResponseCode.FAIL (500) for every exception that is not a NacosException or NacosRuntimeException
- The SDK receives error code 500
- The SDK treats it as a server error and retries (3 times by default)
- After the retries are exhausted, the SDK logs: Request retry exhausted, status changed: RUNNING -> UNHEALTHY
- The client becomes UNHEALTHY, which affects all other operations
How to Reproduce
- Use nacos-sdk-go v2.3.1 or nacos-client (Java) 2.4.3
- Subscribe to a service with empty serviceName:
namingClient.Subscribe(vo.SubscribeParam{
	ServiceName:       "", // Empty serviceName
	GroupName:         "group-a",
	Clusters:          []string{"cluster-a"},
	SubscribeCallback: callback,
})
- Observe server logs:
java.lang.IllegalArgumentException: Param 'serviceName' is illegal, serviceName is blank
- Observe client logs showing repeated retries and the status change to UNHEALTHY:
Send request fail, retryTimes=1, error=Param 'serviceName' is illegal, serviceName is blank
Send request fail, retryTimes=2, error=Param 'serviceName' is illegal, serviceName is blank
Send request fail, retryTimes=3, error=Param 'serviceName' is illegal, serviceName is blank
Request retry exhausted, status changed: RUNNING -> UNHEALTHY
Stack Trace:
java.lang.IllegalArgumentException: Param 'serviceName' is illegal, serviceName is blank
	at com.alibaba.nacos.api.naming.utils.NamingUtils.getGroupedName(NamingUtils.java:61)
	at com.alibaba.nacos.naming.remote.rpc.handler.SubscribeServiceRequestHandler.handle(SubscribeServiceRequestHandler.java:76)
	at com.alibaba.nacos.core.remote.RequestHandler.handleRequest(RequestHandler.java:58)
	at com.alibaba.nacos.core.remote.grpc.GrpcRequestAcceptor.request(GrpcRequestAcceptor.java:195)
Environment
- OS: Linux
- Version:
  - nacos-server: 2.4.3
  - nacos-client (Java): 2.4.3
  - nacos-sdk-go: v2.3.1
Module: naming (gRPC)
Impact: Production Critical - causes client instability and cascading failures
Root Cause Analysis
- Limited ResponseCode enum: only SUCCESS(200) and FAIL(500) exist; there is no client error code
- ErrorResponse.build() treats all non-Nacos exceptions as 500:
    public static Response build(Throwable exception) {
        int errorCode;
        if (exception instanceof NacosException) {
            errorCode = ((NacosException) exception).getErrCode();
        } else if (exception instanceof NacosRuntimeException) {
            errorCode = ((NacosRuntimeException) exception).getErrCode();
        } else {
            errorCode = ResponseCode.FAIL.getCode(); // Always 500!
        }
        // ...
    }
- SubscribeServiceRequestHandler does not validate parameters before calling NamingUtils.getGroupedName()
- The SDK retry logic cannot distinguish retryable from non-retryable errors
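A possible server-side direction, sketched without any Nacos dependencies: reject a blank serviceName up front, and map IllegalArgumentException to a client-error code instead of 500. ErrorMapper, validateSubscribe, toErrorCode and PARAMETER_ERROR_CODE are hypothetical names; the NacosException and NacosRuntimeException branches of build() are omitted here because those exceptions already carry their own error code.

```java
/** Hypothetical sketch of the server-side fix; none of these names are Nacos APIs. */
final class ErrorMapper {
    static final int PARAMETER_ERROR_CODE = 400; // assumed new client-error code
    static final int FAIL_CODE = 500;            // existing ResponseCode.FAIL value

    /** Rejects a blank serviceName before any grouped-name construction happens. */
    static void validateSubscribe(String serviceName) {
        if (serviceName == null || serviceName.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "Param 'serviceName' is illegal, serviceName is blank");
        }
    }

    /** Maps a non-Nacos handler exception to a code that separates bad input from server faults. */
    static int toErrorCode(Throwable exception) {
        if (exception instanceof IllegalArgumentException) {
            return PARAMETER_ERROR_CODE; // caller's fault: retrying cannot help
        }
        return FAIL_CODE; // anything else is still treated as a server error
    }
}
```

Mapping every IllegalArgumentException to 400 is coarse: a genuine server bug that happens to throw one would also stop being retried. A dedicated parameter-validation exception type thrown only by request validation would be a more precise signal.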