
NettyNioSocketChannel leaks RequestContext objects #6307

@RazakDangi

Description


Describe the bug

Unbounded memory and socket retention caused by an AWS SDK Netty NioSocketChannel leak under high-concurrency S3 async operations.

Description:
When performing high-throughput, parallel S3 object locking operations using the AWS SDK v2 (with async Netty client), the application experienced:

Heap memory growth from retained Netty Channel.attr(...) state (e.g. RequestContext, Metrics, etc.).

Socket exhaustion and eventual TCP Zero Window scenarios.

Thread explosion under error or retry paths due to unconsumed CompletableFuture chains in Reactor.

Persistent CLOSE_WAIT / ESTABLISHED sockets, even after all logical requests completed.


Expected Behavior

Given the use of the AWS SDK v2's NettyNioAsyncHttpClient, we expected:

At most X concurrent NioSocketChannel instances, where X = maxConcurrency set on the async HTTP client.

Each Channel to be fully released back to the pool after the request completes, with:

All Channel.attr(...) cleared (e.g., RequestContext, ExecutionId, etc.).

No dangling user-defined or internal AWS SDK metadata.

Current Behavior

Unbounded NioSocketChannel accumulation:

Far more channels than maxConcurrency were created and retained.

Old channels were not cleaned or released, even after request completion.

TCP Port Exhaustion:

Persistent ESTABLISHED and CLOSE_WAIT states observed on client side.

System unable to open new connections, leading to zero window TCP behavior.

Heap Retention and GC Pressure:

Channels held reference chains via Channel.attr() (e.g., RequestContext, SdkHttpRequest, ExecutionId).
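The heap-retention side can be illustrated with a plain-Java stand-in for Netty's channel attribute map (`ChannelStub` and the nested `RequestContext` here are hypothetical names for illustration, not SDK classes):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not SDK code): a pooled channel whose attribute map
// keeps the last request's context alive even after the request completed.
public class AttrRetentionSketch {
    // Stand-in for a RequestContext holding heavyweight references.
    record RequestContext(byte[] payload) {}

    // Stand-in for io.netty.channel.Channel with its attr() map.
    static class ChannelStub {
        final Map<String, Object> attrs = new HashMap<>();
    }

    public static void main(String[] args) {
        ChannelStub channel = new ChannelStub();

        // Request runs: the context is stored as a channel attribute.
        channel.attrs.put("REQUEST_CONTEXT", new RequestContext(new byte[1024 * 1024]));

        // Request completes and the channel returns to the pool -- but the
        // attribute is never cleared, so the 1 MiB context stays reachable
        // for as long as the pooled channel lives.
        boolean leaked = channel.attrs.get("REQUEST_CONTEXT") != null;
        System.out.println("context still reachable after completion: " + leaked);

        // Clearing the attribute on release breaks the reference chain.
        channel.attrs.put("REQUEST_CONTEXT", null);
        System.out.println("after cleanup: " + (channel.attrs.get("REQUEST_CONTEXT") != null));
    }
}
```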

Reproduction Steps

You can reproduce the issue by configuring a high-throughput AWS S3 async client using the AWS SDK v2 with NettyNioAsyncHttpClient. This snippet simulates many concurrent headObject or putObjectRetention calls and reveals NioSocketChannel leakage.
```java
SdkAsyncHttpClient httpClient = NettyNioAsyncHttpClient.builder()
        .maxConcurrency(64) // Expected limit
        .readTimeout(Duration.ofSeconds(10))
        .writeTimeout(Duration.ofSeconds(10))
        .connectionAcquisitionTimeout(Duration.ofSeconds(15))
        .build();

S3AsyncClient s3Client = S3AsyncClient.builder()
        .httpClient(httpClient)
        .region(Region.US_EAST_1)
        .build();

ExecutorService executor = Executors.newFixedThreadPool(100);
for (int i = 0; i < 10000; i++) {
    int id = i;
    executor.submit(() -> {
        s3Client.headObject(HeadObjectRequest.builder()
                .bucket("your-bucket")
                .key("test-object-" + id)
                .build())
            .whenComplete((resp, err) -> {
                if (err != null) System.err.println("Failed: " + err);
            });
    });
}
```
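As an application-level mitigation while the leak exists, submissions can be bounded so that no more than maxConcurrency requests are ever in flight. A minimal sketch using a stdlib `Semaphore` (the `runAsync` body stands in for the `headObject` call; names and the limit of 64 mirror the snippet above and are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: cap in-flight async calls so submissions block once
// maxConcurrency's worth of requests are outstanding.
public class BoundedSubmit {
    public static void main(String[] args) throws InterruptedException {
        final int limit = 64;                      // mirror maxConcurrency
        Semaphore inFlight = new Semaphore(limit);
        AtomicInteger current = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();

        CompletableFuture<?>[] futures = new CompletableFuture<?>[1000];
        for (int i = 0; i < 1000; i++) {
            inFlight.acquire();                    // blocks past the limit
            int now = current.incrementAndGet();
            maxObserved.accumulateAndGet(now, Math::max);
            // Stand-in for s3Client.headObject(...); release on completion.
            futures[i] = CompletableFuture.runAsync(() -> { /* I/O here */ })
                    .whenComplete((r, t) -> {
                        current.decrementAndGet();
                        inFlight.release();
                    });
        }
        CompletableFuture.allOf(futures).join();
        System.out.println("max in-flight observed: " + maxObserved.get());
        System.out.println("bounded: " + (maxObserved.get() <= limit));
    }
}
```

This does not fix the attribute leak itself, but it keeps channel demand aligned with the pool size instead of queueing thousands of pending acquisitions.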

Monitoring Tools:

lsof -iTCP -sTCP:ESTABLISHED | grep java

netstat -an | grep '.443 ' | grep ESTABLISHED | wc -l

VisualVM / JFR to inspect heap and NioSocketChannel retention

Possible Solution

NettyRequestExecutor.configureChannel() stores the RequestContext in a channel attribute. Because that attribute is never cleared, many completed RequestContext objects cannot be GC'd, and RequestContext itself holds references to the executor. Channels offered back to the pool for reuse sit idle while still referencing their completed RequestContext.

In the createExecutionFuture callback, the channel is closed without cleaning up its attributes:
```java
if (ch.attr(IN_USE) != null && ch.attr(IN_USE).get() && executionIdKey != null) {
    ch.pipeline().fireExceptionCaught(new FutureCancelledException(this.executionId, t));
} else {
    // here you can clean up the attrs set for this call
    cleanupChannelAttributes(ch);
    ch.close().addListener(closeFuture -> context.channelPool().release(ch));
}

private void cleanupChannelAttributes(Channel ch) {
    if (ch == null) {
        return;
    }

    Long currentExecutionId = ch.attr(EXECUTION_ID_KEY).get();
    RequestContext currentContext = ch.attr(REQUEST_CONTEXT_KEY).get();

    boolean sameExecution = Objects.equals(currentExecutionId, this.executionId);
    boolean sameContext = Objects.equals(currentContext, this.context);

    if (sameExecution || sameContext) {
        ch.attr(REQUEST_CONTEXT_KEY).set(null);
        ch.attr(EXECUTE_FUTURE_KEY).set(null);
        ch.attr(RESPONSE_COMPLETE_KEY).set(null);
        ch.attr(STREAMING_COMPLETE_KEY).set(null);
        ch.attr(RESPONSE_CONTENT_LENGTH).set(null);
        ch.attr(RESPONSE_DATA_READ).set(null);
        ch.attr(EXECUTION_ID_KEY).set(null); // optional
        log.trace(ch, () -> "Cleaned channel attributes for executionId=" + executionId);
    } else {
        log.trace(ch, () -> "Skipped cleanup: channel reused by another request (executionId=" + currentExecutionId + ")");
    }
}
```

NettyRequestExecutor also keeps class-level (instance field) references.
Problem class and code:
```java
private CompletableFuture executeFuture;
private Channel channel;
private RequestAdapter requestAdapter;

public NettyRequestExecutor(RequestContext context) {
    this.context = context;
}

@SuppressWarnings("unchecked")
public CompletableFuture<Void> execute() {
    Promise<Channel> channelFuture = context.eventLoopGroup().next().newPromise();
    executeFuture = createExecutionFuture(channelFuture);
    acquireChannel(channelFuture);
    channelFuture.addListener((GenericFutureListener) this::makeRequestListener);
    return executeFuture;
}
```
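Why those fields matter for GC can be sketched outside the SDK: as long as an executor instance stays reachable (e.g. via a pending future or a registered listener), anything in its instance fields stays reachable too, while method-local state is collectable the moment the call chain finishes. All names below are hypothetical, for illustration only:

```java
// Illustrative sketch (hypothetical names, not SDK code) of why moving
// per-request state from instance fields to method locals helps GC.
public class FieldVsLocal {
    static class FieldStyleExecutor {
        byte[] requestState;                            // class-level reference
        void execute() { requestState = new byte[1024 * 1024]; }
    }

    static class LocalStyleExecutor {
        void execute() {
            byte[] requestState = new byte[1024 * 1024]; // method-level reference
            makeRequest(requestState);                   // passed to helpers, then dropped
        }
        void makeRequest(byte[] state) { /* use state */ }
    }

    public static void main(String[] args) {
        FieldStyleExecutor fieldStyle = new FieldStyleExecutor();
        fieldStyle.execute();
        // The 1 MiB array stays strongly reachable as long as fieldStyle does.
        System.out.println("field pins state after execute(): "
                + (fieldStyle.requestState != null));

        LocalStyleExecutor localStyle = new LocalStyleExecutor();
        localStyle.execute();
        // Nothing references the local array any more; it is collectable.
        System.out.println("local style leaves no pinned state: done");
    }
}
```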

Before the fix, we observed 2,130 open NioSocketChannel objects with maxConcurrency set to 200.


After applying the fix inside the AWS SDK HTTP client, we see at most 200 NioSocketChannel objects, matching the maxConcurrency we set.


Here is the fix that worked and stopped the NioSocketChannel leak:

  1. Remove the class-level references:

```java
private CompletableFuture executeFuture;
private Channel channel;
private RequestAdapter requestAdapter;
```

  2. Move them to method level and pass them around to the private methods:

```java
public CompletableFuture<Void> execute() {
    Promise<Channel> channelFuture = context.eventLoopGroup().next().newPromise();
    CompletableFuture<Void> executeFuture = createExecutionFuture(channelFuture);
    acquireChannel(channelFuture);
    channelFuture.addListener((GenericFutureListener<Future<Channel>>) f -> {
        if (channelFuture.isSuccess()) {
            Channel channel = channelFuture.getNow();
            NettyUtils.doInEventLoop(channel.eventLoop(), () -> {
                try {
                    configureChannel(channel, executeFuture);
                    RequestAdapter adapter = configurePipeline(channel);
                    makeRequest(channel, adapter, executeFuture);
                } catch (Throwable t) {
                    closeAndRelease(channel);
                    handleFailure(channel, () -> "Failed to initiate request to " + endpoint(), t, executeFuture);
                }
            });
        } else {
            handleFailure(null, () -> "Failed to create connection to " + endpoint(), channelFuture.cause(), executeFuture);
        }
    });
    return executeFuture;
}
```


```java
private void configureChannel(Channel channel, CompletableFuture<Void> future)

private RequestAdapter configurePipeline(Channel channel) throws IOException

private void makeRequest(Channel channel, RequestAdapter adapter, CompletableFuture executeFuture) {
    HttpRequest request = adapter.adapt(context.executeRequest().request());
    writeRequest(channel, request, executeFuture);
}
```


With this change we no longer see an excess of SocketChannel objects.

### Additional Information/Context

If this solution is valid and safe, I am happy to create a fix PR.

### AWS Java SDK version used

2.29.6

### JDK version used

sdkman/candidates/java/21.0.6-tem/bin/java

### Operating System and version

linux and osx

Labels: bug, needs-triage