-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
Description
Version
v23.2.0 (also reproduces on v18.20.4, v20.17.0)
Platform
Darwin UA-KJP26G976P 23.4.0 Darwin Kernel Version 23.4.0: Wed Feb 21 21:44:54 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T6030 arm64
Subsystem
http2
What steps will reproduce the bug?
Start a simple server that imitates a graceful shutdown by sending GOAWAY every N requests.
const http2 = require('http2');
const server = http2.createServer({
peerMaxConcurrentStreams: 100000,
settings: {
maxConcurrentStreams: 100000,
},
maxSessionMemory: 200,
});
const N = 97;
let count = 0;
const maxLastStreamId = 2 ** 31 - 1;
server.on('stream', (stream) => {
count++;
stream.respond({
'content-type': 'text/html',
':status': 200
});
stream.end('Hello World ' + count);
if (count % N === 0) {
console.log('sending goaway frame at count:', count);
stream.session.goaway(0, maxLastStreamId);
}
stream.on('error', (err) => {
console.log('error:', err);
});
});
server.listen(8000);
then a client:
const http2 = require('http2');
const payload = Buffer.alloc(1024 * 10, 'a');
let client;
function connect() {
client = http2.connect('http://localhost:8000', {
maxSessionMemory: 200,
peerMaxConcurrentStreams: 100000,
settings: {
maxConcurrentStreams: 100000,
},
});
client.on('goaway', () => {
connect();
});
client.on('error', (err) => {
console.log('client received error:', err);
connect();
});
}
const errorStats = new Map();
function addError(err) {
const key = err.toString();
const count = errorStats.get(key) || 0;
errorStats.set(key, count + 1);
}
function dumpErrorStatistics() {
for (const [key, value] of errorStats) {
console.log('error:', key, 'count:', value);
}
errorStats.clear();
}
const MAX_IN_FLIGHT = 67;
function sendRequests() {
let inFlight = 0;
while (inFlight < MAX_IN_FLIGHT) {
try {
const stream = client.request({
':path': '/',
':method': 'POST',
'content-type': 'text/html',
});
inFlight++;
stream.on('error', (err) => {
addError(err);
});
stream.on('data', () => {});
stream.on('close', () => {});
stream.write(payload);
stream.end();
} catch (err) {
addError(err);
}
}
dumpErrorStatistics();
}
connect();
setInterval(sendRequests, 7);
How often does it reproduce? Is there a required condition?
Reproduces every few seconds on my machine. The specific values of parameters (the number of requests between GOAWAY frames, the request batch size, interval between batches) may require adjustment on another machine.
What is the expected behavior? Why is that the expected behavior?
The server is sending GOAWAY with last stream id = 2^31 - 1, which should allow completing the existing requests.
This comment states that the existing streams should complete successfully, while the pending streams should be cancelled.
I would expect to never see NGHTTP2_INTERNAL_ERROR
on the client in this situation.
Also note that the client immediately reconnects after getting a goaway
event, so it seems that the client doesn't break the http2
module contract.
What do you see instead?
Some client streams are closed with code NGHTTP2_INTERNAL_ERROR
, as seen in the client.js
output:
error: Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_INTERNAL_ERROR count: 67
error: Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_INTERNAL_ERROR count: 67
error: Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_INTERNAL_ERROR count: 67
If we run the client with NODE_DEBUG=http2
, we can see an attempt to send headers of the new stream after goaway was already processed:
HTTP2 1538: Http2Session client: goaway 0 received [last stream id: 2147483647]
HTTP2 1538: Http2Session client: created
HTTP2 1538: Http2Session client: marking session closed
HTTP2 1538: Http2Session client: submitting goaway
...
HTTP2 1538: Http2Session client: error sending frame type 1 on stream 403, code: 2
HTTP2 1538: Http2Stream 403 [Http2Session client]: closed with code 7, closed true, readable true
HTTP2 1538: Http2Stream 403 [Http2Session client]: destroying stream
HTTP2 1538: Http2Session client: error sending frame type 1 on stream 405, code: 2
The main issue is the client code receiving internal error
instead of the cancelled
error, making it impossible to retry the request in general case.
Another artifact I'm seeing is the client attempts sending a RST_STREAM
with code 7 (REFUSED_STREAM) to server although the stream is initiated on client and server has never seen this stream. We verified this by looking at the traffic dump, the client is actually sending RST_STREAM
for a new(!) client stream to server after getting a GOAWAY
frame.
Additional information
This is a synthetic minimal reproduction of a real bug we're seeing in production when the gRPC client cannot handle a graceful shutdown of a connection to Envoy proxy.
According to grpc-node
maintainer, the error code returned by node is too general for gRPC client to handle it gracefully: grpc/grpc-node#2625 (comment)