Description
- Version: v14.16.0
- Platform: IBM i 7.2
- Subsystem: net
What steps will reproduce the bug?
I have only observed this problem on IBM i 7.2. I have not tested other platforms, so I don't know whether it happens elsewhere.
Create a simple server program like this:
const http = require("http");
const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("OK");
});
server.on("connection", socket => {
  socket.setKeepAlive(true, 60000);
});
server.listen(process.env["PORT"] || 8080);
Then use a simple client program like this to connect to it while running a packet capture, so that you can see when TCP keepalive probes are sent:
const net = require("net");
const socket = net.createConnection({
  host: "your_IBMi",
  port: process.env["PORT"] || 8080
});
The initial keepalive probe should be sent after the connection has been idle for the amount of time passed to socket.setKeepAlive(), but instead the first probe is sent after the system-wide default time given by the TCPKEEPALV value of IBM i's CHGTCPA command. Subsequent keepalive probes are then sent at the correct interval.
For example, if you set the system-wide default to 5 minutes using this command:
CHGTCPA TCPKEEPALV(5)
And then run the test above, the first keepalive probe will be sent after 5 minutes, and subsequent probes will be sent at 1-minute intervals.
Additional information
The problem relates to the order of the setsockopt() calls used to enable TCP keepalive and to set the idle time. I've found that, on IBM i, code like this reproduces the problem described above:
int enable = 1;
unsigned int tcpKeepIdle = 60;
setsockopt(client_fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable));
setsockopt(client_fd, IPPROTO_TCP, TCP_KEEPIDLE, &tcpKeepIdle, sizeof(tcpKeepIdle));
However, if TCP_KEEPIDLE is set first, then it works as expected:
setsockopt(client_fd, IPPROTO_TCP, TCP_KEEPIDLE, &tcpKeepIdle, sizeof(tcpKeepIdle));
setsockopt(client_fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable));
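For anyone who wants to try the working order in isolation, here is a minimal sketch that wraps it in a helper with error checking. The function names enable_keepalive_idle_first and get_keepalive_idle are hypothetical, made up for this example; they are not taken from Node.js or libuv. It assumes a platform that defines TCP_KEEPIDLE in <netinet/tcp.h>.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: set the idle time BEFORE enabling keepalive.
 * Returns 0 on success, -1 if either setsockopt() call fails. */
static int enable_keepalive_idle_first(int fd, unsigned int idle_secs)
{
    int enable = 1;

    /* Idle time first: the ordering that behaved correctly on IBM i 7.2
     * in the experiment described above. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) != 0)
        return -1;

    /* Only then turn keepalive on. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) != 0)
        return -1;

    return 0;
}

/* Hypothetical helper: read back the configured idle time in seconds,
 * or -1 if getsockopt() fails. */
static int get_keepalive_idle(int fd)
{
    int idle = 0;
    socklen_t len = sizeof(idle);

    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, &len) != 0)
        return -1;
    return idle;
}
```

Calling enable_keepalive_idle_first(client_fd, 60) on an accepted socket should be equivalent to the two calls above. Note that reading the value back only confirms that the option was stored on the socket. It does not prove when the first probe is actually sent, so a packet capture is still the real test.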
Reversing the setsockopt() calls in this code would fix the problem: