Description
Version
14.18.1
Platform
IBM i 7.4
Subsystem
http
What steps will reproduce the bug?
The HTTP server can suddenly stop responding to any and all requests. When the server gets into this state it remains listening and accepting client connections, but it stops running the `requestListener` callback, so clients suddenly stop getting responses to any request. While in this state, there is no output at stdout/stderr, and the behavior persists for the duration of the process.
The problem is intermittent and it's not clear what causes it, but it seems to be triggered by certain network activity. The only way I can reproduce it on demand is by running a network vulnerability scan against the server using Nessus Essentials, with the server running in multiple processes via `cluster`.
To be clear, I have seen the problem occur many times during normal use of the HTTP server without any network scans taking place. I have also seen it happen without `cluster` in play. This is just the only way I have found to reliably reproduce it.
To reproduce, run this simple server on IBM i:
const cluster = require("cluster");
const http = require("http");

// Port 8080 matches the NETSTAT query further down.
const PORT = 8080;

if (cluster.isMaster) {
  cluster.fork();
} else {
  const server = http.createServer((req, res) => {
    console.log(new Date(), "request received");
    res.writeHead(200, {
      "Content-Type": "text/plain",
      "Cache-Control": "no-store"
    });
    res.end("OK");
  });

  server.on("listening", () => {
    console.log("Listening on port", PORT);
  });

  server.listen(PORT);
}
Then run a Basic Network Scan using Nessus Essentials:
https://www.tenable.com/downloads/nessus
I've been using Nessus 8.15.2 running on Windows 10. To set up the scan:
- Click "New Scan"
- Choose the Basic Network Scan
- Enter your Node HTTP server IP under Targets
- Click the Discovery link on the left and set the Scan Type to Custom
- Click on the Port Scanning link and set the Port Scan Range to the Node HTTP server port
- Click on the Host Discovery link and set Ping the Remote Host to OFF
- Leave all other settings at default
- Launch the scan and wait for it to complete
Once the scan is complete, the server will stop responding to any requests. The `requestListener` callback won't run, as shown by the lack of console output, and the client will never get a response. This behavior will persist for the duration of the server process.
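For reference, a minimal client-side probe makes the hang easy to see without a browser. This is just a sketch; the host address is a placeholder for the IBM i machine and the port matches the server above:

// probe.js - sends one request and reports whether a response ever arrives.
const http = require("http");

const HOST = "192.0.2.10"; // placeholder: address of the IBM i server
const PORT = 8080;

const req = http.get({ host: HOST, port: PORT, path: "/" }, (res) => {
  console.log(new Date(), "got response:", res.statusCode);
  res.resume();
});

// When the server is in the bad state, the TCP connection opens and the request
// is ACKed, but no HTTP response ever arrives, so this timeout fires instead.
req.setTimeout(10000, () => {
  console.log(new Date(), "no response within 10 seconds");
  req.destroy();
});

req.on("error", (err) => {
  console.error(new Date(), "request error:", err.message);
});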
How often does it reproduce? Is there a required condition?
The steps above will reproduce the problem every time.
What is the expected behavior?
The server should continue responding to requests.
What do you see instead?
The server inexplicably stops responding to requests for the duration of the process.
Additional information
I have been able to reproduce this problem reliably on IBM i 7.4 and 7.2. I haven't yet tried on 7.3, but I have had users of my package report what I think is the same problem on IBM i 7.3. I have never seen this happen with Node running on other platforms.
IBM i NETSTAT reports everything normal while the server is in the bad state. For example, if I run this query while trying some requests:
select tcp_state, count(tcp_state)
from qsys2.netstat_info
where local_port = 8080
group by tcp_state
order by tcp_state
I get this output:
TCP_STATE    COUNT(TCP_STATE)
ESTABLISHED  2
LISTEN       2
Running a Wireshark trace on the client side while using Chrome to make a request also looks normal. The TCP three-way connection handshake completes normally, and the server also ACKs the HTTP request frame. Then Chrome just waits for a response that never comes. Meanwhile it sends TCP keep-alive probes to the server, and the server ACKs them as expected.
When the server process is in this state, the IBM i active job status and call stack look normal. The active job status is SELW (select wait) and the call stack shows the process waiting on I/O in a `poll()` call. This state is identical to when the server is responding to requests normally. Also, there is nothing in the IBM i job log.
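One way to narrow this down further (a hedged sketch, not part of the repro above) is to add a heartbeat timer to the worker, to confirm whether the event loop keeps turning while requests go unanswered:

// Add inside the worker branch of the repro server above.
// If these lines keep printing while requests get no response, the event loop
// is still running and the problem is in how connections reach the request
// handler, rather than the process being blocked.
setInterval(() => {
  console.log(new Date(), "event loop heartbeat");
}, 5000);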
This is a serious stability issue for Node.js on IBM i. As I mentioned above, I have seen this happen without any network scans going on, and without `cluster` in play. Use of `cluster` seems to exacerbate the problem.