net: defer self.destroy calls to nextTick #54857

Open
wants to merge 1 commit into
base: main

anilhelvaci

Fixes: #48771

What is the problem being solved?

#48771 reports that the request object returned by http.request cannot catch the error event emitted when an attempt to connect to an address returned by the DNS lookup fails immediately.

Solution

#51038 implemented the changes suggested in #48771 (comment). However, #51038 could not be merged due to a lack of tests (#51038 (review)). This PR applies the same fix and adds tests.
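To illustrate why deferring the destroy call matters, here is a minimal sketch. It assumes a hypothetical EventEmitter stand-in (fakeConnect) rather than the real code in lib/net.js, and tryConnect is likewise an illustrative helper. When the failure is reported synchronously, the caller has not yet attached its error listener; when it is deferred with process.nextTick, the listener is in place by the time the error is emitted.

```javascript
'use strict';
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a socket whose connection attempt fails
// immediately. This is NOT the implementation in lib/net.js.
function fakeConnect(deferDestroy) {
  const socket = new EventEmitter();
  const destroy = () => {
    // Emitting 'error' with no listener attached would throw, so this
    // sketch simply drops the error to model a missed event.
    if (socket.listenerCount('error') === 0) return;
    socket.emit('error', new Error('connect failed'));
  };
  if (deferDestroy) {
    process.nextTick(destroy); // caller gets a chance to attach listeners
  } else {
    destroy(); // runs before the caller ever sees the socket
  }
  return socket;
}

// Resolves to true if the caller's 'error' listener observed the failure.
function tryConnect(deferDestroy) {
  return new Promise((resolve) => {
    const socket = fakeConnect(deferDestroy);
    socket.on('error', () => resolve(true));
    // Runs after all nextTick callbacks; a no-op if already resolved.
    setImmediate(() => resolve(false));
  });
}

Promise.all([tryConnect(false), tryConnect(true)]).then((results) => {
  console.log(results); // [ false, true ]
});
```

The trade-off raised later in this thread still applies: the deferral adds a nextTick hop on the failure path.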

Testing Considerations

The provided tests hit every process.nextTick(() => self.destroy()) call except one. The self.destroy() call that is not hit is:

node/lib/net.js

Line 1113 in 9404d3a

self.destroy(new ERR_SOCKET_CONNECTION_TIMEOUT());

I am not tagging this PR as "DRAFT" because the only code that isn't tested handles the connection timeout case.

@mcollina Please let me know if these tests are sufficient or not.

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/net

@nodejs-github-bot added the needs-ci and net labels Sep 9, 2024
Member

@mcollina mcollina left a comment

lgtm

@mcollina requested a review from ShogunPanda September 9, 2024 07:23
@mcollina added the request-ci label Sep 9, 2024

codecov bot commented Sep 9, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.13%. Comparing base (0722023) to head (fcc49db).

Files with missing lines | Patch % | Lines
lib/net.js | 75.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #54857      +/-   ##
==========================================
- Coverage   90.16%   90.13%   -0.03%     
==========================================
  Files         637      637              
  Lines      188126   188126              
  Branches    36886    36897      +11     
==========================================
- Hits       169620   169576      -44     
- Misses      11272    11290      +18     
- Partials     7234     7260      +26     
Files with missing lines | Coverage Δ
lib/net.js | 96.32% <75.00%> (+1.03%) ⬆️

... and 32 files with indirect coverage changes

Contributor

@ShogunPanda ShogunPanda left a comment

LGTM!

Member

@anonrig anonrig left a comment

nice. thank you and congrats on your first contribution!

@github-actions bot removed the request-ci label Sep 9, 2024
@nodejs-github-bot

This comment was marked as outdated.

@anilhelvaci
Author

anilhelvaci commented Sep 10, 2024

> nice. thank you and congrats on your first contribution!

Thank you, so excited to become a part of the family! @anonrig

What are the next steps then? Do I have to do anything to initiate the merge? I see that my branch is behind a few commits, should I rebase and push again?

@mcollina added the request-ci label Sep 11, 2024
@mcollina
Member

mcollina commented Sep 11, 2024

> What are the next steps then?

Running the CI and getting it to pass. We'll take care of running it; if there are related failures, it would be up to you to fix them.

> Do I have to do anything to initiate the merge? I see that my branch is behind a few commits, should I rebase and push again?

no need unless there are conflicts.

@github-actions bot removed the request-ci label Sep 11, 2024
@anilhelvaci
Author

Hey @mcollina, thanks for your reply!

So the CI is currently failing. Looking at the details, I see two kinds of failures:

  1. parallel.test-http-client-immediate-error.js - The file I work on

The stack trace shows that my tests time out for some reason.

---
duration_ms: 120011.463
exitcode: -15
severity: fail
stack: |-
  timeout
  (node:376527) internal/test/binding: These APIs are for internal testing only. Do not use them.
  (Use `node --trace-warnings ...` to show where the warning was created)
...
  2. parallel.test-runner-watch-mode-complex

I have no idea why this file is failing. Here's the stack trace:

duration_ms: 6031.101
exitcode: 1
severity: fail
stack: "\u25B6 test runner watch mode with more complex setup\n  \u2716 should run\
  \ tests when a dependency changed after a watched test file being deleted (4747.503807ms)\n\
  \    AssertionError [ERR_ASSERTION]: The input did not match the regular expression\
  \ /tests 2/. Input:\n\n    '\u2714 second test has ran (3.238638ms)\\n' +\n    \
  \  '\u2714 first test has ran (4.048804ms)\\n' +\n      '\u2714 first test has ran\
  \ (12.434145ms)\\n' +\n      '\u2139 tests 3\\n' +\n      '\u2139 suites 0\\n' +\n\
  \      '\u2139 pass 3\\n' +\n      '\u2139 fail 0\\n' +\n      '\u2139 cancelled\
  \ 0\\n' +\n      '\u2139 skipped 0\\n' +\n      '\u2139 todo 0\\n' +\n      '\u2139\
  \ duration_ms 1312.515689\\n'\n\n        at TestContext.<anonymous> (file:///home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-runner-watch-mode-complex.mjs:99:12)\n\
  \        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n\
  \        at async Test.run (node:internal/test_runner/test:888:9)\n        at async\
  \ Promise.all (index 0)\n        at async Suite.run (node:internal/test_runner/test:1268:7)\n\
  \        at async startSubtestAfterBootstrap (node:internal/test_runner/harness:283:3)\
  \ {\n      generatedMessage: true,\n      code: 'ERR_ASSERTION',\n      actual:\
  \ '\u2714 second test has ran (3.238638ms)\\n\u2714 first test has ran (4.048804ms)\\\
  n\u2714 first test has ran (12.434145ms)\\n\u2139 tests 3\\n\u2139 suites 0\\n\u2139\
  \ pass 3\\n\u2139 fail 0\\n\u2139 cancelled 0\\n\u2139 skipped 0\\n\u2139 todo 0\\\
  n...',\n      expected: /tests 2/,\n      operator: 'match'\n    }\n\n\u25B6 test\
  \ runner watch mode with more complex setup (4776.910571ms)\n\u2139 tests 1\n\u2139\
  \ suites 1\n\u2139 pass 0\n\u2139 fail 1\n\u2139 cancelled 0\n\u2139 skipped 0\n\
  \u2139 todo 0\n\u2139 duration_ms 4833.662639\n\n\u2716 failing tests:\n\ntest at\
  \ test/parallel/test-runner-watch-mode-complex.mjs:53:3\n\u2716 should run tests\
  \ when a dependency changed after a watched test file being deleted (4747.503807ms)\n\
  \  AssertionError [ERR_ASSERTION]: The input did not match the regular expression\
  \ /tests 2/. Input:\n\n  '\u2714 second test has ran (3.238638ms)\\n' +\n    '\u2714\
  \ first test has ran (4.048804ms)\\n' +\n    '\u2714 first test has ran (12.434145ms)\\\
  n' +\n    '\u2139 tests 3\\n' +\n    '\u2139 suites 0\\n' +\n    '\u2139 pass 3\\\
  n' +\n    '\u2139 fail 0\\n' +\n    '\u2139 cancelled 0\\n' +\n    '\u2139 skipped\
  \ 0\\n' +\n    '\u2139 todo 0\\n' +\n    '\u2139 duration_ms 1312.515689\\n'\n\n\
  \      at TestContext.<anonymous> (file:///home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-runner-watch-mode-complex.mjs:99:12)\n\
  \      at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n\
  \      at async Test.run (node:internal/test_runner/test:888:9)\n      at async\
  \ Promise.all (index 0)\n      at async Suite.run (node:internal/test_runner/test:1268:7)\n\
  \      at async startSubtestAfterBootstrap (node:internal/test_runner/harness:283:3)\
  \ {\n    generatedMessage: true,\n    code: 'ERR_ASSERTION',\n    actual: '\u2714\
  \ second test has ran (3.238638ms)\\n\u2714 first test has ran (4.048804ms)\\n\u2714\
  \ first test has ran (12.434145ms)\\n\u2139 tests 3\\n\u2139 suites 0\\n\u2139 pass\
  \ 3\\n\u2139 fail 0\\n\u2139 cancelled 0\\n\u2139 skipped 0\\n\u2139 todo 0\\n...',\n\
  \    expected: /tests 2/,\n    operator: 'match'\n  }"

There are other failing tests, but their severity is flaky, so I assume they are not a problem for me.

I rebased my branch on top of the current main, and make -j4 test passed. I could use any suggestions on how to debug or reproduce these failures happening in CI 🤔

@aduh95
Contributor

aduh95 commented Sep 13, 2024

> 1. parallel.test-http-client-immediate-error.js - The file I work on
>
> Stack trace shows that my tests timeout for some reason.

That's a red flag; we don't want to merge flaky tests into our codebase. A timeout probably means there's a race condition somewhere that you forgot to take into account, which results in the test never exiting. To try to reproduce the flakiness locally, you can run:

tools/test.py --repeat 9999 -t 2 test/parallel/test-http-client-immediate-error.js

Once you have a repro, you can attempt getting a fix ready.

@anilhelvaci
Author

Hey @aduh95, thanks for clarifying 🙏

Unfortunately, the command you suggested did not reproduce the problem for me. I'm on OSX; can you think of anything else that I can try?

@ronag
Member

ronag commented Sep 18, 2024

@mcollina FYI, this adds overhead to net to fix a bug in the "legacy" HTTP client. I don't mind per se, but it's not optimal since the bug is not actually in net.

@mcollina
Member

CI is failing

@anilhelvaci
Author

Yep, it is @mcollina. Any suggestions on how to reproduce it? Is there any chance I can shell into the machine (or container) that has the problem? Or are there any snapshots or images I can pull onto my machine and spin up?

In other words, what are the usual steps that you guys take when you face a problem in CI?

@mcollina
Member

In this specific case, it seems that any Linux box on top of any virtualization software would do.

For more exotic systems @nodejs/build can provide access.

@anilhelvaci force-pushed the fix-48771-net-exception-with-tests branch from 4725298 to fcc49db June 17, 2025 09:47
@anilhelvaci
Author

Could anyone approve the CI workflow please? I've rebased my work on top of latest main.

cc @ShogunPanda @anonrig

@ShogunPanda
Contributor

@anilhelvaci Done :)

@anonrig added the request-ci label Jun 17, 2025
@github-actions bot removed the request-ci label Jun 17, 2025
Labels
needs-ci, net
Development

Successfully merging this pull request may close these issues.

Conditional unhandled 'error' event when http.request with .lookup