Fix various problems in close(2) usage and pwdfd handling #882

JohnoKing · 2025-06-20T01:28:30Z

Changelog for this commit:

include/ast.h:

Added ast_close(), which endeavors to close fds while handling EINTR correctly, depending on platform specific behavior. The following three methods are implemented:
1. Guarantee a close with posix_close(3p), which is the best available method. This function is still very new because it's a recent addition to POSIX.
2. Add a version that only calls close once, then copes with EINTR by treating it as a non-error. Only Linux, FreeBSD and Cygwin use this right now. (Other OSes probably provide functionally equivalent leak-proof forms of close, but the documentation available for those is too vague on this matter. Perhaps I'll have to dive into the code for the open source OSes.)
3. Use the tried and true while loop to repeatedly invoke close, which is currently the generic fallback. On Linux, avoiding this method is effectively a bugfix, in large part because Linux can reuse a file descriptor after it's first closed before a second close call occurs, creating the potential for dangerous race conditions.¹ (This probably never occurs in ksh because it's single threaded, but it's better to avoid the while loop for correctness.)
Delete many <errno.h> include directives in favor of a single one included in ast.h (ast_close uses errno and is implemented as a macro in the ast.h header).

src/{subshell,xec}.c:

Close all of the previous subshell pwdfds after a fork to fix a file descriptor leak (first introduced in 93u+ 2012-07-27). Reproducer:

$ cat /tmp/reproducer
(cd /bin; (cd /usr; (cd /tmp; (cd /
    echo 'ls /proc/${.sh.pid}/fd; :' >/tmp/a
    echo '#/bin/sh' >/tmp/b
    cat /tmp/a >>/tmp/b
    chmod +x /tmp/{a,b}
    /tmp/a  # needs SHOPT_SPAWN=0 for older ksh
    /tmp/b  # needs SHOPT_SPAWN=0 for older ksh
    (ulimit -t unlimited; source /tmp/a)
))))
exit 0 # Avoid optimizing away subshells

$ ksh /tmp/reproducer
0  1  10  11  12  13  14  15  2
0  1  10  11  12  13  14  15  2
0  1  10  11  12  13  14  15  2

bltins/cd_pwd.c, sh/io.c:

sh_close(): Set errno to zero to avoid passing on an inherited errno. The close(2) call might fail and subsequently set errno, which makes the previous behavior bogus.
b_cd(): If the sh_close() calls in cd fail with anything other than EINTR, the errno from fchdir(2) is thrown away and replaced with the errno from close. This can cause two regression test failures in builtins.sh. Prevent this by saving the errno and restoring it at the relevant sh_close calls.² Error output:

test builtins(C.UTF-8) begins at 2025-06-19+18:20:50
	builtins.sh[996]: FAIL: can cd into a directory without x permission bit (absolute path arg) (expected status 1 and match of ': cd: /tmp/ksh93.shtests.253547.19870/builtins.C.UTF-8/no_x_dir: \[?Permission denied]?$', got status 1 and '/home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/tests/builtins.sh[994]: cd: /tmp/ksh93.shtests.253547.19870/builtins.C.UTF-8/no_x_dir: Success')
	builtins.sh[1001]: FAIL: can cd into a directory without x permission bit (relative path arg) (expected status 1 and match of ': cd: no_x_dir: \[?Permission denied]?$', got status 1 and '/home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/tests/builtins.sh[999]: cd: no_x_dir: No such file or directory')
test builtins(C.UTF-8) failed at 2025-06-19+18:20:54 with exit code 2 [ 294 tests 2 errors ]
Total errors: 2

sh_diropenat(): If openat(2) managed to obtain a file descriptor >10 on its first try and O_CLOEXEC is available, the fd wouldn't be registered with the shell because no sh_fcntl() call would be reached. Fix that by setting the file attributes in sh_diropenat() rather than sh_fcntl(). Consequently, we don't need to rely on sh_fcntl and thus we can use fcntl(2) directly.³

bltins/test.c:

test_stat(): If we need to stat e_dot (the current working directory) and sh.pwdfd is available, use fstat(2) because it will avoid unnecessarily reopening .. This change results in a small performance increase for cd(1) usage in subshells:

$ cat bench/v-cdloop.ksh
for ((i=0; i!=10000; i++)) do
   (cd /home; (cd /tmp; (cd /bin; (cd /usr; (cd /var)))))
done; :
$ shbench -b bench/v-cdloop.ksh -l 4 /tmp/ksh-{before,after}
------------------------------------------------------
name          /tmp/ksh-before      /tmp/ksh-after
------------------------------------------------------
v-cdloop.ksh  0.414 [0.412-0.416]  0.381 [0.341-0.398]
------------------------------------------------------

sh/fault.c:

sh_done(): Delete an unnecessary close call backported from ksh93v-. The operating system closes all file descriptors for us via exit(3), so a manual close call only serves to make cleanup slower.

comp/arc4random.c:

_ast_getentropy(): Fix a file descriptor leak when read(2) fails.

tm/tvtouch.c:

tvtouch(): Add an else branch to avoid closing the same file descriptor twice.

tests/subshell.sh:

Add a regression test capable of detecting the subshell pwdfd leak going back as far as ksh93u+ 2012-07-27, where the leak is incipient.

For more information, please see the relevant LWN article et seq: https://lwn.net/Articles/576478/ ↩
This bug was noticed after sh_close was changed to reset errno before calling close. ↩
cf. sh_open() in io.c ↩

JohnoKing · 2025-06-20T01:51:57Z

Well, this patch causes the CI to fail horrifically :\
Guess I'll be busy debugging it now...

JohnoKing · 2025-06-20T02:42:58Z

The test failures were caused by the new sh_clear_subshell_pwdfd() function closing stdin, which was because sp->pwdfd got initialized to 0 (which is wrong, it should be initialized to -1 when there's no file descriptor). That issue has been fixed and the CI now passes.

Changelog for this commit: include/ast.h: - Added ast_close, which endeavors to close fds while handling EINTR correctly, depending on platform specific behavior. The following three methods are implemented: 1) Guarantee a close with posix_close(3p), which is the best available method. This function is still *very* new because it's a recent addition to POSIX. 2) Add a version that only calls close once, then copes with EINTR by treating it as a non-error. Only Linux, FreeBSD and Cygwin use this right now. (Other OSes probably provide functionally equivalent leak-proof forms of close, but the documentation available for those is too vague on this matter. Perhaps I'll have to dive into the code for the open source ones.) 3) Use the tried and true while loop to repeatedly invoke close, which is currently the generic fallback. On Linux, avoiding this method is effectively a bugfix, in large part because Linux can reuse a file descriptor after it's first closed before a second close call occurs, creating the potential for dangerous race conditions. For more information, please see the relevant LWN article et seq: https://lwn.net/Articles/576478/ - Delete many <errno.h> include directives in favor of a single one included in ast.h (ast_close uses errno and is implemented as a macro in the ast.h header). src/subshell.c: - Close all of the previous subshell pwdfds after a fork to fix a file descriptor leak (first introduced in 93u+ 2012-07-27). Reproducer: $ cat /tmp/reproducer (cd /bin; (cd /usr; (cd /tmp; (cd / echo 'ls /proc/${.sh.pid}/fd; :' >/tmp/a echo '#/bin/sh' >/tmp/b cat /tmp/a >>/tmp/b chmod +x /tmp/{a,b} /tmp/a # SHOPT_SPAWN=0 /tmp/b $ SHOPT_SPAWN=0 (ulimit -t unlimited; source /tmp/a) )))) exit 0 # Avoid optimizing away subshells - Initialize sp->pwdfd to -1 in all cases (this avoids the potential for incorrectly closing stdin). - While we're at it, switch to using C99 style struct initialization. bltins/cd_pwd.c, sh/io.c: - sh_close(): Set errno to zero to avoid passing on an inherited errno. The close(2) call might fail and subsequently set errno, which makes the previous behavior bogus. - b_cd(): If the sh_close() calls in cd fail with anything other than EINTR, the errno from fchdir(2) is thrown away and replaced with close's errno. This can cause two regression test failures in builtins.sh. Prevent this by saving the errno and restoring it at the relevant sh_close calls. Error output (snipped to fit it into the commit log): builtins.sh[996]: FAIL: can cd into a directory without x permission bit (absolute path arg) (expected status 1 and match of ': cd: /tmp/ksh93.shtests 377539.24978/builtins.C/no_x_dir: \[?Permission denied]?$', got status 1 and '<SNIP>/builtins.sh[994]: cd: /tmp/ksh93.shtests.377539.24978/builtins. /no_x_dir: Success') builtins.sh[1001]: FAIL: can cd into a directory without x permission bit (relative path arg) (expected status 1 and match of ': cd: no_x_dir: \[?Permission denied]?$', got status 1 and '<SNIP>/tests/builtins.sh[999]: cd: no_x_dir: No such file or directory') This bug was noticed after sh_close was changed to reset errno before calling close. - sh_diropenat(): If openat() managed to obtain a file descriptor >10 on its first try and O_CLOEXEC is available, the fd wouldn't be registered with the shell because no sh_fcntl() call would be reached. Fix that by setting the file attributes in sh_diropenat() rather than sh_fcntl(). Consequently, we don't need to rely on sh_fcntl and thus we can use fcntl(2) directly. (cf. sh_open() in io.c) bltins/test.c: - test_stat(): If we need to stat e_dot (the current working directory) and sh.pwdfd is available, use fstat(2) because it will avoid unnecessarily reopening '.'. This change results in a small performance increase for cd(1) usage in subshells: $ cat bench/v-cdloop.ksh for ((i=0; i!=10000; i++)) do (cd /home; (cd /tmp; (cd /bin; (cd /usr; (cd /var))))) done; : $ shbench -b bench/v-cdloop.ksh -l 4 /tmp/ksh-{before,after} ------------------------------------------------------ name /tmp/ksh-before /tmp/ksh-after ------------------------------------------------------ v-cdloop.ksh 0.414 [0.412-0.416] 0.381 [0.341-0.398] ------------------------------------------------------ sh/fault.c: - sh_done(): Delete an unnecessary close call backported from ksh93v-. The operating system closes all fds for us via exit(3), so a manual close(2) call only serves to make cleanup slower. comp/arc4random.c: - _ast_getentropy(): Fix a file descriptor leak when read(2) fails. tm/tvtouch.c: - tvtouch(): Add an else branch to avoid closing the same file descriptor twice. tests/subshell.sh: - Add a regression test capable of detecting the subshell pwdfd leak going back as far as ksh93u+ 2012-07-27, where the leak is incipient.

JohnoKing force-pushed the fix-fd-closing branch from fe0c4c1 to 3cd6613 Compare July 10, 2025 23:58

test_stat(): Use strcmp to enable pwdfd optimization for test -ef

22033ea

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix various problems in close(2) usage and pwdfd handling #882

Fix various problems in close(2) usage and pwdfd handling #882

Uh oh!

JohnoKing commented Jun 20, 2025 •

edited

Loading

Uh oh!

JohnoKing commented Jun 20, 2025 •

edited

Loading

Uh oh!

JohnoKing commented Jun 20, 2025 •

edited

Loading

Uh oh!

Uh oh!

Fix various problems in close(2) usage and pwdfd handling #882

Are you sure you want to change the base?

Fix various problems in close(2) usage and pwdfd handling #882

Uh oh!

Conversation

JohnoKing commented Jun 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

JohnoKing commented Jun 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JohnoKing commented Jun 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

JohnoKing commented Jun 20, 2025 •

edited

Loading

JohnoKing commented Jun 20, 2025 •

edited

Loading

JohnoKing commented Jun 20, 2025 •

edited

Loading