l0kod
Contributor

@l0kod l0kod commented Jul 7, 2025

Test access through disconnected directory.

This test should trigger a warning without this patch: https://lore.kernel.org/all/[email protected]/

An ongoing kernel patch series will be applied to change handling of disconnected directories:
https://lore.kernel.org/all/[email protected]/

@l0kod
Contributor Author

l0kod commented Jul 7, 2025

This test triggers the WARN_ON_ONCE() when called with syz-execprog but not with syz-manager (which has been running for a while). Any idea why?

@a-nogikh
Collaborator

a-nogikh commented Jul 7, 2025

Different parameters, probably? Do you set exactly the same -namespace, -procs, -enable (features)?

@l0kod
Contributor Author

l0kod commented Jul 7, 2025

Different parameters, probably? Do you set exactly the same -namespace, -procs, -enable (features)?

I guess you mean sandbox and procs? And yes, these options are the same: default.

I removed syscall filtering and don't enable any extra features.

@a-nogikh
Collaborator

a-nogikh commented Jul 7, 2025

Yes, I meant -sandbox and not -namespace, sorry for the confusion.

If these are similar to what syz-manager used, then it's weird indeed. Does the nature of the WARN_ON_ONCE() suggest what may be the case?

One more way to spot the difference(s) would be to run both tools with the -debug flag and then compare the debug output for the executed seed program. Do the calls prior to the kernel crash have the same errno values?

Also, did you use the runtest mode of syz-manager or did you use it as a fuzzer?
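The errno comparison suggested above can be sketched in Go as follows. This is only an illustration, not part of syzkaller: it assumes that both tools print per-call result lines in the form seen in executor debug output (for example `#0 [956ms] <- renameat2=0xffffffffffffffff errno=18 cover=9938`), and the names `callResult`, `parseResults` and `diffErrnos` are hypothetical. The second sample line in `main` is made up to show a mismatch.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// resultRe matches executor result lines such as
// "#0 [956ms] <- renameat2=0xffffffffffffffff errno=18 cover=9938".
// The line format is an assumption based on the debug output quoted in this thread.
var resultRe = regexp.MustCompile(`#(\d+) \[\d+ms\] <- ([\w$]+)=0x[0-9a-f]+ errno=(\d+)`)

// callResult is a hypothetical helper type holding one call's outcome.
type callResult struct {
	Index int
	Name  string
	Errno int
}

// parseResults collects every call result found in a debug log.
func parseResults(log string) []callResult {
	var out []callResult
	for _, line := range strings.Split(log, "\n") {
		m := resultRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		idx, _ := strconv.Atoi(m[1])
		errno, _ := strconv.Atoi(m[3])
		out = append(out, callResult{Index: idx, Name: m[2], Errno: errno})
	}
	return out
}

// diffErrnos reports calls whose name or errno differs between two runs.
func diffErrnos(a, b []callResult) []string {
	var diffs []string
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i].Name != b[i].Name || a[i].Errno != b[i].Errno {
			diffs = append(diffs, fmt.Sprintf("call #%d: %s errno=%d vs %s errno=%d",
				a[i].Index, a[i].Name, a[i].Errno, b[i].Name, b[i].Errno))
		}
	}
	if len(a) != len(b) {
		diffs = append(diffs, fmt.Sprintf("call count differs: %d vs %d", len(a), len(b)))
	}
	return diffs
}

func main() {
	// First line is taken from this thread; the second is invented for illustration.
	runA := "proc 4: got output: #0 [956ms] <- renameat2=0xffffffffffffffff errno=18 cover=9938"
	runB := "#0 [951ms] <- renameat2=0xffffffffffffffff errno=38 cover=12"
	for _, d := range diffErrnos(parseResults(runA), parseResults(runB)) {
		fmt.Println(d)
	}
}
```

Feeding it the two saved -debug logs would narrow the search to the first call that behaves differently.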

@l0kod
Contributor Author

l0kod commented Jul 8, 2025

I didn't know about this run-tests mode, that helped, thanks.

I now see that syz-manager skips this test because of the unsupported call renameat. I switched to renameat2 and it works!

However, this highlights an inconsistency between syz-manager and syz-execprog. Could that be a default syscall filter? Shouldn't syz-manager replace renameat with renameat2 instead of ignoring all tests with renameat?

@l0kod
Contributor Author

l0kod commented Jul 8, 2025

This test triggers a WARN_ON_ONCE() in current kernel code, but I'm working on a fix that should be merged in a few weeks. In fact, I wrote this test after we found the issue, in the hope that this will improve fuzzing coverage.

Once this test is merged into syzkaller, will syzbot report this WARN_ON_ONCE() issue? If so, I guess we should wait for the fix to be merged first to avoid creating an artificial syzbot report, right?

@a-nogikh
Collaborator

a-nogikh commented Jul 8, 2025

Happy to hear that you have found the problem!

Is renameat deprecated on all arches? In our descriptions, the syscall number is missing only for riscv64:

__NR_renameat = 386:302, amd64:264, arm:329, arm64:38, mips64le:5254, ppc64le:293, riscv64:???, s390x:295

Apparently, it still worked when syz-execprog used the syscall ID from there?


Overall, I think it's probably okay that syz-manager is more strict about what it finds in sys/*/test than syz-execprog is about what it takes as input. syz-execprog was built specifically to tolerate the input that might have been interleaved with console output (that used to be the case up until last year), so even if we start considering the actually enabled syscalls (which we probably should do, indeed), we would just skip those particular calls and go on.

func (ctx *Context) machineChecked(features flatrpc.Feature, syscalls map[*prog.Syscall]bool) queue.Source {

In the fuzzing mode, syz-manager cuts all calls that are not in the list of the enabled ones. So, without the renameat call, your test was just not executed the way you expected it to.

func FilterCandidates(candidates []fuzzer.Candidate, syscalls map[*prog.Syscall]bool,
	dropMinimize bool) FilteredCandidates {
	var ret FilteredCandidates
	for _, item := range candidates {
		if !item.Prog.OnlyContains(syscalls) {
			ret.ModifiedHashes = append(ret.ModifiedHashes, hash.String(item.Prog.Serialize()))
			// We cut out the disabled syscalls and retriage/minimize what remains from the prog.
			// The original prog will be deleted from the corpus.
			if dropMinimize {
				item.Flags &= ^fuzzer.ProgMinimized
			}
			item.Prog.FilterInplace(syscalls)
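
The effect of this filtering on a seed can be illustrated with a minimal, self-contained sketch. It does not use syzkaller's prog package: calls are modelled as plain strings, and `filterCalls` is a hypothetical helper that only mirrors the idea of dropping calls missing from the enabled set.

```go
package main

import "fmt"

// filterCalls keeps only the calls present in the enabled set, preserving order.
// It also reports whether any call was dropped, i.e. whether the program was modified.
// This is a simplified stand-in for the real filtering, not syzkaller's implementation.
func filterCalls(calls []string, enabled map[string]bool) ([]string, bool) {
	kept := make([]string, 0, len(calls))
	for _, c := range calls {
		if enabled[c] {
			kept = append(kept, c)
		}
	}
	return kept, len(kept) != len(calls)
}

func main() {
	// Assumed situation from the thread: renameat is not enabled, renameat2 is.
	enabled := map[string]bool{"mkdirat": true, "mount": true, "renameat2": true, "openat": true}
	seed := []string{"mkdirat", "mount", "renameat", "openat"}

	kept, modified := filterCalls(seed, enabled)
	fmt.Println("kept:", kept, "modified:", modified)
}
```

With renameat dropped, the remaining calls still run, but the directory is never moved, so the disconnected-directory path is not exercised.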

@a-nogikh
Collaborator

a-nogikh commented Jul 8, 2025

Once this test is merged into syzkaller, will syzbot report this WARN_ON_ONCE() issue? If so, I guess we should wait for the fix to be merged first to avoid creating an artificial syzbot report, right?

If syz-manager can now trigger the issue locally, it should definitely be able to trigger it on syzbot as well.

If you now also observe the crash during corpus triage on each syzkaller restart, waiting a bit until the fix has reached the kernel is probably a good idea.

@l0kod
Contributor Author

l0kod commented Jul 8, 2025

Apparently, it still worked when syz-execprog used the syscall ID from there?

Yes 🤷

@l0kod
Contributor Author

l0kod commented Jul 22, 2025

I updated the tests to improve coverage with the latest patch series, and it works as expected when I run it with syz-execprog.

However, running it with syz-manager in test mode failed with this error:

[...]
#0 [951ms] -> renameat2(0x3, 0x200000000540, 0x3, 0x200000000580, 0x0)
proc 4: got output: #0 [956ms] <- renameat2=0xffffffffffffffff errno=18 cover=9938
proc 4: got execute reply
handle completion: completed=19 output_size=6291456
proc 4: got output: umount(./0/file0/file2/file3)
umount(./0/file0/file2/file3)
proc 4: got output: umount(./0/file0/file2/file3) failed (errno 22)
proc 4: got output: loop exited with status 1
got data on response pipe in wrong state 2
proc 4: restarting subprocess, current state 2 attempts 0
proc 4: subprocess exit status 9
landlock_fs_disconnected none/cover C/repeat C/thr/cover: OK
landlock_fs_disconnected none/cover C/repeat C/thr/cover C: OK
landlock_fs_disconnected none/cover C/repeat C/thr/cover C/repeat C: FAIL: should repeat 3 times, but repeated 1, prog calls [0xc004065590 0xc0040655e0 0xc004065630 0xc004065680 0xc0040656d0 0xc004065720 0xc004065770 0xc0040657c0 0xc004065810 0xc004065860 0xc0040658b0 0xc004065900 0xc004065950 0xc0040659a0 0xc0040659f0 0xc004065a40 0xc004065a90 0xc004065ae0 0xc004065b30], info calls -1

The umount is not part of the tests and cannot succeed because the related directory was moved by the test (see first renameat2() call). I'm not sure how to fix this syzkaller issue.

syz-manager doesn't take this test as a seed and I guess this is the reason. Could you please confirm?

Test access through disconnected directory.

This test should trigger a warning without this patch:
https://lore.kernel.org/all/[email protected]/

An ongoing kernel patch series will be applied to change handling of
disconnected directories:
https://lore.kernel.org/all/[email protected]/

Signed-off-by: Mickaël Salaün <[email protected]>
@a-nogikh
Collaborator

a-nogikh commented Jul 22, 2025

I get the same error when I run $ ./bin/syz-manager -config my.cfg -mode run-tests -tests landlock_fs_disconnected.

The umount is not part of the tests and cannot succeed because the related directory was moved by the test (see first renameat2() call). I'm not sure how to fix this syzkaller issue.

The umount is coming from this common code in syz-executor:

// One does not simply remove a directory.
// There can be mounts, so we need to try to umount.
// Moreover, a mount can be mounted several times, so we need to try to umount in a loop.
// Moreover, after umount a dir can become non-empty again, so we need another loop.
// Moreover, a mount can be re-mounted as read-only and then we will fail to make a dir empty.
static void remove_dir(const char* dir)

Specifically, from this line, I think

if (umount2(filename, umount_flags))
	exitf("umount(%s) failed", filename);

Given that it's called right between executing the program and reporting the result, there's a chance that the umount failure is indeed breaking the test.

auto [err, output] = ExecuteBinaryImpl(msg, dir);
if (!err.empty()) {
	char tmp[64];
	snprintf(tmp, sizeof(tmp), " (errno %d: %s)", errno, strerror(errno));
	err += tmp;
}
remove_dir(dir);
rpc::ExecResultRawT res;

Still, it's weird to see umount return errno 22 (EINVAL). Apparently, in this case, unlink's EBUSY did not actually mean that something was mounted at the path.

@l0kod
Contributor Author

l0kod commented Jul 22, 2025

Unsharing the mount namespace in the test avoids this error, but the test is still not used by syz-manager, or at least it doesn't show in the coverage. 🤔

@a-nogikh
Collaborator

Unsharing the mount namespace in the test itself is unfortunately not the most sustainable approach here - remove_dir is also used during normal fuzzing (not after every program, though).

You have likely created a mount situation that our existing executor code is simply unable to handle properly :)

but the test is still not used by syz-manager, or at least it doesn't show in the coverage. 🤔

You could run syz-manager with -debug and check whether the program appeared in the logs and what the results of its run were.

Did you see new coverage when you ran the test with syz-execprog?

@l0kod
Contributor Author

l0kod commented Jul 22, 2025

You could run syz-manager with -debug and check whether the program appeared in the logs and what the results of its run were.

syz-manager -debug doesn't list the used tests, just the output.

Did you see new coverage when you ran the test with syz-execprog?

Yes, it works.

I tested with a WARN_ON_ONCE() to be sure about the coverage, and syz-manager triggers this code, but it is not shown in the coverage, so it looks like an issue with the coverage display.

@a-nogikh
Collaborator

syz-manager -debug doesn't list the used tests, just the output.

Yes, it won't list the tests directly, but it does list everything it has executed (along with debug info), and your test must have been somewhere in that big output.

I tested with a WARN_ON_ONCE() to be sure about the coverage, and syz-manager triggers this code, but it is not shown in the coverage, so it looks like an issue with the coverage display.

Could you please point to the kernel code that should have been covered but wasn't?
I'll try to reproduce it on my workstation.

@a-nogikh
Collaborator

Regarding the umount2 failure discussed above.

So, your test

  • Mounts ./file0/file1 to ./file0/file6.
  • Does a self-bind mount of ./file0/file1/file2/file3 to itself.
  • Renames ./file0/file1/file2 to ./file0/file2.

We then fail both to delete ./file0/file2/file3 and to unmount it, apparently because there's no mount record for ./file0/file2/file3.

There's no mount record for ./file0/file1/file2/file3 either, though. Here is /proc/self/mountinfo:

68 67 0:6 / /dev rw,relatime - devtmpfs devtmpfs rw,size=1076660k,nr_inodes=269165,mode=755
69 68 0:26 / /dev/pts rw,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666
70 68 0:27 / /dev/shm rw,relatime - tmpfs tmpfs rw,mode=777
71 67 0:39 / /proc rw,relatime - proc syz-proc rw
72 67 0:25 / /sys rw,relatime - sysfs sysfs rw
73 72 0:8 / /sys/kernel/debug rw,relatime - debugfs debugfs rw
74 72 0:7 / /sys/kernel/security rw,relatime - securityfs securityfs rw
75 72 0:21 / /sys/kernel/config rw,relatime - configfs configfs rw
76 72 0:31 / /sys/fs/fuse/connections rw,relatime - fusectl fusectl rw
77 72 0:32 / /sys/fs/pstore rw,relatime - pstore none rw
78 72 0:33 / /sys/fs/bpf rw,relatime - bpf bpf rw
79 72 0:13 / /sys/kernel/tracing rw,relatime - tracefs tracefs rw
80 73 0:8 / /sys/kernel/debug rw,relatime - debugfs debugfs rw
81 71 0:30 / /proc/sys/fs/binfmt_misc rw,relatime - binfmt_misc binfmt_misc rw
82 67 0:34 / /syzcgroup/unified rw,relatime - cgroup2 none rw
83 67 0:36 / /syzcgroup/cpu rw,relatime - cgroup none rw,cpuacct,memory,hugetlb,clone_children
84 67 0:35 / /syzcgroup/net rw,relatime - cgroup none rw,blkio,devices,freezer,net_prio
47 68 0:40 / /dev/gadgetfs rw,relatime - gadgetfs gadgetfs rw
49 68 0:41 / /dev/binderfs rw,relatime - binder binder rw,max=1048576
50 67 0:42 / /0/file0 rw,relatime - tmpfs none rw
52 50 0:42 /file1 /0/file0/file6 rw,relatime - tmpfs none rw
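
Checking such a listing for a given path can be sketched in Go as below. This is a rough illustration, not executor code: `mountPoints` is a hypothetical helper, it relies on the mount point being the fifth whitespace-separated field of each mountinfo line, and it ignores the octal escaping (such as `\040` for a space) that mountinfo applies to special characters.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// mountPoints returns the set of mount points listed in mountinfo-formatted text.
// The mount point is the fifth whitespace-separated field of each line.
// Escaped characters in paths are not decoded in this sketch.
func mountPoints(mountinfo string) map[string]bool {
	points := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 5 {
			continue
		}
		points[fields[4]] = true
	}
	return points
}

func main() {
	// The last two lines of the listing above.
	const sample = `50 67 0:42 / /0/file0 rw,relatime - tmpfs none rw
52 50 0:42 /file1 /0/file0/file6 rw,relatime - tmpfs none rw`

	points := mountPoints(sample)
	for _, p := range []string{"/0/file0", "/0/file0/file6", "/0/file0/file2/file3"} {
		fmt.Printf("%s has a mount record: %v\n", p, points[p])
	}
}
```

Against the live system, the input would come from reading /proc/self/mountinfo instead of the embedded sample.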

l0kod added a commit to l0kod/linux that referenced this pull request Jul 24, 2025
@l0kod
Contributor Author

l0kod commented Jul 24, 2025

Could you please point to the kernel code that should have been covered but wasn't?
I'll try to reproduce it on my workstation.

You'll find the kernel source I'm using, with the addition of the WARN_ON_ONCE(1) call here: https://github.com/l0kod/linux/commits/landlock-syzkaller-debug-disco/

@a-nogikh
Collaborator

I have built Linux from https://github.com/l0kod/linux/commits/landlock-syzkaller-debug-disco/ (using this config) and syzkaller from https://github.com/l0kod/syzkaller/tree/disconnected at c355bf8 and I get tons of crashes at

WARNING: CPU: 0 PID: 8621 at security/landlock/fs.c:1016 is_access_to_paths_allowed+0x7c4/0x1820 security/landlock/fs.c:1016

Same for 19973b5.
