2.3.4 staging prep #17595
Conversation
ZIL introduced dependencies between its write ZIOs to permit flush deferral, where we flush vdev caches only once all the write ZIOs have completed. But it was recently spotted that this serializes not only the handling of ZIO completions, but also their ready stage. It means the ZIO pipeline can't calculate checksums for the following ZIOs until all the previous ones are checksummed, even though that is not required. On systems where the memory throughput of a single CPU core is limited, this creates a single-core CPU bottleneck, which is difficult to see due to the ZIO pipeline design with many taskqueue threads. While it would be great to bypass the ready-stage waits, that would require changes to ZIO code, and I haven't found a clean way to do it. But I've noticed that we don't need any dependency between the write ZIOs if the previous one has some waiters, which means it won't defer any flushes and works as a barrier for the earlier ones. Bypassing it won't help large single-threaded writes, since in that case all the write ZIOs except the last won't have waiters and so will be dependent. But in that case ZIO processing might not be a bottleneck, since there will be only one thread populating the write buffers, and that will likely be the bottleneck. Bypassing the ZIO dependency on multi-threaded write workloads, however, really allows them to scale beyond the checksumming throughput of one CPU core. My tests writing 12 files on the same dataset from 12 threads with 1MB blocks, on a pool with 4 striped NVMes as SLOGs and a Xeon Silver 4114 CPU, show total throughput increasing from 4.3GB/s to 8.5GB/s, with SLOG busy rising from ~30% to ~70%. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17458
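The dependency-skip rule described above can be pictured with a small user-space model (the struct and function names here are simplified stand-ins for the real ZIL lwb machinery, not OpenZFS's actual code):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a ZIL log write block (LWB). */
typedef struct lwb {
	int nwaiters;      /* threads waiting on this LWB's completion */
	struct lwb *prev;  /* previously issued write ZIO's LWB */
} lwb_t;

/*
 * A new write ZIO needs a dependency on the previous one only when the
 * previous LWB has no waiters: an LWB with waiters won't defer any
 * flushes and already acts as a barrier for the earlier ones.
 */
int
lwb_needs_dependency(const lwb_t *lwb)
{
	return (lwb->prev != NULL && lwb->prev->nwaiters == 0);
}
```

With this rule, multi-threaded workloads (where most LWBs have waiters) skip the dependency and can checksum in parallel, while a single-threaded stream (no waiters until the last LWB) keeps the dependencies, matching the behavior described in the commit message.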
This PR condenses the FDT dedup log syncing into a single sync pass. This reduces the overhead of modifying indirect blocks for the dedup table multiple times per txg. In addition, changes were made to the formula for how much to sync per txg. We now also consider the backlog we have to clear, to prevent it from growing too large, or remaining large on an idle system. Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Authored-by: Don Brady <[email protected]> Authored-by: Paul Dagnelie <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#17038
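A backlog-aware sync budget of the kind described could look roughly like the following sketch (the actual formula in the PR differs; the function name, parameters, and the drain horizon here are assumptions for illustration only):

```c
#include <assert.h>

/*
 * Sync enough to keep up with newly logged entries, plus a slice of
 * the backlog so it drains over roughly drain_txgs txgs instead of
 * growing without bound or lingering on an idle system.
 */
unsigned long
ddt_log_sync_budget(unsigned long ingest_per_txg, unsigned long backlog,
    unsigned long drain_txgs)
{
	if (drain_txgs == 0)
		drain_txgs = 1;
	/* Round the backlog slice up so small backlogs still drain. */
	return (ingest_per_txg + (backlog + drain_txgs - 1) / drain_txgs);
}
```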
- Don't drop the L2ARC header if there are more buffers in this header. Since we keep those buffers in the header, keep the L2ARC header too. Strictly speaking we aren't required to drop it even if there are no other buffers, but then we'd need to allocate a separate header for it, which we might drop soon if the old block is really deleted. Multiple buffers in a header likely mean active snapshots or dedup, so we know the block in L2ARC will remain valid. It might be rare, but why not? - Remove some impossible assertions and conditions. Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17126
The `scn_min_txg` can now be used not only with resilver. Instead of checking `scn_min_txg` to determine whether it’s a resilver or a scrub, simply check which function is defined. Thanks to this change, a scrub_finish event is generated when performing a scrub from the saved txg. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Closes openzfs#17432
dbuf_verify(): The lock is not needed, since we only compare pointers. dbuf_findbp(): The lock is not needed, since aside from an unneeded assert we only produce the pointer, but don't dereference it. dnode_next_offset_level(): When working on the top level of indirection, we should lock the dnode buffer's db_rwlock, since it is our parent. If the dnode has no buffer, then it is the meta-dnode or one of the quotas and we should lock the dataset's ds_bp_rwlock instead. Reviewed-by: Alan Somers <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17441
There are still a variety of bugs involving the vdev_nonrot property that will cause problems if you try to run the test suite with segment-based weighting disabled, as well as other problems in the weighting code. A parent's nonrot property needs to be updated when children are added. When vdevs are expanded and more metaslabs are added, the weights have to be recalculated (since the number of metaslabs is an input to the lba bias function). When opening, faulted or unopenable children should not be considered when deciding whether a vdev is nonrot (since the nonrot property is determined during a successful open, this can cause false negatives). And draid spares need to have the nonrot property set correctly. Sponsored-by: Eshtek, creators of HexOS Sponsored-by: Klara, Inc. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#17469
Use statx to verify that path-based unmounts proceed only if the mountpoint reported by statx matches the MNTTAB entry reported by libzfs, aborting the operation if they differ. Align `zfs umount /path` behavior with `zfs umount dataset`. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes openzfs#17481
Currently, after a failed allocation, the metaslab code recalculates the weight for a metaslab. However, for space-based metaslabs, it uses the maximum free segment size instead of the normal weighting algorithm. This is presumably because the normal metaslab weight is (roughly) intended to estimate the size of the largest free segment, but it doesn't do that reliably at most fragmentation levels. This means that recalculated metaslabs are forced to a weight that isn't really using the same units as the rest of them, resulting in undesirable behaviors. We switch this to use the normal space-weighting function. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Sponsored-by: Wasabi Technology, Inc. Sponsored-by: Klara, Inc. Closes openzfs#17531
Under parallel workloads ZIL may delay writes of open LWBs that are not full enough. On suspend we do not expect anything new to appear since zil_get_commit_list() will not let it pass, only returning TXG number to wait for. But I suspect that waiting for the TXG commit without having the last LWB issued may not wait for its completion, resulting in panic described in openzfs#17509. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17521
On Linux, when doing path lookup with LOOKUP_RCU, dentry and inode can be dereferenced without refcounts and locks. For this reason, dentry and inode must only be freed after an RCU grace period. However, zfs currently frees the inode synchronously in zfs_inode_destroy, and we can't use the GPL-only call_rcu() in zfs directly. Fortunately, on Linux 5.2 and later, if we define sops->free_inode(), the kernel will do the call_rcu() for us. This issue may be triggered more easily with the init_on_free=1 boot parameter:
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:selinux_inode_permission+0x10e/0x1c0
Call Trace:
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? security_inode_permission+0x37/0x60
 ? __die_body.cold+0x8/0xd
 ? no_context+0x113/0x220
 ? exc_page_fault+0x6d/0x130
 ? asm_exc_page_fault+0x1e/0x30
 ? selinux_inode_permission+0x10e/0x1c0
 security_inode_permission+0x37/0x60
 link_path_walk.part.0.constprop.0+0xb5/0x360
 ? path_init+0x27d/0x3c0
 path_lookupat+0x3e/0x1a0
 filename_lookup+0xc0/0x1d0
 ? __check_object_size.part.0+0x123/0x150
 ? strncpy_from_user+0x4e/0x130
 ? getname_flags.part.0+0x4b/0x1c0
 vfs_statx+0x72/0x120
 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
 __do_sys_newlstat+0x39/0x70
 ? __x64_sys_ioctl+0x8d/0xd0
 do_syscall_64+0x30/0x40
 entry_SYSCALL_64_after_hwframe+0x62/0xc7
Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Co-authored-by: Chunwei Chen <[email protected]> Closes openzfs#17546
During hotplug REMOVED events, devid matching fails for partition-based spares because devid information is not stored in the pool config for partitioned devices. However, when the devid is populated by the hotplug event, the original code skipped the search logic entirely, bypassing vdev_guid matching and resulting in wrong device type detection that caused spares to be incorrectly identified as l2arc devices. Additionally, fix zfs_agent_iter_pool() to use the return value from zfs_agent_iter_vdev(), which was previously ignored, instead of relying on search parameters. Also add a pool_guid optimization to enable targeted pool searching when pool_guid is available. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes openzfs#17545
Currently, when reading compressed blocks with -R, decompressing them with the :d option, and specifying lsize (which is normally bigger than psize for compressed blocks), the checksum is calculated on the decompressed data. But this makes no sense, since zfs always calculates the checksum on the physical, i.e. compressed, data. So reading the same block produces different checksum results depending on how we read it, whether we decompress it or not, which, again, makes no sense. Fix: use psize instead of lsize when calculating the checksum so that it is always calculated on the physical block, regardless of whether it was compressed or not. Signed-off-by: Andriy Tkachuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Closes openzfs#17547
Update the default FICLONE and FICLONERANGE ioctl behavior to wait on dirty blocks. While this does remove some control from the application, in practice ZFS is better positioned to do the optimal thing and immediately force a TXG sync. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#17455
When we're passivating a metaslab group we start by passivating the metaslabs that have been activated for each of the allocators. To do that, we need to provide a weight. However, currently this erroneously always uses a segment-based weight, even if segment-based weighting is disabled. Use the normal weight function, which will decide which type of weight to use. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#17566
While booting, only the needed 256KiB benchmarks are done now. The delay for checking all checksums occurs when requested via: - Linux: cat /proc/spl/kstat/zfs/chksum_bench - FreeBSD: sysctl kstat.zfs.misc.chksum_bench Reported by: Lahiru Gunathilake <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Co-authored-by: Colin Percival <[email protected]> Closes openzfs#17563 Closes openzfs#17560
Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Igor Ostapenko <[email protected]> Closes openzfs#17581
Signed-off-by: Ameer Hamza <[email protected]>
…17073) The redundant_metadata setting in ZFS allows users to trade resilience for performance and space savings. This applies to all data and metadata blocks in zfs, with one exception: gang blocks. Gang blocks currently just take the copies property of the IO being ganged and, if it's 1, set it to 2. This means that we always make at least two copies of a gang header, which is good for resilience. However, if users care more about performance than resilience, their gang blocks will be even more of a penalty than usual. We add logic to calculate the number of gang header copies directly, and store it as a separate IO property. This is stored in the IO properties and not calculated when we decide to gang, because by that point we may not have easy access to the relevant information about what kind of block is being stored. We also check the redundant_metadata property when doing so, and use that to decide whether to store an extra copy of the gang headers, compared to the underlying blocks. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Paul Dagnelie <[email protected]> Co-authored-by: Paul Dagnelie <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
Missed in openzfs#17073, probably because that PR was branched before openzfs#17001 was landed and never rebased. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
As discussed in the comments of PR openzfs#17004, you can theoretically run into a case where a gang child has more copies than the gang header, which can lead to some odd accounting behavior (and even trip a VERIFY). While the accounting code could be changed to handle this, it fundamentally doesn't seem to make a lot of sense to allow this to happen. If the data is supposed to have a certain level of reliability, that isn't actually achieved unless the gang_copies property is set to match it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#17484
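The constraint argued for above can be illustrated with a one-line clamp (a sketch of the idea only; the property handling in the actual patch is more involved):

```c
#include <assert.h>

/*
 * A gang header must be at least as redundant as its children:
 * raise an effective gang_copies below copies up to match, so the
 * stated reliability of the data is actually achieved.
 */
int
effective_gang_copies(int copies, int gang_copies)
{
	return (gang_copies < copies ? copies : gang_copies);
}
```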
Loss of one indirect block of the meta-dnode likely means loss of the whole dataset. That is worse than the loss of one file that the man page promises, and in my opinion is not much better than the "none" mode. This change restores redundancy of the meta-dnode indirect blocks, while at the same time still correcting the expectations set in the man page. Reviewed-by: Akash B <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17339
Maybe #17542 also?
I'm fine with these also going in. I don't think it's the whole list, of course, but we usually do a few rounds of these PRs. I'll start daily-driving this set and will put together my own list of wants soon (but for sure #17542 is critical imo).
Is #17561 too new? Getting as many of the PRs that handle memory pressure issues as are deemed safe into a point version update would be nice.
All this machinery is there to try to understand when there is an async writeback waiting to complete because the intent log callbacks are still outstanding, and to force them with a timely zil_commit(). The next commit fixes this properly, so there's no need for all this extra housekeeping. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17584
For async page writeback, we do not need to wait for the page to be on disk before returning to the caller; it's enough that the data from the dirty page be on the DMU and in the in-memory ZIL, just like any other write. So, if this is not a syncing write, don't add a callback to the itx, and instead just unlock the page immediately. (This is effectively the same concept used for FreeBSD in d323fbf). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17584 Closes openzfs#14290
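The policy reads roughly like this user-space sketch (an illustrative model only; in the real code the callback is attached to the itx, and the names here are stand-ins):

```c
#include <assert.h>
#include <stddef.h>

typedef void (*itx_cb_t)(void *arg);

/* Placeholder for "unlock the page once the ZIL write is on disk". */
void
noop_unlock(void *arg)
{
	(void) arg;
}

/*
 * Only a syncing write attaches a completion callback that holds the
 * page until the ZIL write is on disk; async writeback is complete
 * once the data is in the DMU and the in-memory ZIL, so no callback
 * is attached and the page can be unlocked immediately.
 */
itx_cb_t
putpage_itx_callback(int syncing, itx_cb_t unlock_cb)
{
	return (syncing ? unlock_cb : NULL);
}
```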
The structure of zfs_putpage() and its callers is tricky to follow. There's a lot more we could do to improve it, but at least now we have some description of one of the trickier bits. Writing this exposed a very subtle bug: most async pages pushed out through zpl_putpages() would go to the ZIL with commit=false, which can yield a less-efficient write policy. So this commit updates that too. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17584
Currently we fail the compilation via the #error directive if `HAVE_XSAVE` isn't defined. This breaks i586 builds since we check the toolchain's SIMD support only on i686 and onward. Remove the requirement to fix the build on i586. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes openzfs#13303 Closes openzfs#17590
Be standard-compliant by using `int main()`. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes openzfs#13303 Closes openzfs#17590
The location of zgenhostid was changed in 0ae733c (Install zgenhostid to sbindir, 2021-01-21). We include all files within sbindir two lines earlier, which causes rpmbuild to report: File listed twice: /sbin/zgenhostid Drop the redundant entry from the %files section. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Todd Zullinger <[email protected]> Closes openzfs#17601
03987f7 (openzfs#16069) added a workaround to get the blk-mq hardware context for older kernels that don't cache it in the struct request. However, this workaround appears to be incomplete. In 4.19, the rq data context is optional. If it's not initialised, then the cached rq->cpu will be -1, and so using it to index into mq_map causes a crash. Given that upstream 4.19 is now in extended LTS and rarely seen, RHEL8's 4.18+ has long carried "modern" blk-mq support, and the cached hardware context has been available since 5.1, I'm not going to go to huge lengths to get queue selection correct for the very few people who are likely to feel it. To that end, we simply call raw_smp_processor_id() to get a valid CPU id and use that instead. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#17597
This converts the body of a ZED slack notification from plain text to code block style to help with readability. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: René Wirnata <[email protected]> Closes openzfs#17610
Update the META file to reflect compatibility with the 6.16 kernel. Tested with 6.16.0-0-stable of Alpine Linux edge, see <https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/87929>. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Achill Gilgenast <[email protected]> Closes openzfs#17578
Chase URL change from the FreeBSD project. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Colin Percival <[email protected]> Closes openzfs#17617
Allow zstd_mempool_init() to allocate using vmem_alloc() instead of kmem_alloc() to silence the large allocation warning on Linux during module load when the system has a large number of CPUs. It's not at all clear to me that scaling the allocation size with the number of CPUs is beneficial and that should be evaluated. But for the moment this should resolve the warning without introducing any unexpected side effects. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#17620 Closes openzfs#11557
Systems with a large number of CPU cores (192+) may trigger the large allocation warning in multilist_create() on Linux. Silence the warning by converting the allocation to vmem_alloc(). On Linux this results in a call to kvmalloc(), which will allocate vmem for large allocations and kmem for small allocations. On FreeBSD both vmem_alloc and kmem_alloc internally use the same allocator, so there is no functional change. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#17616
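The size-based dispatch described in the last two commits can be pictured like this (the cutoff value here is an assumption for illustration; Linux's kvmalloc() applies its own internal policy):

```c
#include <assert.h>
#include <string.h>

#define SMALL_ALLOC_MAX (64UL * 1024)	/* illustrative cutoff only */

/*
 * Model of vmem_alloc on Linux backed by kvmalloc(): small requests
 * are served from the slab (kmem), large ones from vmalloc space
 * (vmem), avoiding the large-allocation warning for big buffers.
 */
const char *
alloc_backend(unsigned long size)
{
	return (size <= SMALL_ALLOC_MAX ? "kmem" : "vmem");
}
```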
For what it is worth, I use dkms to build the zfs modules on Ubuntu 25.04. Trying to build from this patchset, I couldn't get the modules to build with dkms on kernel 6.16 unless I reverted the objtool patches #17541 and #17456. (I did not try earlier kernels.) With those reverted, and with the addition of #17621, I am able to build for both 6.16.0 and 6.17-rc1.
@amotin looks good. Regarding "zfs rewrite", normally I'd avoid adding a new sub-command, but in this case, since it doesn't modify any of the library ABIs and the code is so well isolated, I'm okay with pulling it in. It's definitely a nice thing to make available before 2.4 is released. Can you also add to this PR these commits from my zfs-2.3.4-staging-part2 branch, which is rebased on top of your branch. Here's a breakdown of the additional commits. The #17584 changes are still pretty fresh, but as long as you and @robn don't have any concerns I think they're ready to pull in.
CI and test suite fixes:
46de04d FreeBSD 15.0 is now "PRERELEASE"
Build and packaging fixes:
41ca229 Linux 6.16 compat: META
Minor bug fixes:
0fe1036 Allow vmem_alloc backed multilists
PR #17584 performance fix for Linux async page writeback:
0c7d6e2 Linux: zfs_putpage: document (and fix!) confusing sync/commit modes
I'm fine with #17584. I've been running it locally for about a week. We should also include #17533, same areas as #17584, but we already shipped the "bad" version of that in 2.3.3, and this one is better. I'll start running this new branch later today, and I still need to check if there's anything else I want to ship.
…completes" This causes async putpages to leave the pages sbusied for a long time, which hurts concurrency. Revert for now until we have a better approach. This reverts commit 238eab7. Reported by: Ihor Antonov <[email protected]> Discussed with: Rob Norris <[email protected]> References: freebsd/freebsd-src@738a9a7 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Mark Johnston <[email protected]> Ported-by: Rob Norris <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17533
In syncing mode, zfs_putpages() would put the entire range of pages onto the ZIL, then return VM_PAGER_OK for each page to the kernel. However, an associated zil_commit() or txg sync had not happened at this point, so the write may not actually be on disk. So, we rework that case to use a ZIL commit callback, and do the post-write work of undirtying the page and signaling completion there. We return VM_PAGER_PEND to the kernel instead so it knows that we will take care of it. The original version of this (238eab7) copied the Linux model and did the cleanup in a ZIL callback for both async and sync. This was a mistake, as FreeBSD does not have a separate "busy for writeback" flag like Linux which keeps the page usable. The full sbusy flag locks the entire page out until the itx callback fires, which for async is after txg sync, which could be literal seconds in the future. For the async case, the data is already on the DMU and the in-memory ZIL, which is sufficient for async writeback, so the old method of logging it without a callback, undirtying the page and returning is more than sufficient and reclaims that lost performance. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Mark Johnston <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17533
Added.
Included.
@satmandu Can you open an issue and share the |
When I try to compile zfs-dkms 2.3.3 with this patch (on EndeavourOS), the patch does not apply cleanly. I am retrieving the patch in the PKGBUILD with this source: Output of
If I just hit return, more missing file errors appear.
Just deleted my last two comments; I was on the wrong branch. Sorry for the confusion.
@mabod some files which are present in the git repository aren't included in the
Adding to what Brian wrote, the patch you've downloaded is against the 2.3.4 staging branch and won't apply to 2.3.3. You'll need to change the source entry in the PKGBUILD to git clone the 2.3.4 staging branch and apply the patch on top of that. See PKGBUILD(5).
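For example, the PKGBUILD source entry would change along these lines (the branch name shown is an assumption; check the actual staging branch name in the openzfs/zfs repository before using it):

```sh
# Clone the 2.3.4 staging branch instead of fetching the 2.3.3 tarball,
# then apply the PR patch on top of the checkout.
source=("zfs::git+https://github.com/openzfs/zfs.git#branch=zfs-2.3.4-staging"
        "17595.patch")
```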
Merged as: |
Most of these commits we've already merged into TrueNAS; a few others we'd like to, if there are no objections.