forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 12
Regular for-next build test #1157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kdave
wants to merge
10,000
commits into
build
Choose a base branch
from
for-next
base: build
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2d4aefb
to
c9e380a
Compare
c56343b
to
1cab137
Compare
6613f3c
to
b30a0ce
Compare
d205ebd
to
c0bd9d9
Compare
15022b1
to
c22750c
Compare
28d9855
to
e18d8ce
Compare
It's just a simple wrapper around btrfs_clear_extent_bit() that passes a NULL for its last argument (a cached extent state record), plus there is not counter part - we have a btrfs_set_extent_bit() but we do not have a btrfs_set_extent_bits() (plural version). So just remove it and make all callers use btrfs_clear_extent_bit() directly. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
There are some reports of "unable to find chunk map for logical 2147483648 length 16384" error message appears in dmesg. This means some IOs are occurring after a block group is removed. When a metadata tree node is cleaned on a zoned setup, we keep that node still dirty and write it out not to create a write hole. However, this can make a block group's used bytes == 0 while there is a dirty region left. Such an unused block group is moved into the unused_bg list and processed for removal. When the removal succeeds, the block group is removed from the transaction->dirty_bgs list, so the unused dirty nodes in the block group are not sent at the transaction commit time. It will be written at some later time e.g, sync or umount, and causes "unable to find chunk map" errors. This can happen relatively easy on SMR whose zone size is 256MB. However, calling do_zone_finish() on such block group returns -EAGAIN and keep that block group intact, which is why the issue is hidden until now. Fixes: afba2bc ("btrfs: zoned: implement active zone tracking") CC: [email protected] # 6.1+ Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: David Sterba <[email protected]>
btrfs_zone_finish() can fail for several reason. If it is -EAGAIN, we need to try it again later. So, put the block group to the retry list properly. Failing to do so will keep the removable block group intact until remount and can causes unnecessary ENOSPC. Fixes: 74e91b1 ("btrfs: zoned: zone finish unused block group") CC: [email protected] # 6.1+ Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: David Sterba <[email protected]>
If the ssd_spread mount option is enabled, then we run the so called clustered allocator for data block groups. In practice, this results in creating a btrfs_free_cluster which caches a block_group and borrows its free extents for allocation. Since the introduction of allocation size classes, there has been a bug in the interaction between that feature and ssd_spread. find_free_extent() has a number of nested loops. The loop going over the allocation stages, stored in ffe_ctl->loop and managed by find_free_extent_update_loop(), the loop over the raid levels, and the loop over all the block_groups in a space_info. The size class feature relies on the block_group loop to ensure it gets a chance to see a block_group of a given size class. However, the clustered allocator uses the cached cluster block_group and breaks that loop. Each call to do_allocation() will really just go back to the same cached block_group. Normally, this is OK, as the allocation either succeeds and we don't want to loop any more or it fails, and we clear the cluster and return its space to the block_group. But with size classes, the allocation can succeed, then later fail, outside of do_allocation() due to size class mismatch. That latter failure is not properly handled due to the highly complex multi loop logic. The result is a painful loop where we continue to allocate the same num_bytes from the cluster in a tight loop until it fails and releases the cluster and lets us try a new block_group. But by then, we have skipped great swaths of the available block_groups and are likely to fail to allocate, looping the outer loop. In pathological cases like the reproducer below, the cached block_group is often the very last one, in which case we don't perform this tight bg loop but instead rip through the ffe stages to LOOP_CHUNK_ALLOC and allocate a chunk, which is now the last one, and we enter the tight inner loop until an allocation failure. Then allocation succeeds on the final block_group and if the next allocation is a size mismatch, the exact same thing happens again. Triggering this is as easy as mounting with -o ssd_spread and then running: mount -o ssd_spread $dev $mnt dd if=/dev/zero of=$mnt/big bs=16M count=1 &>/dev/null dd if=/dev/zero of=$mnt/med bs=4M count=1 &>/dev/null sync if you do the two writes + sync in a loop, you can force btrfs to spin an excessive amount on semi-successful clustered allocations, before ultimately failing and advancing to the stage where we force a chunk allocation. This results in 2G of data allocated per iteration, despite only using ~20M of data. By using a small size classed extent, the inner loop takes longer and we can spin for longer. The simplest, shortest term fix to unbreak this is to make the clustered allocator size_class aware in the dumbest way, where it fails on size class mismatch. This may hinder the operation of the clustered allocator, but better hindered than completely broken and terribly overallocating. Further re-design improvements are also in the works. Fixes: 52bb7a2 ("btrfs: introduce size class to block group allocator") Reported-by: David Sterba <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>
Currently the defrag ioctl cannot rewrite the extents without compression. Add a new flag for that, as setting compression to 0 (or "no compression") means to do no changes to compression so take what is the current default, like mount options or properties. The defrag setting overrides mount or properties. The compression BTRFS_DEFRAG_DONT_COMPRESS is only used for in-memory operations and does not need to have a fixed value. Mount with zstd:9, copy test file from /usr/bin/ (about 260KB): $ mount -o compress=zstd:9 /dev/vda /mnt $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 127: 13312.. 13439: 128: encoded 1: 128.. 255: 13364.. 13491: 128: 13440: encoded 2: 256.. 291: 13424.. 13459: 36: 13492: last,encoded,eof testfile: 3 extents found $ compsize testfile Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 42% 124K 292K 292K zstd 42% 124K 292K 292K Defrag to uncompressed: $ btrfs fi defrag --nocomp testfile $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 291: 291840.. 292131: 292: last,eof testfile: 1 extent found $ compsize testfile Processed 1 file, 1 regular extents (1 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 100% 292K 292K 292K none 100% 292K 292K 292K Compress again with LZO: $ btrfs fi defrag -clzo testfile $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 127: 13312.. 13439: 128: encoded 1: 128.. 255: 13392.. 13519: 128: 13440: encoded 2: 256.. 291: 13480.. 13515: 36: 13520: last,encoded,eof testfile: 3 extents found $ compsize testfile Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 64% 188K 292K 292K lzo 64% 188K 292K 292K Signed-off-by: David Sterba <[email protected]>
Commit 9d9ea1e ("btrfs: subpage: fix relocation potentially overwriting last page data") fixed a bug when relocating data block groups for subpage cases. However for the incoming large folios for data reloc inode, we can hit the same situation where block size is the same as page size, but the folio we got is still larger than a block. In that case, the old subpage specific check is no longer reliable. Here we have to enhance the handling by: - Unconditionally invalidate the page cache for the current cluster We set the @flush to true so that any dirty folios are properly written back first. And this time instead of dropping the whole page cache, just drop the range covered by the current cluster. This will bring some minor performance drop, as for a large folio, the heading half will be read twice (read by previous cluster, then invalidated, then read again by the current cluster). However that is required to support large folios, and this gets rid of the kinda tricky manual uptodate flag clearing for each block. - Remove the special handling of writing back the whole page cache filemap_invalidate_inode() handles the write back already, and since we're invalidating all pages in the range, we no longer need to manually clear the uptodate flags for involved blocks. Thus there is no need to manually write back the whole page cache. Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
The function btrfs_subpage_assert() is a very commonly utilized assert to make sure the range passed in is correct inside the folio. And when some code is not properly subpage/large folio compatible btrfs_subpage_assert() will be the first to be triggered. E.g. when I incorrectly enabled large folios for data reloc inodes, it immediately triggered btrfs_subpage_assert(). In that case, outputting all the involved members will be very helpful, this includes: - start - len - folio position inside the mapping - folio size Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
For data reloc inodes, they are a special type of inodes that are not exposed to user space, and are only utilized during data block groups relocation. They do not go under regular read-write operations, but have their file extents manually created to have the same layout of a block group, then its content is read from the original block group, and written back to the new location which is in a new block group. Previously all the handling was done in page units, and commit c283289 ("btrfs: make relocate_one_page() handle subpage case") changed the handling to subpage blocks. On the other hand, data reloc inodes are a perfect match for large data folios, as each relocation cluster represents one or more data extents that are contiguous in their logical addresses. This patch enables large folios for data reloc inodes by: - Remove the special handling of data reloc inodes when setting folio order - Change relocate_one_folio() to return the file offset of the next folio Originally it's designed to handle fixed page sized blocks, but with large folios, we can handle a large folio, thus we have to return the end of the current folio. - Remove the warning on folio_order() - Use folio_size() to replace fixed PAGE_SIZE usage - Use file_offset as iterator inside relocate_file_extent_cluster Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
During log replay, at add_inode_ref(), we return -ENOENT if our current inode isn't found on the subvolume tree or if a parent directory isn't found. The error comes from btrfs_iget_logging() <- btrfs_iget() <- btrfs_read_locked_inode(). The single caller of add_inode_ref(), replay_one_buffer(), ignores an -ENOENT error because it expects that error to mean only that a parent directory wasn't found and that is ok. Before commit 5f61b96 ("btrfs: fix inode lookup error handling during log replay") we were converting any error when getting a parent directory to -ENOENT and any error when getting the current inode to -EIO, so our caller would fail log replay in case we can't find the current inode. After that commit however in case the current inode is not found we return -ENOENT to the caller and therefore it ignores the critical fact that the current inode was not found in the subvolume tree. Fix this by converting -ENOENT to 0 when we don't find a parent directory, returning -ENOENT when we don't find the current inode and making the caller, replay_one_buffer(), not ignore -ENOENT anymore. Fixes: 5f61b96 ("btrfs: fix inode lookup error handling during log replay") Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
During log replay, at add_inode_ref(), if we have an extref item that contains multiple extrefs and one of them points to a directory that does not exist in the subvolume tree, we are supposed to ignore it and process the remaining extrefs encoded in the extref item, since each extref can point to a different parent inode. However when that happens we just return from the function and ignore the remaining extrefs. The problem has been around since extrefs were introduced, in commit f186373 ("btrfs: extended inode refs"), but it's hard to hit in practice because getting extref items encoding multiple extref requires getting a hash collision when computing the offset of the extref's key. The offset if computed like this: key.offset = btrfs_extref_hash(dir_ino, name->name, name->len); and btrfs_extref_hash() is just a wrapper around crc32c(). Fix this by moving to next iteration of the loop when we don't find the parent directory that an extref points to. Fixes: f186373 ("btrfs: extended inode refs") Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
…ode_ref() We are using a variable named 'log_ref_ver' of type int to indicate if we are processing an extref item or not, using a value of 1 if so, otherwise 0. This is an odd name and type, so rename it to 'is_extref_item' and change its type to bool. Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
We have a single transaction abort call that can be due to an error from one of two calls to update_block_group_item(). Unfold the transaction abort calls so that if they happen we know which update_block_group_item() call failed. Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
Currently holes are sent as writes full of zeroes, which results in unnecessarily using disk space at the receiving end and increasing the stream size. In some cases we avoid sending writes of zeroes, like during a full send operation where we just skip writes for holes. But for some cases we fill previous holes with writes of zeroes too, like in this scenario: 1) We have a file with a hole in the range [2M, 3M), we snapshot the subvolume and do a full send. The range [2M, 3M) stays as a hole at the receiver since we skip sending write commands full of zeroes; 2) We punch a hole for the range [3M, 4M) in our file, so that now it has a 2M hole in the range [2M, 4M), and snapshot the subvolume. Now if we do an incremental send, we will send write commands full of zeroes for the range [2M, 4M), removing the hole for [2M, 3M) at the receiver. We could improve cases such as this last one by doing additional comparisons of file extent items (or their absence) between the parent and send snapshots, but that's a lot of code to add plus additional CPU and IO costs. Since the send stream v2 already has a fallocate command and btrfs-progs implements a callback to execute fallocate since the send stream v2 support was added to it, update the kernel to use fallocate for punching holes for V2+ streams. Test coverage is provided by btrfs/284 which is a version of btrfs/007 that exercises send stream v2 instead of v1, using fsstress with random operations and fssum to verify file contents. Link: kdave/btrfs-progs#1001 Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
There is a potential deadlock that can happen in try_release_subpage_extent_buffer because the irq-safe xarray spin lock fs_info->buffer_tree is being acquired before the irq-unsafe eb->refs_lock. This leads to the potential race: // T1 (random eb->refs user) // T2 (release folio) spin_lock(&eb->refs_lock); // interrupt end_bbio_meta_write() btrfs_meta_folio_clear_writeback() btree_release_folio() folio_test_writeback() //false try_release_extent_buffer() try_release_subpage_extent_buffer() xa_lock_irq(&fs_info->buffer_tree) spin_lock(&eb->refs_lock); // blocked; held by T1 buffer_tree_clear_mark() xas_lock_irqsave() // blocked; held by T2 I believe that the spin lock can safely be replaced by an rcu_read_lock. The xa_for_each loop does not need the spin lock as it's already internally protected by the rcu_read_lock. The extent buffer is also protected by the rcu_read_lock so it won't be freed before we take the eb->refs_lock and check the ref count. The rcu_read_lock is taken and released every iteration, just like the spin lock, which means we're not protected against concurrent insertions into the xarray. This is fine because we rely on folio->private to detect if there are any eb's remaining in the folio. There is already some precedent for this with find_extent_buffer_nolock, which loads an extent buffer from the xarray with only rcu_read_lock. lockdep warning: ===================================================== WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected 6.16.0-0_fbk701_debug_rc0_123_g4c06e63b9203 #1 Tainted: G E N ----------------------------------------------------- kswapd0/66 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: ffff000011ffd600 (&eb->refs_lock){+.+.}-{3:3}, at: try_release_extent_buffer+0x18c/0x560 and this task is already holding: ffff0000c1d91b88 (&buffer_xa_class){-.-.}-{3:3}, at: try_release_extent_buffer+0x13c/0x560 which would create a new lock dependency: (&buffer_xa_class){-.-.}-{3:3} -> (&eb->refs_lock){+.+.}-{3:3} but this new dependency connects a HARDIRQ-irq-safe lock: (&buffer_xa_class){-.-.}-{3:3} ... which became HARDIRQ-irq-safe at: lock_acquire+0x178/0x358 _raw_spin_lock_irqsave+0x60/0x88 buffer_tree_clear_mark+0xc4/0x160 end_bbio_meta_write+0x238/0x398 btrfs_bio_end_io+0x1f8/0x330 btrfs_orig_write_end_io+0x1c4/0x2c0 bio_endio+0x63c/0x678 blk_update_request+0x1c4/0xa00 blk_mq_end_request+0x54/0x88 virtblk_request_done+0x124/0x1d0 blk_mq_complete_request+0x84/0xa0 virtblk_done+0x130/0x238 vring_interrupt+0x130/0x288 __handle_irq_event_percpu+0x1e8/0x708 handle_irq_event+0x98/0x1b0 handle_fasteoi_irq+0x264/0x7c0 generic_handle_domain_irq+0xa4/0x108 gic_handle_irq+0x7c/0x1a0 do_interrupt_handler+0xe4/0x148 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x14/0x20 el1h_64_irq+0x6c/0x70 _raw_spin_unlock_irq+0x38/0x70 __run_timer_base+0xdc/0x5e0 run_timer_softirq+0xa0/0x138 handle_softirqs.llvm.13542289750107964195+0x32c/0xbd0 ____do_softirq.llvm.17674514681856217165+0x18/0x28 call_on_irq_stack+0x24/0x30 __irq_exit_rcu+0x164/0x430 irq_exit_rcu+0x18/0x88 el1_interrupt+0x34/0x50 el1h_64_irq_handler+0x14/0x20 el1h_64_irq+0x6c/0x70 arch_local_irq_enable+0x4/0x8 do_idle+0x1a0/0x3b8 cpu_startup_entry+0x60/0x80 rest_init+0x204/0x228 start_kernel+0x394/0x3f0 __primary_switched+0x8c/0x8958 to a HARDIRQ-irq-unsafe lock: (&eb->refs_lock){+.+.}-{3:3} ... which became HARDIRQ-irq-unsafe at: ... lock_acquire+0x178/0x358 _raw_spin_lock+0x4c/0x68 free_extent_buffer_stale+0x2c/0x170 btrfs_read_sys_array+0x1b0/0x338 open_ctree+0xeb0/0x1df8 btrfs_get_tree+0xb60/0x1110 vfs_get_tree+0x8c/0x250 fc_mount+0x20/0x98 btrfs_get_tree+0x4a4/0x1110 vfs_get_tree+0x8c/0x250 do_new_mount+0x1e0/0x6c0 path_mount+0x4ec/0xa58 __arm64_sys_mount+0x370/0x490 invoke_syscall+0x6c/0x208 el0_svc_common+0x14c/0x1b8 do_el0_svc+0x4c/0x60 el0_svc+0x4c/0x160 el0t_64_sync_handler+0x70/0x100 el0t_64_sync+0x168/0x170 other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&eb->refs_lock); local_irq_disable(); lock(&buffer_xa_class); lock(&eb->refs_lock); <Interrupt> lock(&buffer_xa_class); *** DEADLOCK *** 2 locks held by kswapd0/66: #0: ffff800085506e40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe8/0xe50 #1: ffff0000c1d91b88 (&buffer_xa_class){-.-.}-{3:3}, at: try_release_extent_buffer+0x13c/0x560 Link: https://www.kernel.org/doc/Documentation/locking/lockdep-design.rst#:~:text=Multi%2Dlock%20dependency%20rules%3A Signed-off-by: Leo Martins <[email protected]> Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Fixes: 19d7f65 ("btrfs: convert the buffer_radix to an xarray")
The function cow_file_range() has two boolean parameters, which is never a good thing for eyes. Replace it with a single @flags parameter, with two flags: - COW_FILE_RANGE_NO_INLINE - COW_FILE_RANGE_KEEP_LOCKED And since we're here, also update the comments of cow_file_range() to replace the old "page" usage with "folio". Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Qu Wenruo <[email protected]>
When hitting a large folio, btrfs_cleanup_ordered_extents() will get the same large folio multiple times, and clearing the same range again and again. Thankfully this is not causing anything wrong, just inefficiency. This is caused by the fact that we're iterating folios using the old page index, thus can hit the same large folio again and again. Enhance it by increasing @index to the index of the folio end, and only increase @index by 1 if we failed to grab a folio. Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Qu Wenruo <[email protected]>
Inside nocow_one_range(), if the checksum cloning for data reloc inode failed, we call btrfs_cleanup_ordered_extents() to cleanup the just allocated ordered extents. But unlike extent_clear_unlock_delalloc(), btrfs_cleanup_ordered_extents() requires a length, not an inclusive end bytenr. This can be problematic, as the @EnD is normally way larger than @len. This means btrfs_cleanup_ordered_extents() can be called on folios out of the correct range, and if the out-of-range folio is under writeback, we can incorrectly clear the ordered flag of the folio, and trigger the DEBUG_WARN() inside btrfs_writepage_cow_fixup(). Fix the wrong parameter with correct length instead. Fixes: 94f6c5c ("btrfs: move ordered extent cleanup to where they are allocated") Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Qu Wenruo <[email protected]>
… buffers Currently we only log an error message if we can't find the block group for a log tree extent buffer when unaccounting it (while freeing a log tree). A missing block group means something is seriously wrong and we end up leaking space from the metadata space info. So return -ENOENT in case we don't find the block group. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
We do several things while walking a log tree (for replaying and for freeing a log tree) like reading extent buffers and cleaning them up, but we don't immediately abort the transaction, or turn the fs into an error state, when one of these things fails. Instead we the transaction abort or turn the fs into error state in the caller of the entry point function that walks a log tree - walk_log_tree() - which means we don't get to know exactly where an error came from. Improve on this by doing a transaction abort / turn fs into error state after each such failure so that when it happens we have a better understanding where the failure comes from. This deliberately leaves the transaction abort / turn fs into error state in the callers of walk_log_tree() as to ensure we don't get into an inconsitent state in case we forget to do it deeper in call chain. It also deliberately does not do it after errors from the calls to the callback defined in struct walk_control::process_func(), as we will do it later on another patch. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
…llback In the process_one_buffer() log tree walk callback we return errors to the log tree walk caller and then the caller aborts the transaction, if we have one, or turns the fs into error state if we don't have one. While this reduces code it makes it harder to figure out where exactly an error came from. So add the transaction aborts after every failure inside the process_one_buffer() callback, so that it helps figuring out why failures happen. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
…ffer() Instead of keep dereferencing the walk_control structure to extract the transaction handle whenever is needed, do it once by storing it in a local variable and then use the variable everywhere. This reduces code verbosity and eliminates the need for some split lines. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
…tem() If read_alloc_one_name() we explicitly return -ENOMEM and currently that is fine since it's the only error read_alloc_one_name() can return for now. However this is fragile and not future proof, so return instead what read_alloc_one_name() returned. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
In the replay_one_buffer() log tree walk callback we return errors to the log tree walk caller and then the caller aborts the transaction, if we have one, or turns the fs into error state if we don't have one. While this reduces code it makes it harder to figure out where exactly an error came from. So add the transaction aborts after every failure inside the replay_one_buffer() callback and the functions it calls, making it as fine grained as possible, so that it helps figuring out why failures happen. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
At replay_one_extent(), we can jump to the code that updates the file extent range and updates the inode when processing a file extent item that represents a hole and we don't have the NO_HOLES feature enabled. This helps reduce the high indentation level by one in replay_one_extent() and avoid splitting some lines to make the code easier to read. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
Instead of having an if statement to check for regular and prealloc extents first, process them in a block, and then following with an else statement to check for an inline extent, check for an inline extent first, process it and jump to the 'update_inode' label, allowing us to avoid having the code for processing regular and prealloc extents inside a block, reducing the high indentation level by one and making the code easier to read and avoid line splittings too. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
Instead of extracting again the disk_bytenr and disk_num_bytes values from the file extent item to pass to btrfs_qgroup_trace_extent(), use the key local variable 'ins' which already has those values, reducing the size of the source code. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
There's one only one caller of unaccount_log_buffer() and both this function and the caller are short, so move its code into the caller. Reviewed-by: Boris Burkov <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]>
…ndio_workfn() When btrfs_zone_finish_endio_workfn() is calling btrfs_zone_finish_endio() it already has a pointer to the block group. Furthermore btrfs_zone_finish_endio() does additional checks if the block group can be finished or not. But in the context of btrfs_zone_finish_endio_workfn() only the actual call to do_zone_finish() is of interest, as the skipping condition when there is still room to allocate from the block group cannot be checked. Directly call do_zone_finish() on the block group. Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]>
Now that btrfs_zone_finish_endio_workfn() is directly calling do_zone_finish() the only caller of btrfs_zone_finish_endio() is btrfs_finish_one_ordered(). btrfs_finish_one_ordered() already has error handling in-place so btrfs_zone_finish_endio() can return an error if the block group lookup fails. Also as btrfs_zone_finish_endio() already checks for zoned filesystems and returns early, there's no need to do this in the caller. Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Keep this open, the build tests are hosted on github CI.