zvol: blk-mq sync isn't working correctly #17761

@tonyhutter

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Fedora |
| Distribution Version | 42 |
| Kernel Version | 6.14.6 |
| Architecture | x86_64 |
| OpenZFS Version | master (d147ed7) |

Describe the problem you're observing

A user noticed that blk-mq ZVOL writes were always async and were not going to the SLOG, even after a flush. This did not happen when using the BIO ZVOL write codepath (which is the default). This was initially thought to be due to the lack of a call to blk_queue_write_cache() (as reported in #17698), but further investigation showed this was not the case, as the "volatile write cache" flags were being correctly set in zvol_queue_limits_convert()/zvol_queue_limits_apply().

The real problem was that many FLUSH/FUA and TRIM requests were incorrectly going down the read codepath instead of the write codepath (although there were some FUAs that were correctly marked as writes). This was because requests not marked as write requests were assumed to be read requests. Thus, FLUSH and TRIM commands, which are neither reads nor writes, were not handled correctly. I believe this happened because the blk-mq codepaths copied the bio codepaths, and the BIOs with FLUSH/FUA/TRIM bits set may have been marked as writes.
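To make the misclassification concrete, here is a minimal userspace sketch of the pattern described above. It is not the OpenZFS code: the `op` enum, `struct request`, and every function name are hypothetical stand-ins for illustration, and the rule that a FLUSH or FUA request implies a sync write is an assumption about the intended behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for block-layer operations; not the kernel's REQ_OP_* values. */
enum op { OP_READ, OP_WRITE, OP_FLUSH, OP_DISCARD };

enum path { PATH_READ, PATH_WRITE };

struct request {
	enum op op;
	bool fua;	/* force unit access requested alongside the operation */
};

/* Buggy shape: anything not marked as a write is assumed to be a read. */
enum path classify_buggy(const struct request *rq)
{
	return (rq->op == OP_WRITE) ? PATH_WRITE : PATH_READ;
}

/* Fixed shape: only true reads take the read path; FLUSH and DISCARD go with writes. */
enum path classify_fixed(const struct request *rq)
{
	return (rq->op == OP_READ) ? PATH_READ : PATH_WRITE;
}

/* Assumption: a FLUSH, or any request carrying FUA, must be handled synchronously. */
bool needs_sync(const struct request *rq)
{
	return rq->op == OP_FLUSH || rq->fua;
}

/* Small self-check showing where the two classifiers disagree. */
void self_check(void)
{
	struct request flush = { OP_FLUSH, false };
	struct request trim = { OP_DISCARD, false };
	struct request fua_write = { OP_WRITE, true };

	/* The buggy classifier sends FLUSH and TRIM down the read path. */
	assert(classify_buggy(&flush) == PATH_READ);
	assert(classify_buggy(&trim) == PATH_READ);

	/* The fixed classifier routes them to the write path. */
	assert(classify_fixed(&flush) == PATH_WRITE);
	assert(classify_fixed(&trim) == PATH_WRITE);

	/* A FUA request already marked as a write is classified correctly by both. */
	assert(classify_buggy(&fua_write) == PATH_WRITE);
	assert(needs_sync(&fua_write) && needs_sync(&flush));
}
```

Calling `self_check()` from a small `main` exercises both classifiers. In this model, a flush that lands on the read path never reaches the code that would honor `needs_sync()`, which matches the symptom of sync writes never reaching the log device.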

Describe how to reproduce the problem

Create a 2GB pool with a log device. Set zvol_use_blk_mq=1 module param. Create a zvol with compression=off. Then write to the zvol synchronously/asynchronously:

# Async
sudo fio --name=write_iops --size=1G --time_based --runtime=10s --ramp_time=0s --ioengine=libaio --verify=0 --bs=4K --iodepth=64 --rw=randwrite --group_reporting=1 --filename=/dev/zd0
# Sync
sudo fio --name=write_iops --size=1G --time_based --runtime=10s --ramp_time=0s --ioengine=libaio --verify=0 --bs=4K --iodepth=64 --rw=randwrite --group_reporting=1 --filename=/dev/zd0 --sync=1

Run zpool iostat -v while fio is running. In the async case you should see virtually no writes go to the log device. In the sync case you should see the log device being written; with blk-mq enabled it is not, which is the bug.

