-
Notifications
You must be signed in to change notification settings - Fork 1.9k
zinject: Introduce ready delay fault injection #17787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine! Does what it says.
I don't have any specific refactoring suggestions outside of the ones I already had for the injection framework as a whole, which is on my "someday maybe" list. This doesn't add to the mess, and its very much "around the edges" code, so I don't mind it.
The one thing that gave me pause is the sleep, which will block a spa_taskq thread for the duration, which can starve other IOs if there's no other task threads available (eg a ZTI_ONE
taskq). In production code that would be a showstopper, but given there's already loads more ways to wedge or panic ZFS from zinject, so one more doesn't bother me.
(the "better" way to do it might be in __zio_execute()
, if its been asked to delay then throw it out to taskq_dispatch_delay()
; basically aping the structure in zio_delay_interrupt()
. Then we'd be able to delay at any stage too. But I don't recommend that route lightly unless you were really in the mood; __zio_execute
is a white-hot codepath and it doesn't not enjoy unexpected state).
4a6243c
to
a8b0275
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice and straight forward, I can see this be handy for future debugging. Can you just address the FreeBSD compiler warnings.
a8b0275
to
c505d0c
Compare
This adds a pause to the ZIO pipeline in the ready stage for matching I/O (data, dnode, or raw bookmark). Signed-off-by: Robert Evans <[email protected]>
c505d0c
to
96562ce
Compare
Good point. #16785 has been here already, so probably worth using a taskq from the start. I reworked this to reuse
Hmm. It seems almost plausible to have the |
Adds delay fault injection to the ZIO pipeline at the ready stage.
This new fault injection allows triggering live/sync races that happen in this particular stage.
Inspired by #16025 and used to reproduce the
free_children
panic.The new mode is rather unlike any other
zinject
tool, so the changes are relatively extensive. I'm open to suggestions for how to tidy this up, refactor, etc.Motivation and Context
New fault injection specifically to trigger live/sync races in the zio pipeline.
Description
Ready delay causes matching ZIO operations to sleep.
-E <seconds>
triggers the new mode.zinject -E 5 -l 2 -t data /path/to/file
for example waits 5 seconds in the ready stage for L2 dbufs that are part of any read or write for the object corresponding to the file.The new mode also accepts ranges and I/O types as further filters and can operate on raw bookmarks or objects.
How Has This Been Tested?
Ran ZTS to verify that there are no regressions in ZFS.
Manually used the new zinject mode to reproduce #16025.
I have not added new tests, but I'm open to suggestions.
Types of changes
Checklist:
Signed-off-by
.