ddt prune: Add SCL_ZIO deadlock workaround #17793
Open
+29
−8
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Motivation and Context
The situation seems to be mirror-related only. It happens when one code path takes the `SCL_ZIO` reader lock, another path then raises `scl_write_wanted` by waiting for the `SCL_ZIO` writer lock, and the first path asks for the `SCL_ZIO` reader lock again.
Currently, the pruning code path has only two entry points: the `zpool prune` CLI and the `ztest_ddt_prune()` test. Both of them call `ddt_prune_unique_entries()`, which begins and ends the pruning process by toggling the `spa->spa_active_ddt_prune` bool flag.
The following is an actual example of the case, observed with ztest. The step numbers give the sequence of events.
1. The `ztest_ddt_prune()` test, running in a separate ztest thread, enqueues `dsl_sync_task(prune_candidates_sync)` and keeps waiting for a txg_sync_thread to get it done.
2. `txg_sync_thread()` runs `dsl_pool_sync()`, which invokes the `prune_candidates_sync` sync task. `prune_candidates_sync()` takes the `SCL_ZIO` reader lock before doing the actual work.
3. Another thread asks and waits for the `SCL_ZIO` writer lock via `spa_vdev_state_enter()`; in this case it was `ztest_scrub`. The lock gets `scl_write_wanted++`.
4. `txg_sync_thread()` continues running `prune_candidates_sync()` and eventually hits `zio_vdev_io_start()`, which decides to take the `SCL_ZIO` reader lock. With `scl_write_wanted > 0` that is not going to happen, because the `ztest_scrub` thread is in fact waiting for the pruning process to finish and release its `SCL_ZIO` reader lock.
Description
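As background, the four-step sequence from the Motivation section can be replayed on a small standalone model. This is only a sketch, not the OpenZFS source: `model_lock_t` and every `model_*` function are hypothetical names, and blocking is represented by a "try" call returning `false` so that the replay can run in a single thread. The one assumption taken from the text above is that a new reader is not admitted while a writer is waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one config lock such as SCL_ZIO. */
typedef struct {
	int count;		/* current holders (readers or one writer) */
	bool writer;		/* true while a writer holds the lock */
	int write_wanted;	/* waiting writers, cf. scl_write_wanted */
} model_lock_t;

/* Returns true if a reader is admitted, false if it would have to wait. */
static bool
model_reader_try_enter(model_lock_t *l)
{
	if (l->writer || l->write_wanted > 0)
		return (false);
	l->count++;
	return (true);
}

/* A writer that finds the lock held registers itself as waiting. */
static bool
model_writer_try_enter(model_lock_t *l)
{
	if (l->count != 0) {
		l->write_wanted++;
		return (false);
	}
	l->writer = true;
	l->count++;
	return (true);
}

/*
 * Replays steps 2-4 and returns whether the second reader request
 * (step 4) is admitted.
 */
static bool
model_replay_sequence(void)
{
	model_lock_t zio = { 0 };

	/* Step 2: the sync task takes the reader lock. */
	bool first_reader = model_reader_try_enter(&zio);
	assert(first_reader);

	/* Step 3: another thread wants the writer lock and has to wait. */
	bool writer = model_writer_try_enter(&zio);
	assert(!writer && zio.write_wanted == 1);

	/* Step 4: the same path asks for the reader lock again. */
	return (model_reader_try_enter(&zio));
}
```

In this model `model_replay_sequence()` returns `false`: the second reader request is refused because of the waiting writer, while the writer can only proceed once the first reader hold is dropped. With real blocking calls neither side would make progress.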
The `mmp_flag` of `spa_config_enter_impl()` is used to ignore pending write locks. The condition for this is the `spa->spa_active_ddt_prune` flag.
Since this change gives the `spa_config_enter_mmp()` function a general application, it is proposed to rename it to `spa_config_enter_priority()`.
How Has This Been Tested?
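The following is not the test procedure used for this PR. It is a minimal standalone sketch of the idea in the Description, under stated assumptions: a reader request carrying a priority flag ignores pending writers, and that flag is taken from a pool-level "prune active" boolean. `model_lock_t`, `model_pool_t` and all `model_*` functions are hypothetical and only mirror the roles of `mmp_flag` and `spa->spa_active_ddt_prune`. Blocking is modelled as a `false` return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one config lock such as SCL_ZIO. */
typedef struct {
	int count;		/* current holders */
	bool writer;		/* true while a writer holds the lock */
	int write_wanted;	/* waiting writers, cf. scl_write_wanted */
} model_lock_t;

/* Hypothetical pool state; only the flag named in the PR is modelled. */
typedef struct {
	bool active_ddt_prune;
	model_lock_t zio_lock;
} model_pool_t;

/*
 * Reader admission with a priority flag (the role played by mmp_flag):
 * a priority reader still waits for an active writer, but it ignores
 * writers that are merely waiting.
 */
static bool
model_reader_try_enter_impl(model_lock_t *l, bool priority)
{
	if (l->writer || (!priority && l->write_wanted > 0))
		return (false);
	l->count++;
	return (true);
}

/* The priority flag is derived from the "prune active" state. */
static bool
model_zio_reader_enter(model_pool_t *p)
{
	return (model_reader_try_enter_impl(&p->zio_lock,
	    p->active_ddt_prune));
}

/*
 * Sets up the state after steps 2 and 3 of the reported sequence
 * (one reader hold, one waiting writer) and returns whether the
 * second reader request of step 4 is admitted.
 */
static bool
model_replay_with_prune_flag(bool prune_active)
{
	model_pool_t p = { .active_ddt_prune = prune_active };

	p.zio_lock.count = 1;		/* step 2: reader lock held */
	p.zio_lock.write_wanted = 1;	/* step 3: writer is waiting */

	return (model_zio_reader_enter(&p));	/* step 4 */
}
```

In this model the step 4 request is admitted when the prune flag is set and refused when it is not, which is the behavioural difference the workaround relies on. An active writer still blocks the reader in both cases.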