
Conversation

@benyamin-codez
Contributor

@benyamin-codez benyamin-codez commented Feb 16, 2025

Purpose: To resolve regression introduced in PR #1196 / #1175 (fe07e50).

Background: We previously ignored calls for a spinlock with isr=TRUE in VioScsiVQLock() and VioScsiVQUnlock(). This was replaced with the use of InterruptLock type spinlocks in the (!IsCrashDumpMode && adaptExt->dpc_ok) = FALSE pathway. In testing, suspend/resume/hibernate did not use this pathway but instead issued DPCs. The InterruptLock was presumed to be used when IsCrashDumpMode=TRUE. Also, using PVOID LockContext = NULL, and/or then setting LockContext to &adaptExt->dpc[vq_req_idx], appears to cause an HCK Flush Test failure.

This preliminary fix proposes to use DpcLevelLock when adaptExt->dpc_ok=TRUE and InterruptLock when adaptExt->dpc_ok=FALSE.

This fix ignores all requests for InterruptLock type spinlocks in ProcessBuffer().

This fix has two components:

  1. Only DpcLock type spinlocks are processed in ProcessBuffer(), with the other valid types (InterruptLock and StartIoLock) being ignored; and
  2. The (PVOID)LockContext is no longer used, with calls to StorPortAcquireSpinLock() for DpcLock type spinlocks using &adaptExt->dpc[vq_req_idx] directly, as sketched below.
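
A minimal sketch of that call shape (illustrative only; vq_req_idx and the surrounding virtqueue processing are assumed from the description above):

    STOR_LOCK_HANDLE LockHandle = {0};

    // For DpcLock, LockContext must point at the relevant STOR_DPC object, so pass it directly.
    StorPortAcquireSpinLock(DeviceExtension, DpcLock, &adaptExt->dpc[vq_req_idx], &LockHandle);
    // ... drain the virtqueue and complete the SRBs here ...
    StorPortReleaseSpinLock(DeviceExtension, &LockHandle);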

Note: Use of InvalidLock requires Win11 and both InvalidLock and DpcLevelLock require using StorPortAcquireSpinLockEx(). Consider for future use.

Added: Created a new overloaded enumeration called CUSTOM_STOR_SPINLOCK which adds some new (invalid) spinlock types such as Skip_Locking and No_Lock. Also provides InvalidLock for builds prior to NTDDI_WIN11_GE (Windows 11, version 24H2, build 26100) via Invalid_Lock. In a similar vein, Dpc_Lock = DpcLock, StartIo_Lock = StartIoLock, Interrupt_Lock = InterruptLock, ThreadedDpc_Lock = ThreadedDpcLock & DpcLevel_Lock = DpcLevelLock.

Credit to Nutanix and in particular @MartinCHarvey-Nutanix for his work in PR #1293.

@kostyanf14
Member

@MartinCHarvey-Nutanix Please look at this fix

@benyamin-codez
Contributor Author

@MartinCHarvey-Nutanix Please look at this fix

He's on PTO for a bit.

@benyamin-codez benyamin-codez changed the title from "[CI-NO-BUILD] Nutanix-ENG-741981 [vioscsi] Fix NMI in crashdump/hibernation pathway" to "Nutanix-ENG-741981 [vioscsi] Fix NMI in crashdump/hibernation pathway" Feb 17, 2025
@benyamin-codez benyamin-codez marked this pull request as ready for review February 17, 2025 14:18
@benyamin-codez
Contributor Author

@kostyanf14

Can we please rerun tests on this one? I'm confident it should be ready.

@JonKohler
Contributor

This looks OK to me. @sb-ntnx can you give it a look?

@benyamin-codez
Contributor Author

@kostyanf14

Maybe some HCK-CI/vioscsi checks too?
I'm curious whether any of those failing checks might now pass...

@kostyanf14
Member

@kostyanf14

Maybe some HCK-CI/vioscsi checks too? I'm curious whether any of those failing checks might now pass...

Yes, all of them are in the queue

@benyamin-codez
Contributor Author

@kostyanf14

Is the Flush Test failure expected in HCK-CI/vioscsi-Win2025x64_host...?

@benyamin-codez
Contributor Author

@kostyanf14

Is the Flush Test failure expected in HCK-CI/vioscsi-Win2025x64_host...?

I note that this test passed in PR #1293...
...which - if valid - would mean @MartinCHarvey-Nutanix has found an additional fix beyond the regression...
... not that I can see what that might be.
Maybe he's worked around an upstream (StorPort) bug...?

cc: @JonKohler @sb-ntnx

@benyamin-codez
Contributor Author

@JonKohler @sb-ntnx

My latest thoughts on this are:

Perhaps the init of PVOID LockContext = NULL;...
...and then when I later try LockContext = &adaptExt->dpc[vq_req_idx]; ...
...that is actually a problem. 8^O

This was because I used PVOID LockContext = NULL; // sanity check for LockMode = InterruptLock or StartIoLock in my WIP. We can now probably get rid of this in all the places where a DpcLock type spinlock is unconditionally used.
This is what Martin did and the Flush Test seemed to work as expected.
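
For clarity, the two patterns side by side (sketch only; the LockHandle declaration is shown just for completeness):

    STOR_LOCK_HANDLE LockHandle = {0};

    // Pattern I now suspect is the problem:
    PVOID LockContext = NULL; // sanity check for LockMode = InterruptLock or StartIoLock
    LockContext = &adaptExt->dpc[vq_req_idx];
    StorPortAcquireSpinLock(DeviceExtension, DpcLock, LockContext, &LockHandle);

    // Direct form (no intermediate LockContext variable):
    StorPortAcquireSpinLock(DeviceExtension, DpcLock, &adaptExt->dpc[vq_req_idx], &LockHandle);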

My latest cut (which I just rebased here) and Martin's are getting very close to being the same. I guess I prefer mine, although the differences are mostly minor semantics. I'd like to retain a reference to STOR_SPINLOCK so the spinlock context is considered when calling the ProcessBuffer() function.

Is @MartinCHarvey-Nutanix back from PTO... It would be good to get his view on the differences. Anyone else?

@kostyanf14, could we please re-run the test to be sure?

@benyamin-codez
Contributor Author

@kostyanf14, could we please re-run the [HCK-CI/vioscsi-Win2025x64_host] test to be sure?

It will be very telling if it passes...

I note PR #1293 also had:

a) HCK-CI/vioscsi-Win2022x64 fail the Static Tools Logo Test, the Flush Test and Disk Verification (LOGO).
b) HCK-CI/vioscsi-Win11_24H2x64_host_uefi_viommu fail Disk Verification (LOGO).

Both of these also skipped some tests.
Are any of these failures unexpected?

@kostyanf14
Member

@benyamin-codez

Static Tools Logo Test unexpected. For some reason SDV crashed for vioscsi driver.
Flush Test and Disk Verification (LOGO) - known issue

@benyamin-codez
Contributor Author

@kostyanf14

Static Tools Logo Test unexpected. For some reason SDV crashed for vioscsi driver.
Flush Test and Disk Verification (LOGO) - known issue

Thanks, Kostiantyn.

Just to be clear, was the Flush Test failure expected or unexpected in HCK-CI/vioscsi-Win2025x64_host...?
Can we please re-run it to confirm my theory re using PVOID LockContext = NULL being a problem...?

@kostyanf14
Member

Flush Test failure is expected in general. We try to emulate force off of VMs and sometimes HLK is not happy. If the test passed - this is good, if fail - not a bug.

@benyamin-codez
Contributor Author

@JonKohler @sb-ntnx @MartinCHarvey-Nutanix @MartinCHarvey

In light of @kostyanf14's comments re the Flush Test above, the fact the same just passed, and a little time to consider other options, I am minded to leave those last edits as-is. The relevant functions in my WIP also had a STOR_SPINLOCK parameter, which I removed when I realised those functions would never use anything but DpcLock type spinlocks moving forward, but I obviously did leave some artefacts behind, which I guess should be cleaned up anyway.

I am now also minded to perhaps create a custom STOR_SPINLOCK-based enumeration, perhaps named CUSTOM_STOR_SPINLOCK, which could then include a Skip_Locking=0 or No_Lock=0 or Invalid_Lock=0 spinlock type for use in this scenario, or which would otherwise mimic the Win11 24H2 enumeration (which uses InvalidLock=0).

Probably something like:

typedef enum _CUSTOM_STOR_SPINLOCK {
    Skip_Locking = 0,
    No_Lock = 0,
#if defined(NTDDI_WIN11_GE) && (NTDDI_VERSION >= NTDDI_WIN11_GE)
    Invalid_Lock = InvalidLock,
#else
    Invalid_Lock = 0,    
#endif
    Dpc_Lock = DpcLock,
    StartIo_Lock = StartIoLock,
    Interrupt_Lock = InterruptLock,
    ThreadedDpc_Lock = ThreadedDpcLock,
    DpcLevel_Lock = DpcLevelLock
} CUSTOM_STOR_SPINLOCK;

I think it is highly likely a ThreadedDpcLock-based solution will be considered at some point in the not-too-distant future. I had some success with this in my WIP (an unpublished variant). For this reason, and as mentioned above, i.e. to help ensure the spinlock context is considered when calling the ProcessBuffer() function, I think we should use a STOR_SPINLOCK-based parameter rather than the BOOLEAN that Martin has used in the other PR.
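
Roughly the shape I have in mind (a sketch only; everything beyond the names already used in this thread is assumed):

    VOID ProcessBuffer(IN PVOID DeviceExtension, IN ULONG MessageId, IN CUSTOM_STOR_SPINLOCK LockMode)
    {
        PADAPTER_EXTENSION adaptExt = (PADAPTER_EXTENSION)DeviceExtension;
        ULONG vq_req_idx = MessageId - QUEUE_TO_MESSAGE(VIRTIO_SCSI_REQUEST_QUEUE_0);
        STOR_LOCK_HANDLE LockHandle = {0};

        if (LockMode == Dpc_Lock)
        {
            StorPortAcquireSpinLock(DeviceExtension, DpcLock, &adaptExt->dpc[vq_req_idx], &LockHandle);
        }

        // ... pull completed requests off the virtqueue and complete the SRBs ...

        if (LockMode == Dpc_Lock)
        {
            StorPortReleaseSpinLock(DeviceExtension, &LockHandle);
        }
    }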

I'll start putting that together, but in the meantime please do share any thoughts you have about the above.

Also, who discovered the regression?
I would like to give credit to Martin and whomever else might be deserving in the commit message.

… (regression)

Credit to Nutanix and in particular @MartinCHarvey-Nutanix for his work in PR virtio-win#1293.

Background: We previously ignored calls for a spinlock with isr=TRUE in
VioScsiVQLock() and VioScsiVQUnlock(). This was replaced with a call to
InterruptLock in the (!IsCrashDumpMode && adaptExt->dpc_ok) = FALSE pathway.
In testing, suspend/resume/hibernate did not use this pathway but instead
issued DPCs. The InterruptLock was presumed to be used when IsCrashDumpMode=TRUE.
Also, using PVOID LockContext = NULL, and / or then setting LockContext
to &adaptExt->dpc[vq_req_idx], appears to cause a HCK Flush Test failure.

Created new overloaded enumeration called CUSTOM_STOR_SPINLOCK which adds some
new (invalid) spinlock types such as Skip_Locking and No_Lock. Also provides InvalidLock
for builds prior to NTDDI_WIN11_GE (Windows 11, version 24H2, build 26100) via Invalid_Lock.
In similar vein, Dpc_Lock = DpcLock, StartIo_Lock = StartIoLock, Interrupt_Lock =
InterruptLock, ThreadedDpc_Lock = ThreadedDpcLock & ThreadedDpc_Lock = ThreadedDpcLock.

This fix has two components:

1. Only DpcLock type spinlocks are processed in ProcessBuffer() with all other
   types presently being ignored ; and
2. The (PVOID)LockContext is no longer used, with calls to StorPortAcquireSpinLock()
   for DpcLock type spinlocks using &adaptExt->dpc[vq_req_idx] directly.

Note: Use of InvalidLock requires Win11 and both InvalidLock and DpcLevelLock
      require using StorPortAcquireSpinLockEx. Consider for future use.

Signed-off-by: benyamin-codez <[email protected]>
@benyamin-codez
Contributor Author

@JonKohler @sb-ntnx @MartinCHarvey-Nutanix @MartinCHarvey

Please find my new cut, crafted and pushed per the above post, for your consideration.

@kostyanf14

Maybe we should try HCK-CI/vioscsi-Win2022x64 this round. I had no SDV failures on my end.

@benyamin-codez
Contributor Author

benyamin-codez commented Feb 22, 2025

P.S. :
Small boo-boo in commit message:
Duplicated ThreadedDpc_Lock = ThreadedDpcLock, one should be DpcLevel_Lock = DpcLevelLock... 8^d
Will fix in next rebase...

@benyamin-codez
Contributor Author

@kostyanf14

Flush Test failure is expected in general. We try to emulate force off of VMs and sometimes HLK is not happy. If the test passed - this is good, if fail - not a bug.

wrt your power-loss emulation code for the Flush Test, is that open for peeking?
I'd imagine you have a QEMU-based solution, so maybe:
qm stop <vmid> --overrule-shutdown 1 --timeout 0
I was curious whether you might also be using --keepActive 1 or even --skiplock 1 (as su).
...or do you just use kill -9 <vm_pid>...?

@benyamin-codez
Contributor Author

@kostyanf14

Apologies for clobbering the previous HCK-CI/vioscsi-Win2025x64_host check...!! 8^d

@kostyanf14
Member

@kostyanf14

Flush Test failure is expected in general. We try to emulate force off of VMs and sometimes HLK is not happy. If the test passed - this is good, if fail - not a bug.

wrt your power-loss emulation code for the Flush Test, is that open for peeking? I'd imagine you have a QEMU-based solution, so maybe:
qm stop <vmid> --overrule-shutdown 1 --timeout 0
I was curious whether you might also be using --keepActive 1 or even --skiplock 1 (as su).
...or do you just use kill -9 <vm_pid>...?

Please see https://github.com/HCK-CI/AutoHCK/ in general and HCK-CI/AutoHCK#486 specific for this test

@benyamin-codez
Contributor Author

@kostyanf14

Looks like HCK-CI/vioscsi-Win2025x64_host passed Flush Test again but failed Disk Verification (LOGO) this time...

Maybe we should try HCK-CI/vioscsi-Win2022x64 this round. I had no SDV failures on my end.

Ready for this one?

@MartinCHarvey

MartinCHarvey commented Feb 23, 2025 via email

@benyamin-codez
Contributor Author

@MartinCHarvey-Nutanix @MartinCHarvey @sb-ntnx

Thanks for the post, Martin.

Have we addressed the case where this happens at very initial driver load and device start?

I have been unable to reproduce this issue, so I have presumed you are using something funky in your setup...
We did test per comments from here in PR #1175 (which later became PR #1196 where the regression was introduced).
Do you have a setup not covered in those comments?

My understanding is that the ISR gets called after initial device addition, but before the DPC has been set up.

ISR = VioScsiCompleteDpcRoutine() which calls ProcessBuffer() with a DpcLock type spinlock.

iirc, the codepath for setting it up is:

VioScsiHwInitialize() [HW_INITIALIZE following VioScsiFindAdapter()]
...
... StorPortEnablePassiveInitialization(DeviceExtension, VioScsiPassiveInitializeRoutine)
...
VioScsiPassiveInitializeRoutine() [HW_PASSIVE_INITIALIZE_ROUTINE following HW_INITIALIZE]

Where:
VioScsiPassiveInitializeRoutine()
... calls StorPortInitializeDpc() for each request queue...
... to create StorPort DPC objects [STOR_DPC (Buffer, Lock)] at each pointer adaptExt->dpc[vq_idx]...
... associating the ISR [VioScsiCompleteDpcRoutine()] with each StorPort DPC object...
... and then blindly sets adaptExt->dpc_ok = TRUE (as StorPortInitializeDpc() does not return a status).
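
As a rough sketch of that routine (paraphrased from the description above; field names such as num_queues are assumed):

    BOOLEAN VioScsiPassiveInitializeRoutine(IN PVOID DeviceExtension)
    {
        ULONG index;
        PADAPTER_EXTENSION adaptExt = (PADAPTER_EXTENSION)DeviceExtension;

        for (index = 0; index < adaptExt->num_queues; ++index)
        {
            // One STOR_DPC per request queue, each bound to VioScsiCompleteDpcRoutine().
            StorPortInitializeDpc(DeviceExtension, &adaptExt->dpc[index], VioScsiCompleteDpcRoutine);
        }
        // StorPortInitializeDpc() returns VOID, so there is no status to check here.
        adaptExt->dpc_ok = TRUE;
        return TRUE;
    }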

Perhaps adaptExt->dpc_ready would be a mnemonic improvement upon adaptExt->dpc_ok.

If StorPortEnablePassiveInitialization() returns FALSE, i.e. the system does not support DPCs, then the VioScsiPassiveInitializeRoutine() routine will never be called and adaptExt->dpc_ok will remain FALSE. The trouble with such a scenario is that we rely on DPCs and DpcLock everywhere else, so I'm not sure how the driver would work.
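
The enable step in VioScsiHwInitialize() then looks roughly like this (a sketch; the surrounding code and the dump_mode guard are assumptions on my part):

    if (!adaptExt->dump_mode)
    {
        if (!StorPortEnablePassiveInitialization(DeviceExtension, VioScsiPassiveInitializeRoutine))
        {
            // Passive initialization is unavailable: VioScsiPassiveInitializeRoutine()
            // will never run and adaptExt->dpc_ok stays FALSE.
        }
    }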

Is this the case with your setup? I ask because the DPC infrastructure should be well and truly set up, by the time the VioScsiMSInterrupt() is called by StorPort following VioScsiStartIo(). Why else would you hit this codepath...?

In the case DPCs are supported, and we happen to use a STOR_SPINLOCK type other than DpcLock, we will clobber any DPCs/DpcLock spinlocks, as the new spinlock will be issued above DISPATCH_LEVEL. In my testing, using a different spinlock type usually caused a BSOD (typ. IRQL_NOT_LESS_OR_EQUAL), and not a "hung" VM...

So historically, we have skipped the spinlock. This PR restores this previous behaviour.

Please let me know if you can still reproduce your issue even with this PR.
It is possible we might need [in DispatchQueue()]:

    if (!adaptExt->dump_mode && adaptExt->dpc_ok)
    {
        NT_ASSERT(MessageId >= QUEUE_TO_MESSAGE(VIRTIO_SCSI_REQUEST_QUEUE_0));
        StorPortIssueDpc(DeviceExtension,
                         &adaptExt->dpc[MessageId - QUEUE_TO_MESSAGE(VIRTIO_SCSI_REQUEST_QUEUE_0)],
                         ULongToPtr(MessageId),
                         ULongToPtr(MessageId));
    }
    else if (adaptExt->dump_mode)
    {
        ProcessBuffer(DeviceExtension, MessageId, Skip_Locking);
    }
    else
    {
        ProcessBuffer(DeviceExtension, MessageId, Interrupt_Lock);
    }

...or some other variant + code in ProcessBuffer() for InterruptLock or other spinlock types... 8^d

benyamin-codez referenced this pull request Feb 24, 2025
… (regression)

Credit to Nutanix and in particular @MartinCHarvey-Nutanix for his work in PR #1293.

Background: We previously ignored calls for a spinlock with isr=TRUE in
VioScsiVQLock() and VioScsiVQUnlock(). This was replaced with a call to
InterruptLock in the (!IsCrashDumpMode && adaptExt->dpc_ok) = FALSE pathway.
In testing, suspend/resume/hibernate did not use this pathway but instead
issued DPCs. The InterruptLock was presumed to be used when IsCrashDumpMode=TRUE.
Also, using PVOID LockContext = NULL, and / or then setting LockContext
to &adaptExt->dpc[vq_req_idx], appears to cause a HCK Flush Test failure.

Created new overloaded enumeration called CUSTOM_STOR_SPINLOCK which adds some
new (invalid) spinlock types such as Skip_Locking and No_Lock. Also provides InvalidLock
for builds prior to NTDDI_WIN11_GE (Windows 11, version 24H2, build 26100) via Invalid_Lock.
In similar vein, Dpc_Lock = DpcLock, StartIo_Lock = StartIoLock, Interrupt_Lock =
InterruptLock, ThreadedDpc_Lock = ThreadedDpcLock & ThreadedDpc_Lock = ThreadedDpcLock.

This fix has two components:

1. Only DpcLock type spinlocks are processed in ProcessBuffer() with all other
   types presently being ignored ; and
2. The (PVOID)LockContext is no longer used, with calls to StorPortAcquireSpinLock()
   for DpcLock type spinlocks using &adaptExt->dpc[vq_req_idx] directly.

Note: Use of InvalidLock requires Win11 and both InvalidLock and DpcLevelLock
      require using StorPortAcquireSpinLockEx. Consider for future use.

Signed-off-by: benyamin-codez <[email protected]>
@benyamin-codez
Copy link
Contributor Author

@MartinCHarvey-Nutanix @MartinCHarvey @sb-ntnx

The one variant I didn't try was:

    if (!adaptExt->dump_mode && adaptExt->dpc_ok)
    {
        ...
    }
    else
    {
        ProcessBuffer(DeviceExtension, MessageId, Dpc_Lock);
    }

..., which in some circumstances, might actually work...

@MartinCHarvey-Nutanix

@MartinCHarvey-Nutanix @MartinCHarvey @sb-ntnx

I have been unable to reproduce this issue, so I have presumed you are using something funky in your setup...

I suspect this may be the case. I'm testing the case where there is a totally "bare" windows VM, which has a vioscsi device, and additionally (I guess), there are pending events / Io's / something on the device at the moment where a driver is very first installed.

Do you have a setup not covered in those comments?

Possibly. Nutanix runs storage stuff on the host which is not "vanilla upstream".

My understanding is that the ISR gets called after initial device addition, but before the DPC has been set up.

Perhaps adaptExt->dpc_ready would be a mnemonic improvement upon adaptExt->dpc_ok.

Yes, dpc_ok strikes me as a bit ambiguous. I think a bit of instrumentation with some dpc_state variable might be a good idea.

If StorPortEnablePassiveInitialization() returns FALSE, i.e. the system does not support DPCs

I'm willing to believe this generally shouldn't happen in most/all normal use cases. Possibly the dump/hiber path doesn't allow DPC's in the normal case. I'll get back to you on that.

Is this the case with your setup? I ask because the DPC infrastructure should be well and truly set up, by the time the VioScsiMSInterrupt() is called by StorPort following VioScsiStartIo(). Why else would you hit this codepath...?

Hmmm. I notice there's a "virtio_device_ready" at the end of HwInitialize, but before passive initialization has completed, and strictly speaking (I haven't checked the virtio spec) we might not want to indicate fully ready until passive init has succeeded or failed. Maybe some interrupt not strictly associated with a vm/client side request?

Please let me know if you can still reproduce your issue even with this PR. It is possible we might need [in DispatchQueue()]:

Hmmm. OK. Some concrete evidence and instrumentation / tracing is required. Leave it with me, and I'll get back to you in the next day or two on that.

@MartinCHarvey-Nutanix

@benyamin-codez

OK, suspicions confirmed. I added a pending flag or two and a bit of debug. Here are my much abbreviated logs.

[VioScsiHwInitialize] DEBUG: Queues 4 msix_vectors 7
[VioScsiHwInitialize] DEBUG: Using a unique MSI vector per queue
...
[VioScsiHwInitialize] Processing complete.
[VioScsiMSInterruptWorker] DEBUG: MessageID 0x4
...
[VioScsiMSInterruptWorker] DEBUG:DPC pend until setup.
...
[VioScsiPassiveInitializeRoutine] X---X Working X---X
[VioScsiPassiveInitializeRoutine] DEBUG: Interrupt before passive initialize complete, kicking all queues

I think it may be time to move that virtio_device_ready call to the end of passive initialization. I'll do that, check it all works in the normal case, then check the crashdump / hibernate case, then remove all my debug, and then update my PR :-)

@MartinCHarvey-Nutanix

MartinCHarvey-Nutanix commented Feb 25, 2025

@benyamin-codez These issues also apply to virtio-stor. However, I think this is slightly overkill, because:

1. Stuff done under a dpc lock is unlikely to be under any other sort of lock.
2. Unless we have a burning need to move stuff into threaded dpc's...
3. If we did want to make these drivers more pre-emptible, we could just move more of the processing into workitems, which I think is not necessary at the moment.

So I think it would be better to fix the (small) sync issues with a small code change, and make that change common between virtio-scsi and virtio-stor.

@benyamin-codez
Contributor Author

@MartinCHarvey-Nutanix

Thanks for the updates..!

I started getting DRIVER_IRQL_NOT_LESS_OR_EQUAL in ntoskrnl.exe at driver installation (in my vioscsi WIP) a few hours back following clang-format changes... 8^d

It's perhaps unrelated, but I remain curious as to why your environment is calling:
VioScsiMSInterrupt() [HW_MESSAGE_SIGNALED_INTERRUPT_ROUTINE]
...before virtio_device_ready(), or even before VioScsiStartIo() for that matter.

Did you find a cause for why the MSI is being raised so early...?

...move that virtio_device_ready call to the end of passive initialization.

In the event that passive initialisation doesn't occur, i.e. adaptExt->dpc_ok = FALSE, how do we ensure the virtio_device_ready() routine still runs?

@benyamin-codez
Contributor Author

@MartinCHarvey-Nutanix

I started getting DRIVER_IRQL_NOT_LESS_OR_EQUAL in ntoskrnl.exe at driver installation (in my vioscsi WIP) a few hours back following clang-format changes... 8^d

It's perhaps unrelated...

It was unrelated...
... 8^d
... ... 8^D

Any joy moving that virtio_device_ready() routine...?

I was thinking it might be the case that it is required where it is, ...
...i.e. before StorPort calls HW_PASSIVE_INITIALIZE_ROUTINE (i.e. VioScsiPassiveInitializeRoutine())
...i.e. in HW_INITIALIZE (i.e. VioScsiHwInitialize())...

...so I'm very curious to see where you might call it from, and if that works.

@MartinCHarvey-Nutanix

MartinCHarvey-Nutanix commented Feb 26, 2025

@benyamin-codez Give me a mo to retest both normal usage and crashdump. I'll update my PR with a couple of commits, so you can cherry-pick one of them across.

@MartinCHarvey

MartinCHarvey commented Feb 26, 2025 via email

@benyamin-codez
Contributor Author

FYI, I'm prepping to close this in favour of Martin's PR #1293.
Refactoring for same and just doing some testing to be sure...

@benyamin-codez
Contributor Author

Closing this PR in favour of #1293.
Thanks for the collab.
Apologies it was necessary.
q8^{d>
