-
Notifications
You must be signed in to change notification settings - Fork 427
Vioser: Better cooperation with QGA during device removals #1435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Vioser: Better cooperation with QGA during device removals #1435
Conversation
Summary of ChangesHello @MartinDrab, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Virtio serial driver's ability to cooperate with the QEMU Guest Agent (QGA) during device removal operations. By introducing specific Plug and Play (PnP) event handlers, the driver can now intelligently respond to requests for device removal or unexpected removals. This prevents scenarios where QGA's open handles block device operations, ultimately leading to a more stable and responsive virtual machine environment, especially during hotplugging or driver maintenance. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
[Jenkins CI]: Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces handlers for device removal PnP events to improve cooperation with the QEMU Guest Agent. The changes are logical and well-aligned with the goal. I've identified a critical deadlock issue in the new helper function and a couple of minor issues regarding coding standards. Addressing these will improve the robustness and maintainability of the code.
vioserial/sys/Port.c
Outdated
| WdfSpinLockAcquire(Port->InBufLock); | ||
| readRequest = Port->PendingReadRequest; | ||
| Port->PendingReadRequest = NULL; | ||
| if (readRequest != NULL) | ||
| { | ||
| if (WdfRequestUnmarkCancelable(readRequest) != STATUS_CANCELLED) | ||
| { | ||
| WdfRequestCompleteWithInformation(readRequest, Status, Information); | ||
| } | ||
| } | ||
|
|
||
| WdfSpinLockRelease(Port->InBufLock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling WdfRequestUnmarkCancelable while holding a spinlock can lead to a deadlock. The EvtRequestCancel callback for this request (VIOSerialPortReadRequestCancel) also acquires Port->InBufLock. If the cancel callback is invoked concurrently while this lock is held, a deadlock will occur.
To fix this, you should release the spinlock before calling WdfRequestUnmarkCancelable and WdfRequestCompleteWithInformation. The spinlock should only be held for the minimal time required to protect access to the shared data (Port->PendingReadRequest).
WdfSpinLockAcquire(Port->InBufLock);
readRequest = Port->PendingReadRequest;
Port->PendingReadRequest = NULL;
WdfSpinLockRelease(Port->InBufLock);
if (readRequest != NULL)
{
if (WdfRequestUnmarkCancelable(readRequest) != STATUS_CANCELLED)
{
WdfRequestCompleteWithInformation(readRequest, Status, Information);
}
}
vioserial/sys/Port.c
Outdated
|
|
||
| NTSTATUS VIOSerialPortEvtDeviceQueryRemove(WDFDEVICE Device) | ||
| { | ||
| PRAWPDO_VIOSERIAL_PORT pdoData = RawPdoSerialPortGetData(Device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is marked as pageable with #pragma alloc_text(PAGE, VIOSerialPortEvtDeviceQueryRemove). It's a best practice and often required by static analysis tools to add the PAGED_CODE() macro at the beginning of pageable functions to ensure they are not called at a raised IRQL.
PAGED_CODE();
PRAWPDO_VIOSERIAL_PORT pdoData = RawPdoSerialPortGetData(Device);
vioserial/sys/Port.c
Outdated
|
|
||
| VOID VIOSerialPortEvtDeviceSurpriseRemoval(WDFDEVICE Device) | ||
| { | ||
| PRAWPDO_VIOSERIAL_PORT pdoData = RawPdoSerialPortGetData(Device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is marked as pageable with #pragma alloc_text(PAGE, VIOSerialPortEvtDeviceSurpriseRemoval). It's a best practice and often required by static analysis tools to add the PAGED_CODE() macro at the beginning of pageable functions to ensure they are not called at a raised IRQL.
PAGED_CODE();
PRAWPDO_VIOSERIAL_PORT pdoData = RawPdoSerialPortGetData(Device);|
ok to test |
528af12 to
5adf81d
Compare
|
Hi @MartinDrab, what else do you want to do with this PR before removing WIP? |
Hi Yan, I did some more testing and it looks like there is no need to actually add the SurpriseRemoval handler and that the main issue lies in that QGA just does not close handles to the serial port when it detects an error. So, I will probably remove the commit introducing the SurpriseRemoval handler (and update this PR accordingly). But I will do some more testing to be sure. |
When QGA detects a Virtio serial device exists, it automatically opens a handle to it and submits a read request to receive future data from the host. This connection basically exists as long as the QGA executable is running. Unfortunately, the open handle and pending read request prevents the driver and its devices from removal (device disabling, driver update) – the device remains stuck in its removal phase until the handle gets closed (meaning forever). This commit enhances the driver with a QueryRemove event handler. The handler is invoked when the PnP Manager asks whether the device can be removed. The handler completes the pending read request with STATUS_END_OF_FILE which alerts the QGA service. Currently, QGA just resubmits the read request again, however, its code will be modified to close all handles to Virtio serial devices for a while in order to allow the devices to get removed. That means, adding the QueryRemove handler does not change functionality of QGA. This commit does not affect temporary device stop due to hardware resource reassignment since that is already working fine. Signed-Off-By: Martin Drab <[email protected]>
5adf81d to
f2cc301
Compare
kostyanf14
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to submit any patches to QGA?
|
Do you plan to submit any patches to QGA? |
When the QEMU Guest Agent (QGA) is active, it establishes a connection to the Virtio serial devices by opening handles to them via
CreateFile. The handles are kept open basically until the QGA terminates. This causes problems when the user wishes to either disable the serial device(s), or update the Virtio serial driver – due to open handles, these actions require computer restart (or QGA service stop).This PR addresses the driver part, consisting of letting the QGA service know that it should close the handles at least temporarily. I hope to modify QGA code as well.
This PR adds a QueryRemove PnP event handler. When the PnP Manager asks the driver for its permission to remove one of its serial devices, the driver completes the pending read request (always sent by QGA) with *
TATUS_HANDLE_EOF. Currently, QGA reacts by sending the read request immediatelly again. I plan to modify the QGA to react by temporarily closing the handles and reopening them again after a while (without changing any other properties of the channel between QGA and the devices). That should allow serial device disabling and driver updates without requring a restart.