@MartinDrab
Contributor

@MartinDrab MartinDrab commented Oct 12, 2025

When the QEMU Guest Agent (QGA) is active, it connects to the Virtio serial devices by opening handles to them via CreateFile. The handles stay open essentially until QGA terminates. This causes problems when the user wants to disable the serial device(s) or update the Virtio serial driver: because of the open handles, these actions require a computer restart (or stopping the QGA service).

This PR addresses the driver part, consisting of letting the QGA service know that it should close the handles at least temporarily. I hope to modify QGA code as well.

This PR adds a QueryRemove PnP event handler. When the PnP Manager asks the driver for permission to remove one of its serial devices, the driver completes the pending read request (which QGA always keeps outstanding) with STATUS_END_OF_FILE. Currently, QGA reacts by immediately sending the read request again. I plan to modify QGA to react by temporarily closing the handles and reopening them after a while (without changing any other properties of the channel between QGA and the devices). That should allow disabling serial devices and updating the driver without requiring a restart.
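The planned QGA-side reaction can be illustrated with a small portable C model. This is only a sketch, not QGA code: the `channel` struct, `read_result` values, `REOPEN_DELAY_POLLS`, and both functions are hypothetical names, and the "handle" is a plain flag rather than a real Windows handle. It assumes the agent polls periodically and treats an end-of-file read completion as the hint to let go of the device for a while.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one completed read on the channel. */
typedef enum { READ_DATA, READ_EOF } read_result;

/* Hypothetical agent-side view of one virtio-serial channel. */
typedef struct {
    bool handle_open;      /* true while the agent holds a handle to the port */
    int  reopen_countdown; /* polls left before a reopen may be attempted */
} channel;

enum { REOPEN_DELAY_POLLS = 3 }; /* arbitrary back-off chosen for this sketch */

/* Called when the pending read completes. On end-of-file the handle is
 * released so that the device can be removed; on data the handle is kept
 * and the caller would simply resubmit the read. */
void channel_on_read_complete(channel *ch, read_result r)
{
    assert(ch->handle_open);
    if (r == READ_EOF) {
        ch->handle_open = false;
        ch->reopen_countdown = REOPEN_DELAY_POLLS;
    }
}

/* Called periodically. Returns true if the handle is open after this poll.
 * While the back-off is running nothing is reopened; afterwards the handle
 * is reopened as soon as the device is present again. */
bool channel_poll(channel *ch, bool device_present)
{
    if (ch->handle_open)
        return true;
    if (ch->reopen_countdown > 0 && --ch->reopen_countdown > 0)
        return false;
    if (device_present)
        ch->handle_open = true;
    return ch->handle_open;
}
```

In this model the handle stays closed for the first two polls after an end-of-file completion and is reopened on the third poll if the device is present, which leaves a window in which a removal could proceed. The real delay and reopen strategy are up to the QGA change.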

@gemini-code-assist

Summary of Changes

Hello @MartinDrab, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Virtio serial driver's ability to cooperate with the QEMU Guest Agent (QGA) during device removal operations. By introducing specific Plug and Play (PnP) event handlers, the driver can now intelligently respond to requests for device removal or unexpected removals. This prevents scenarios where QGA's open handles block device operations, ultimately leading to a more stable and responsive virtual machine environment, especially during hotplugging or driver maintenance.

Highlights

  • New PnP Event Handlers: Implemented EvtDeviceQueryRemove and EvtDeviceSurpriseRemoval callbacks within the Virtio serial driver to improve handling of device removal events.
  • Graceful Pending Read Completion: Introduced a new helper function, VIOSerialPortCompletePendingRead, to safely complete any pending read requests from the QEMU Guest Agent (QGA) during device removal, preventing potential deadlocks or system hangs.
  • QueryRemove Behavior: When a QueryRemove event occurs, the driver now completes pending read requests with STATUS_END_OF_FILE. This is intended to signal QGA to temporarily close and reopen its handles, facilitating driver updates and device disabling without requiring a VM restart.
  • SurpriseRemoval Behavior: For SurpriseRemoval events, pending read requests are completed with STATUS_DELETE_PENDING. This informs QGA of an error, prompting it to close handles and await the device's re-enumeration, thus preventing system hangs during forced device removal.

@MartinDrab MartinDrab marked this pull request as draft October 12, 2025 21:56
@YanVugenfirer
Collaborator

[Jenkins CI]: Can one of the admins verify this patch?

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces handlers for device removal PnP events to improve cooperation with the QEMU Guest Agent. The changes are logical and well-aligned with the goal. I've identified a critical deadlock issue in the new helper function and a couple of minor issues regarding coding standards. Addressing these will improve the robustness and maintainability of the code.

Comment on lines 1278 to 1289
    WdfSpinLockAcquire(Port->InBufLock);
    readRequest = Port->PendingReadRequest;
    Port->PendingReadRequest = NULL;
    if (readRequest != NULL)
    {
        if (WdfRequestUnmarkCancelable(readRequest) != STATUS_CANCELLED)
        {
            WdfRequestCompleteWithInformation(readRequest, Status, Information);
        }
    }

    WdfSpinLockRelease(Port->InBufLock);

critical

Calling WdfRequestUnmarkCancelable while holding a spinlock can lead to a deadlock. The EvtRequestCancel callback for this request (VIOSerialPortReadRequestCancel) also acquires Port->InBufLock. If the cancel callback is invoked concurrently while this lock is held, a deadlock will occur.

To fix this, you should release the spinlock before calling WdfRequestUnmarkCancelable and WdfRequestCompleteWithInformation. The spinlock should only be held for the minimal time required to protect access to the shared data (Port->PendingReadRequest).

    WdfSpinLockAcquire(Port->InBufLock);
    readRequest = Port->PendingReadRequest;
    Port->PendingReadRequest = NULL;
    WdfSpinLockRelease(Port->InBufLock);

    if (readRequest != NULL)
    {
        if (WdfRequestUnmarkCancelable(readRequest) != STATUS_CANCELLED)
        {
            WdfRequestCompleteWithInformation(readRequest, Status, Information);
        }
    }
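The "detach under the lock, complete outside it" pattern from the suggestion can also be modelled in portable user-space C. This is a sketch under stated assumptions, not driver code: a pthread mutex stands in for the WDF spinlock, and `request`, `port`, `STATUS_MODEL_CANCELLED`, and all function names are hypothetical. The model has no equivalent of WdfRequestUnmarkCancelable; instead, whichever path clears the lock-protected pointer first owns the request and completes it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a WDFREQUEST. */
typedef struct {
    int completed; /* how many times the request was completed; must end at 1 */
    int status;    /* status it was completed with */
} request;

/* Hypothetical stand-in for the port context; the mutex models InBufLock. */
typedef struct {
    pthread_mutex_t lock;
    request *pending;
} port;

enum { STATUS_MODEL_CANCELLED = -1 }; /* illustrative value only */

/* Detach the pending request while holding the lock, and do nothing else.
 * If only_if is non-NULL, detach only when it is the pending request. */
static request *port_detach_pending(port *p, request *only_if)
{
    pthread_mutex_lock(&p->lock);
    request *r = p->pending;
    if (r != NULL && (only_if == NULL || only_if == r))
        p->pending = NULL;
    else
        r = NULL;
    pthread_mutex_unlock(&p->lock);
    return r;
}

/* Completion runs with no lock held. */
static void request_complete(request *r, int status)
{
    assert(r->completed == 0); /* a request must be completed exactly once */
    r->status = status;
    r->completed++;
}

/* Removal path: returns 1 if a pending request was completed, 0 otherwise. */
int port_complete_pending(port *p, int status)
{
    request *r = port_detach_pending(p, NULL);
    if (r == NULL)
        return 0;
    request_complete(r, status);
    return 1;
}

/* Cancel path: takes the same lock, so it must never be entered while the
 * removal path still holds it. */
void port_cancel(port *p, request *r)
{
    request *owned = port_detach_pending(p, r);
    if (owned != NULL)
        request_complete(owned, STATUS_MODEL_CANCELLED);
}
```

Because the lock is released before any completion work starts, the cancel path can always acquire it, and because ownership is decided by who clears `pending`, the request is completed by exactly one of the two paths.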


    NTSTATUS VIOSerialPortEvtDeviceQueryRemove(WDFDEVICE Device)
    {
        PRAWPDO_VIOSERIAL_PORT pdoData = RawPdoSerialPortGetData(Device);

medium

This function is marked as pageable with #pragma alloc_text(PAGE, VIOSerialPortEvtDeviceQueryRemove). It's a best practice and often required by static analysis tools to add the PAGED_CODE() macro at the beginning of pageable functions to ensure they are not called at a raised IRQL.

    PAGED_CODE();
    PRAWPDO_VIOSERIAL_PORT pdoData = RawPdoSerialPortGetData(Device);


    VOID VIOSerialPortEvtDeviceSurpriseRemoval(WDFDEVICE Device)
    {
        PRAWPDO_VIOSERIAL_PORT pdoData = RawPdoSerialPortGetData(Device);

medium

This function is marked as pageable with #pragma alloc_text(PAGE, VIOSerialPortEvtDeviceSurpriseRemoval). It's a best practice and often required by static analysis tools to add the PAGED_CODE() macro at the beginning of pageable functions to ensure they are not called at a raised IRQL.

    PAGED_CODE();
    PRAWPDO_VIOSERIAL_PORT pdoData = RawPdoSerialPortGetData(Device);

@kostyanf14
Member

ok to test

@MartinDrab MartinDrab force-pushed the feature/vioser/query-remove branch from 528af12 to 5adf81d Compare October 13, 2025 21:50
@YanVugenfirer
Collaborator

Hi @MartinDrab, what else do you want to do with this PR before removing WIP?

@MartinDrab
Contributor Author

> Hi @MartinDrab, what else do you want to do with this PR before removing WIP?

Hi Yan,

I did some more testing, and it looks like there is no need to add the SurpriseRemoval handler: the main issue is that QGA does not close its handles to the serial port when it detects an error. So I will probably remove the commit introducing the SurpriseRemoval handler (and update this PR accordingly).

But I will do some more testing to be sure.

When QGA detects that a Virtio serial device exists, it automatically opens a handle to it and submits a read request to receive future data from the host. This connection exists essentially as long as the QGA executable is running. Unfortunately, the open handle and pending read request prevent the driver and its devices from being removed (device disabling, driver update): the device remains stuck in its removal phase until the handle gets closed (meaning forever).

This commit adds a QueryRemove event handler to the driver. The handler is invoked when the PnP Manager asks whether the device can be removed. It completes the pending read request with STATUS_END_OF_FILE, which alerts the QGA service. Currently, QGA just resubmits the read request; its code will be modified to close all handles to Virtio serial devices for a while so that the devices can be removed. Until then, adding the QueryRemove handler does not change how QGA behaves.

This commit does not affect temporary device stops due to hardware resource reassignment, since those already work fine.

Signed-Off-By: Martin Drab <[email protected]>
@MartinDrab MartinDrab force-pushed the feature/vioser/query-remove branch from 5adf81d to f2cc301 Compare October 16, 2025 14:10
@MartinDrab MartinDrab marked this pull request as ready for review October 16, 2025 14:19
@kostyanf14 kostyanf14 changed the title WIP: Vioser: Better cooperation with QGA during device removals Vioser: Better cooperation with QGA during device removals Oct 16, 2025
Member

@kostyanf14 kostyanf14 left a comment

@MartinDrab

Do you plan to submit any patches to QGA?
