Skip to content

munet: add Commander.async_spawn() #58

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

liambrady
Copy link
Contributor

@liambrady liambrady commented May 21, 2025

spawn(), which conducts a send/expect process is
blocking due to the abscence of async_=True to
pexpect.expect() within the method's main
send/expect loop. Some async methods call
spawn(), however, which leads to such methods
being blocked.

This is particularly problematic in cases where
multiple VMs requiring custom console expect/send
prompts are required (e.g. routers). Since each
QEMU VM sets up both console logging and completes
an expect/send loop without yielding control to
the coroutine of the other QEMU VMs, only one VM
can ever have logging configured (and thus advance in
an expect/send loop) at any given time. Not
only is this inefficient given that all QEMU VMs
are running and waiting for input, but this can
be fatal if console logging is not set up on each
VM in time. If some mandatory console output is
missed, then there is no way to recover and the
setup of the QEMU VM will time out.

However, setting async_=True in pexpect.expect()
leads to errors when mixed with PopenSpawn.
To bypass this issue, we do not use pexpect.expect()
with async_=True, but repeatedly call
pexpect.expect() with a timeout set at 0.1 (Until
a timeout that we track expires).

The end result of this change is that both logging is set
up on all QEMU nodes immediately and that
independent progress can be made in each VM's
expect/send loop simultaneously.

Copy link

codecov bot commented May 21, 2025

Codecov Report

Attention: Patch coverage is 90.14085% with 7 lines in your changes missing coverage. Please review.

Project coverage is 59.28%. Comparing base (4ec7fe7) to head (fc1a560).
Report is 25 commits behind head on main.

Files with missing lines Patch % Lines
munet/base.py 89.70% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #58      +/-   ##
==========================================
+ Coverage   58.84%   59.28%   +0.44%     
==========================================
  Files          19       19              
  Lines        5729     5802      +73     
==========================================
+ Hits         3371     3440      +69     
- Misses       2358     2362       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@liambrady liambrady requested a review from choppsv1 May 21, 2025 16:53
@liambrady liambrady marked this pull request as ready for review May 21, 2025 16:53
@choppsv1
Copy link
Collaborator

As discussed elsewhere, we need to use a new async_spawn() variant for this to preserve the existing non-asyncio API. The use case you're interested in is coming from an async function so should be fine to just switch to the new variant inside console/shell_spawn.

Also, since async/await only fails with non-pty pexpect.popen_spawn (i.e., pipes) you should restrict the busy loop fix to only that case. For use_pty == True use the supported async from pexpect.

spawn(), which conducts a send/expect process is
blocking due to the abscence of async_=True to
pexpect.expect() within the method's main
send/expect loop. Some async methods call
spawn(), however, which leads to such methods
being blocked.

This is particularly problematic in cases where
multiple VMs requiring custom console expect/send
prompts are required (e.g. routers). Since each
QEMU VM sets up both console logging and completes
an expect/send loop without yielding control to
the coroutine of the other QEMU VMs, only one VM
can ever have logging configured (and thus advance in
an expect/send loop) at any given time. Not
only is this inefficient given that all QEMU VMs
are running and waiting for input, but this can
be fatal if console logging is not set up on each
VM in time. If some mandatory console output is
missed, then there is no way to recover and the
setup of the QEMU VM will time out.

However, setting async_=True in pexpect.expect()
leads to errors when mixed with PopenSpawn.
To bypass this issue, we do not use pexpect.expect()
with async_=True, but repeatedly call
pexpect.expect() with a timeout set at 0.1 (Until
a timeout that we track expires).

The end result of this change is that both logging is set
up on all QEMU nodes immediately and that
independent progress can be made in each VM's
expect/send loop simultaneously.

Signed-off-by: Liam Brady <[email protected]>
@liambrady liambrady force-pushed the liambrady/qemu_pexpect_nonblocking branch from e3b5d2c to a848814 Compare May 26, 2025 21:10
@liambrady liambrady changed the title munet: make Commander.spawn() async munet: add Commander.asynnc_spawn() May 26, 2025
@liambrady liambrady changed the title munet: add Commander.asynnc_spawn() munet: add Commander.async_spawn() May 26, 2025
@liambrady
Copy link
Contributor Author

liambrady commented May 26, 2025

As discussed elsewhere, we need to use a new async_spawn() variant for this to preserve the existing non-asyncio API. The use case you're interested in is coming from an async function so should be fine to just switch to the new variant inside console/shell_spawn.

I modified both console/shell_spawn and launch/monitor to use the new async variant (both methods are already async and so should use the variant), and now spawn() appears to have fallen out of the code coverage. It appears that commander.spawn() is not called anywhere else within the codebase.

Does this affect the decision to continue to maintain commander.spawn()? Do we need it for future additions to the munet API? It doesn't feel correct to leave dead code within the project (or write new tests to maintain a section of code that isn't being meaningfully used.)

@@ -727,6 +727,151 @@ def spawn(
p.close()
raise error from eoferr

async def async_spawn(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to create this new function you did a large copy and paste. That's a good indication we need to refactor the function so that more code is shared. :)

That is hard to do sometimes with async/await. I feel like there should be some way to share code better between asyncio and non-asyncio, but I haven't figured it out yet. This is why you see other places in the code where I run the async function inside an asyncio.run() call for the non-asyncio variant. I'm not sure how safe this is though b/c sometimes people aren't using asyncio b/c it's broken for them in which case running it with asyncio.run() will be broken too.

A good place to test this would be in an FRR topotest which uses spawn.

Copy link
Contributor Author

@liambrady liambrady Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite like the asyncio.run() solution for two reasons:

  1. The synchronous version of spawn() cannot be used from within an asynchronous method since asyncio does not support nested event loops to the best of my knowledge. I am not confident enough to claim that there will never be a use case where spawn() inside an async method is appropriate (especially since spawn() is located so deep within the API). As such, an asyncio.run() solution does not feel safe.
  2. Even if we followed the asyncio.run() solution, then we have to work around the async and use_pty=False hack (i.e. to avoid the exception with p.expect(..., async_=True) and pipes). Specifically, we only want to use the hack when absolutely necessary. However, using asyncio.run() as the solution would imply that we end up using the hack during the synchronous variant of spawn(), which is not desired.

The only other real option then in refactoring is to try and split off as much synchronous code as possible and place it into helper methods. I don't find such a solution satisfying, since that would leave a large portion of the methods untouched, but don't see any other real alternatives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the latest commit for a minor refactor that collects code that is synchronous in either version (sync/async) into a single method to reduce duplicate code.

I am struggling to determine if there are any other changes that I could make to reduce code duplication. The presence of the await statement within the try/except block + while block make it difficult to reduce the logic any further imo.

Copy link
Collaborator

@choppsv1 choppsv1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to test both async and non-async variants, and the vty and non-vty case.

tests: cover error cases, both sync/async

This commit introduces a minor refactor that collects
sync code used by both sync/async spawn() into a
single callable method. The majority of sync/async
spawn() is untouched due to the fundamental difference
that asyncio brings with it to the send/expect loop.

New tests are introduced such that both sync/async
spawn() methods are tested separately. Movever,
the error cases are also now tested.

Signed-off-by: Liam Brady <[email protected]>
sends[index - 1],
)
if sends[index - 1]:
p.send(sends[index - 1])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The majority of synchronous spawn() is tested, but not all of the duplicate code. I was unable to determine a solution that allowed the send/expect logic to be tested while ensuring that we also test functionality across all the different shells.

@liambrady liambrady requested a review from choppsv1 June 25, 2025 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants