munet: add Commander.async_spawn() #58

liambrady · 2025-05-21T16:39:28Z

spawn(), which conducts a send/expect process is
blocking due to the abscence of async_=True to
pexpect.expect() within the method's main
send/expect loop. Some async methods call
spawn(), however, which leads to such methods
being blocked.

This is particularly problematic in cases where
multiple VMs requiring custom console expect/send
prompts are required (e.g. routers). Since each
QEMU VM sets up both console logging and completes
an expect/send loop without yielding control to
the coroutine of the other QEMU VMs, only one VM
can ever have logging configured (and thus advance in
an expect/send loop) at any given time. Not
only is this inefficient given that all QEMU VMs
are running and waiting for input, but this can
be fatal if console logging is not set up on each
VM in time. If some mandatory console output is
missed, then there is no way to recover and the
setup of the QEMU VM will time out.

However, setting async_=True in pexpect.expect()
leads to errors when mixed with PopenSpawn.
To bypass this issue, we do not use pexpect.expect()
with async_=True, but repeatedly call
pexpect.expect() with a timeout set at 0.1 (Until
a timeout that we track expires).

The end result of this change is that both logging is set
up on all QEMU nodes immediately and that
independent progress can be made in each VM's
expect/send loop simultaneously.

codecov · 2025-05-21T16:51:10Z

Codecov Report

Attention: Patch coverage is 90.14085% with 7 lines in your changes missing coverage. Please review.

Project coverage is 59.28%. Comparing base (4ec7fe7) to head (fc1a560).
Report is 25 commits behind head on main.

Files with missing lines	Patch %	Lines
munet/base.py	89.70%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #58      +/-   ##
==========================================
+ Coverage   58.84%   59.28%   +0.44%     
==========================================
  Files          19       19              
  Lines        5729     5802      +73     
==========================================
+ Hits         3371     3440      +69     
- Misses       2358     2362       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

choppsv1 · 2025-05-22T16:37:07Z

As discussed elsewhere, we need to use a new async_spawn() variant for this to preserve the existing non-asyncio API. The use case you're interested in is coming from an async function so should be fine to just switch to the new variant inside console/shell_spawn.

Also, since async/await only fails with non-pty pexpect.popen_spawn (i.e., pipes) you should restrict the busy loop fix to only that case. For use_pty == True use the supported async from pexpect.

spawn(), which conducts a send/expect process is blocking due to the abscence of async_=True to pexpect.expect() within the method's main send/expect loop. Some async methods call spawn(), however, which leads to such methods being blocked. This is particularly problematic in cases where multiple VMs requiring custom console expect/send prompts are required (e.g. routers). Since each QEMU VM sets up both console logging and completes an expect/send loop without yielding control to the coroutine of the other QEMU VMs, only one VM can ever have logging configured (and thus advance in an expect/send loop) at any given time. Not only is this inefficient given that all QEMU VMs are running and waiting for input, but this can be fatal if console logging is not set up on each VM in time. If some mandatory console output is missed, then there is no way to recover and the setup of the QEMU VM will time out. However, setting async_=True in pexpect.expect() leads to errors when mixed with PopenSpawn. To bypass this issue, we do not use pexpect.expect() with async_=True, but repeatedly call pexpect.expect() with a timeout set at 0.1 (Until a timeout that we track expires). The end result of this change is that both logging is set up on all QEMU nodes immediately and that independent progress can be made in each VM's expect/send loop simultaneously. Signed-off-by: Liam Brady <[email protected]>

liambrady · 2025-05-26T21:39:26Z

As discussed elsewhere, we need to use a new async_spawn() variant for this to preserve the existing non-asyncio API. The use case you're interested in is coming from an async function so should be fine to just switch to the new variant inside console/shell_spawn.

I modified both console/shell_spawn and launch/monitor to use the new async variant (both methods are already async and so should use the variant), and now spawn() appears to have fallen out of the code coverage. It appears that commander.spawn() is not called anywhere else within the codebase.

Does this affect the decision to continue to maintain commander.spawn()? Do we need it for future additions to the munet API? It doesn't feel correct to leave dead code within the project (or write new tests to maintain a section of code that isn't being meaningfully used.)

choppsv1 · 2025-06-05T15:06:12Z

munet/base.py

@@ -727,6 +727,151 @@ def spawn(
            p.close()
            raise error from eoferr

+    async def async_spawn(


In order to create this new function you did a large copy and paste. That's a good indication we need to refactor the function so that more code is shared. :)

That is hard to do sometimes with async/await. I feel like there should be some way to share code better between asyncio and non-asyncio, but I haven't figured it out yet. This is why you see other places in the code where I run the async function inside an asyncio.run() call for the non-asyncio variant. I'm not sure how safe this is though b/c sometimes people aren't using asyncio b/c it's broken for them in which case running it with asyncio.run() will be broken too.

A good place to test this would be in an FRR topotest which uses spawn.

I don't quite like the asyncio.run() solution for two reasons:

The synchronous version of spawn() cannot be used from within an asynchronous method since asyncio does not support nested event loops to the best of my knowledge. I am not confident enough to claim that there will never be a use case where spawn() inside an async method is appropriate (especially since spawn() is located so deep within the API). As such, an asyncio.run() solution does not feel safe.

Even if we followed the asyncio.run() solution, then we have to work around the async and use_pty=False hack (i.e. to avoid the exception with p.expect(..., async_=True) and pipes). Specifically, we only want to use the hack when absolutely necessary. However, using asyncio.run() as the solution would imply that we end up using the hack during the synchronous variant of spawn(), which is not desired.

The only other real option then in refactoring is to try and split off as much synchronous code as possible and place it into helper methods. I don't find such a solution satisfying, since that would leave a large portion of the methods untouched, but don't see any other real alternatives.

See the latest commit for a minor refactor that collects code that is synchronous in either version (sync/async) into a single method to reduce duplicate code.

I am struggling to determine if there are any other changes that I could make to reduce code duplication. The presence of the await statement within the try/except block + while block make it difficult to reduce the logic any further imo.

choppsv1

Need to test both async and non-async variants, and the vty and non-vty case.

munet/base.py

tests: cover error cases, both sync/async This commit introduces a minor refactor that collects sync code used by both sync/async spawn() into a single callable method. The majority of sync/async spawn() is untouched due to the fundamental difference that asyncio brings with it to the send/expect loop. New tests are introduced such that both sync/async spawn() methods are tested separately. Movever, the error cases are also now tested. Signed-off-by: Liam Brady <[email protected]>

liambrady · 2025-06-24T22:21:49Z

munet/base.py

+                        sends[index - 1],
+                    )
+                if sends[index - 1]:
+                    p.send(sends[index - 1])


The majority of synchronous spawn() is tested, but not all of the duplicate code. I was unable to determine a solution that allowed the send/expect logic to be tested while ensuring that we also test functionality across all the different shells.

liambrady added the bug Something isn't working label May 21, 2025

liambrady mentioned this pull request May 21, 2025

DON'T MERGE: Asynchronous shell_spawn() with removed test case #55

Closed

liambrady requested a review from choppsv1 May 21, 2025 16:53

liambrady marked this pull request as ready for review May 21, 2025 16:53

liambrady force-pushed the liambrady/qemu_pexpect_nonblocking branch from e3b5d2c to a848814 Compare May 26, 2025 21:10

liambrady changed the title ~~munet: make Commander.spawn() async~~ munet: add Commander.asynnc_spawn() May 26, 2025

liambrady changed the title ~~munet: add Commander.asynnc_spawn()~~ munet: add Commander.async_spawn() May 26, 2025

choppsv1 reviewed Jun 5, 2025

View reviewed changes

choppsv1 requested changes Jun 11, 2025

View reviewed changes

munet/base.py Outdated Show resolved Hide resolved

munet/base.py Show resolved Hide resolved

munet/base.py Outdated Show resolved Hide resolved

liambrady commented Jun 24, 2025

View reviewed changes

liambrady requested a review from choppsv1 June 25, 2025 18:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

munet: add Commander.async_spawn() #58

munet: add Commander.async_spawn() #58

Uh oh!

liambrady commented May 21, 2025 •

edited

Loading

Uh oh!

codecov bot commented May 21, 2025 •

edited

Loading

Uh oh!

choppsv1 commented May 22, 2025

Uh oh!

liambrady commented May 26, 2025 •

edited

Loading

Uh oh!

choppsv1 Jun 5, 2025

Uh oh!

liambrady Jun 24, 2025 •

edited

Loading

Uh oh!

liambrady Jun 24, 2025

Uh oh!

choppsv1 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

liambrady Jun 24, 2025

Uh oh!

Uh oh!

munet: add Commander.async_spawn() #58

Are you sure you want to change the base?

munet: add Commander.async_spawn() #58

Uh oh!

Conversation

liambrady commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

choppsv1 commented May 22, 2025

Uh oh!

liambrady commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

choppsv1 Jun 5, 2025

Choose a reason for hiding this comment

Uh oh!

liambrady Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liambrady Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

choppsv1 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

liambrady Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

liambrady commented May 21, 2025 •

edited

Loading

codecov bot commented May 21, 2025 •

edited

Loading

liambrady commented May 26, 2025 •

edited

Loading

liambrady Jun 24, 2025 •

edited

Loading