-
Notifications
You must be signed in to change notification settings - Fork 102
ci: add continuous benchmarking #1726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Ci is unfortunately still failing, despite hermit-os/hermit-rs#708 being merged and also used. |
592e8c2
to
dba2fde
Compare
@jounathaen it's passing now and results look good |
The benchmarks take 27 minutes now, which is 10 mins longer than the tests. Can we change the benchmark setup, so that the CI duration is not dominated by the benchmarks anymore (parallelize it on two runners or reduce the benchmark size)? |
I think the benchmarks taking more time is fine, but I would suggest running the benchmark suite after merging and on PRs only on demand. |
But in that case, we wouldn't know if a PR affects performance before merging it. |
As @mkroening pointed out, we already have gh-pages installed for this repository (https://hermit-os.github.io/kernel/hermit/). We would need to find a way of publishing the results on a different URL. Martin suggested a separate repository for gh-pages, where this action pushes to. Maybe a combination of both gh-pages into subdirectories is also possible. |
Is that absolutely necessary? We could use the same gh-pages branch of this Repo and have benchmarks hosted on https://hermit-os.github.io/kernel/performance/ (or something similar), all it takes is adjusting the folder github-action-benchmark puts the data into Edit: Nevermind, i see the problem.... Another Repo sounds like a good idea. Seperate subdirectories would require the publish_docs to stop force pushing and that may lead to a lot of unecessary history |
By adding the keep_history flag to
I've tested it on my fork https://github.com/CarlWachter/kernel. (See also: example of outputs: https://carlwachter.github.io/kernel/benchmarks/, What do you think of this solution? |
Okay this is a little weird: To run the workflow with permissions to push and comment on PRs with results, the workflow has to use the On that fork the only notable difference is the trigger method and runner (as it does not have a self-hosted runner) and it appears to function as intended. |
.github/workflows/ci.yml
Outdated
@@ -3,7 +3,7 @@ name: CI | |||
on: | |||
pull_request: | |||
merge_group: | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nitpick: Whitespace added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! :)
I have a few requests. :D
.github/benchmarks/netbench.json
Outdated
}, | ||
{ | ||
"name": "Netbench TCP BW - Client", | ||
"command": "parallel ::: 'cargo run --manifest-path ./Cargo.toml --bin tcp-server-bw --release --target x86_64-unknown-linux-gnu -- --address 10.0.5.3 --bytes 1048576 --rounds 1000' 'sleep 10 && sudo qemu-system-x86_64 -display none -serial stdio -kernel hermit-loader-x86_64 -cpu qemu64,apic,fsgsbase,rdtscp,xsave,xsaveopt,fxsr,rdrand -enable-kvm -initrd target/x86_64-unknown-hermit/release/tcp-client-bw -smp 2 -m 1024M -netdev user,id=u1,hostfwd=tcp::9975-:9975,hostfwd=udp::9975-:9975,net=192.168.76.0/24,dhcpstart=192.168.76.9 -device virtio-net-pci,netdev=u1,disable-legacy=on,packed=on,mq=on -append \"-- --nonblocking --address 127.0.0.1 --bytes 1048576 --rounds 1000\"'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for using user networking instead of taps? I get more than 15% more bandwidth on my machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason that i can think of, I'd be happy to switch to taps if you tell me the args you used to achieve that performance bump
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should work as described in the loader's README. :)
.github/benchmarks/netbench.json
Outdated
}, | ||
{ | ||
"name": "Netbench TCP Latency - Server", | ||
"command": "parallel ::: 'sleep 10 && cargo run --manifest-path ./Cargo.toml --bin tcp-client-latency --release --target x86_64-unknown-linux-gnu -- --nonblocking --address 127.0.0.1 --bytes 1048576 --rounds 250' 'sudo qemu-system-x86_64 -display none -serial stdio -kernel hermit-loader-x86_64 -cpu qemu64,apic,fsgsbase,rdtscp,xsave,xsaveopt,fxsr,rdrand -enable-kvm -initrd target/x86_64-unknown-hermit/release/tcp-server-latency -smp 2 -m 1024M -netdev user,id=u1,hostfwd=tcp::7878-:7878,hostfwd=udp::9975-:9975,net=192.168.76.0/24,dhcpstart=192.168.76.9 -device virtio-net-pci,netdev=u1,disable-legacy=on,packed=on,mq=on -append \"-- --address 10.0.5.3 --bytes 1048576 --rounds 250\"'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The forwarded ports for user networking are quite inconsistent right now. The application uses port 7878 by default and is configurable via CLI, but parts of the UDP and latency benches have 9975 hardcoded, which should be fixed.
The QEMU invocations should only forward the correct ports that are necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the forwarded ports to be minimal now. As for the hardcoding that is in reference to the port that is being connected to, rather than the one that is binded to (which is determined by the CLI arg). Since this is explicitly a benchmark, not an example, i don't see any issue with that being hardcoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. So you mean that the remote port we connect to is inferred from CLI but the local port that we send the request from is hardcoded, right? In that case, the port should better be chosen by the operating system by specifying 0, or does Hermit not support that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes i've tried that and had no success getting it to bind to port 0, always failing when i attempted that. It works when i run the code like that on a non-hermit system so i assume hermit does not support that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I have opened an issue for supporting ephemeral ports (#1753). Could you add a TODO note that the hardcoded value is not relevant and should be changed to zero once possible? I am also wondering if changing 7878 to 9975 as well would be more or less confusing. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't that issue be on hermit-rs
, since that is where the benchmarks with the hardcoded ports are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the issue is that the kernel does not support that. Sure, user space should be changed once the root issue is resolved.
Hi, just chiming in: The (relatively newer) |
I've addressed most of the issues brought up now, but once again ran into issues with permissions. Namely pushing to the @jounathaen @mkroening How would you feel about a Organisation owned PAT? It would allow the action to push to |
I've cut netbench out for now as i have not been able to test with tap devices and there is a kernel side incompatibility of the current netbench implementation with the kernel anyhow (#1780). Otherwise this PR is ready and I would recommend we merge this today without netbench. I once again tested everything on my fork to ensure: CarlWachter#9 |
This will execute the benchmarks in the hermit-rs repo according to the
bench.json
file. Results will be published to Github Pages.CI will fail until hermit-os/hermit-rs#708 gets merged.