[CI] migrate CI test steps to containers #310
bot:retest
dpressle
left a comment
If gtest works, we can now enable concurrent builds
bot:retest
function do_hugepages()
{
if [[ -f /.dockerenv && ! $(grep -q hugetlbfs /proc/mounts) ]]; then
You should probably do a more extensive check for a containerized environment, like you do in the gtest script:
[[ -f /.dockerenv || -f /run/.containerenv || -n "${KUBERNETES_SERVICE_HOST}" ]]
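A minimal sketch of how that check could be factored into a helper and reused from do_hugepages(). The names is_containerized and has_hugetlbfs_mount are hypothetical and not taken from the repository, and the optional file argument exists only so the sketch can be exercised without a real /proc/mounts. Because grep -q prints nothing, the sketch tests its exit status directly instead of wrapping it in $(...).

```shell
#!/bin/bash

# Hypothetical helper: succeeds when the script appears to run in a container.
# The three markers are the ones quoted in the review comment above.
is_containerized()
{
    [[ -f /.dockerenv || -f /run/.containerenv || -n "${KUBERNETES_SERVICE_HOST:-}" ]]
}

# Hypothetical helper: succeeds when a hugetlbfs mount is listed.
# The mounts file is a parameter (default /proc/mounts) to keep it testable.
has_hugetlbfs_mount()
{
    local mounts_file="${1:-/proc/mounts}"
    grep -q hugetlbfs "${mounts_file}" 2>/dev/null
}

# Example use: only act when in a container that has no hugetlbfs mount.
if is_containerized && ! has_hugetlbfs_mount; then
    echo "container without hugetlbfs mount"
fi
```

With a single helper, the hugepages code and the gtest script would agree on what counts as a containerized environment.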
#fi
if [ ! -z "$(do_get_ip 'eth')" ]; then
test_ip_list="${test_ip_list} eth_ip4:$(do_get_ip 'eth')"
if [[ -f /.dockerenv ]] || [[ -f /run/.containerenv ]] || [[ -n "${KUBERNETES_SERVICE_HOST}" ]]; then
Can you improve this a little and reduce the duplication? Instead of building the IP list string in two places, keep one variable per IP and generate the list in one place:
if
ipv4=get_ipv4...
ipv6=get_ipv6...
else
ipv4=
ipv6=
fi
test_ip_list="eth_ip4:${ipv4} eth_ip6:${ipv6}"
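A runnable sketch of the same idea, under stated assumptions: do_get_ip is stubbed here with documentation addresses because its real implementation is not shown in this thread, the 'inet6' argument to do_get_ip is a guess, and build_test_ip_list and its "yes"/"no" argument are hypothetical stand-ins for whatever condition the script actually branches on.

```shell
#!/bin/bash

# Stand-in for the repository's do_get_ip so the sketch runs on its own.
# The 'inet6' selector is an assumption, not a confirmed argument.
do_get_ip()
{
    case "$1" in
        eth)   echo "192.0.2.10" ;;
        inet6) echo "2001:db8::10" ;;
    esac
}

# Hypothetical wrapper: resolve each address once, build the list in one place.
build_test_ip_list()
{
    local resolve="$1"        # "yes" to look the addresses up, anything else leaves them empty
    local ipv4="" ipv6=""

    if [[ "${resolve}" == "yes" ]]; then
        ipv4="$(do_get_ip 'eth')"
        ipv6="$(do_get_ip 'inet6')"
    fi

    # Single place where the list string is assembled.
    echo "eth_ip4:${ipv4} eth_ip6:${ipv6}"
}

test_ip_list="$(build_test_ip_list yes)"
echo "${test_ip_list}"
```

Each branch only assigns variables, so a later change to the list format touches one line.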
else
gtest_opt="--addr=$(do_get_addrs 'eth' ${opt2})"
gtest_opt_ipv6="--addr=$(do_get_addrs 'inet6' ${opt2}) -r fdff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" # Remote - Dummy Address
fi
Can you add a check for empty IPs (like we have in all the other test scripts)?
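One possible shape for such a check, as a sketch only: require_addr is a hypothetical helper, the literal address is a stand-in for the output of do_get_addrs, and the check used in the other test scripts may look different.

```shell
#!/bin/bash

# Hypothetical helper: report and fail when an address lookup came back empty.
require_addr()
{
    local label="$1" addr="$2"
    if [[ -z "${addr}" ]]; then
        echo "error: no ${label} address found" >&2
        return 1
    fi
}

# Stand-in value; in the script this would come from do_get_addrs 'eth' ${opt2}.
ipv4_addrs="192.0.2.10"

# Only build the gtest option when an address is actually available.
if require_addr "IPv4" "${ipv4_addrs}"; then
    gtest_opt="--addr=${ipv4_addrs}"
    echo "${gtest_opt}"
fi
```

Checking before building the option avoids passing a bare --addr= to gtest when no interface address is found.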
/review
PR Reviewer Guide 🔍 Here are some key observations to aid the review process:
/improve
PR Code Suggestions ✨
Today we use the benni09 static agent to run the test/gtest/valgrind steps. This is unscalable, since it can only run one pipeline at a time, causing delays in builds that can be stuck waiting for hours.

The idea is to move these steps to containers, allowing them to run in parallel as well as running multiple pipelines at the same time (depending on the capacity of the k8s cluster).

Issue: HPCINFRA-3249
Signed-off-by: NirWolfer <[email protected]>
The thread-local dummy locker in ring_slave could cause use-after-free issues during XLIO shutdown when one thread attempts to access a socket's locker that was created by a terminated thread. This occurs because the thread-local object is freed when its creator thread terminates.

Replace the thread-local dummy locker with a global one to prevent this issue. To maintain data path performance, optimize the dummy lock for a different cache-line to prevent false sharing by aligning the lock on a 64-byte boundary.

Signed-off-by: Tomer Cabouly <[email protected]>
This commit fixes a critical race condition in timer management for TCP sockets that was introduced in commit c73d96a.

The heap corruption was caused by a race condition between the timer thread and socket destruction. Sockets could be deleted by the event handler thread while still being referenced by the timer thread in the timer collections, resulting in heap corruption when the timer thread attempted to access the deleted memory.

In the original implementation, sockets were removed from timer collections and deleted asynchronously without proper synchronization with the timer processing thread.

Fix:
- Remove sockets from timer collections while still holding the socket lock, guaranteeing the timer thread cannot access sockets marked for deletion
- Create a simplified deletion path that doesn't attempt to access timer collections again after socket cleanup

Additionally, as an unrelated improvement, this patch fixes a lock leak in the early return path of sockinfo_tcp::clean_socket_obj() where a lock was acquired but not released when a socket was already marked as cleaned.

The heap corruption stemmed from a fundamental architectural change that separated socket objects from their timer management without providing proper synchronization for the distributed socket lifecycle.

Signed-off-by: Tomer Cabouly <[email protected]>
dpressle
left a comment
How is it possible that I see vg changes? Didn't we already merge vg in container to vNext?
No idea, but since we are adding these changes one PR at a time, this PR is no longer relevant.
Description
Today we use the benni09 static agent to run the test/gtest/valgrind steps. This is unscalable, since it can only run one pipeline at a time, causing delays in builds that can be stuck waiting for hours.
The idea is to move these steps to containers, allowing them to run in parallel as well as running multiple pipelines at the same time (depending on the capacity of the k8s cluster).
What
Change the test/gtest/valgrind steps to run in containers on the swx-k8s-spray cluster instead of on benni09.
Why?
HPCINFRA-3249