feat: add task for in-cluster load test #4007

Open: RodrigoVillar wants to merge 10 commits into master

RodrigoVillar (Contributor) commented Jun 10, 2025

Why this should be merged

This PR adds a task allowing for running load tests within a local kind cluster.

How this works

The task does the following:

  • Start a kind cluster if one has not been started yet
  • Build an image that runs the load test and push it to a local registry
  • Create a pod manifest for the load test and deploy the pod to the kind cluster

How this was tested

The load test was run with task test-load-kind-cluster and passed.

Need to be documented in RELEASES.md?

N/A

RodrigoVillar self-assigned this Jun 10, 2025
RodrigoVillar added the testing label (This primarily focuses on testing) Jun 10, 2025
Base automatically changed from in-cluster-test-fix to master June 11, 2025 11:08
RodrigoVillar force-pushed the add-kind-cluster-task branch from e53c1ad to 8c66465 June 11, 2025 12:14
RodrigoVillar marked this pull request as ready for review June 11, 2025 12:45
RodrigoVillar requested a review from maru-ava as a code owner June 11, 2025 12:45
RodrigoVillar requested a review from Elvis339 June 11, 2025 12:45
Taskfile.yml Outdated
@@ -244,6 +244,10 @@ tasks:
cmds:
- cmd: go run ./tests/load/c/main --runtime=kube --kube-use-exclusive-scheduling {{.CLI_ARGS}}

test-load-kind-cluster:
Contributor

Maybe choose a name more reflective of the test running inside the cluster? 'kind cluster' isn't very specific.

Contributor Author

Done: d9fdb87

@@ -10,6 +10,7 @@ set -euo pipefail
# DOCKER_IMAGE=avaplatform/avalanchego ./scripts/build_image.sh # Build and push multi-arch image to docker hub
# DOCKER_IMAGE=localhost:5001/avalanchego ./scripts/build_image.sh # Build and push multi-arch image to private registry
# DOCKER_IMAGE=localhost:5001/avalanchego FORCE_TAG_LATEST=1 ./scripts/build_image.sh # Build and push image to private registry with tag `latest`
# DOCKERFILE="./Dockerfile" ./scripts/build_image.sh # Build image with a custom Dockerfile
Contributor

(No action required) Why is it desirable to customize this build script instead of following the example of scripts/build_bootstrap_monitor_image.sh?

Note that a compiled binary is suggested rather than using 'go run' at runtime.

fi

# Start kind cluster
./scripts/start_kind_cluster.sh
Contributor

Suggest passing arguments as per the example of other kind-using scripts.

Contributor Author

Done: 387e4ef

metadata:
name: load-test
namespace: tmpnet
rules:
Contributor

(No action required) How did you arrive at these permissions?

@@ -101,5 +101,9 @@ func (s *MetricsServer) GenerateMonitoringConfig(monitoringLabels map[string]str
return "", err
}

if err := os.MkdirAll(filepath.Dir(collectorFilePath), 0o755); err != nil {
Contributor

While this addition might avoid an error, the fact that the path does not exist is a symptom of a larger problem: collection is not being configured in the pod. Given the requirement to label test workload metrics with the network uuid, which isn't known at the time of pod deployment, I think deploying a local prometheus collector would be the suggested approach so that tmpnet can configure it. That would mean supplying the collector credentials to the pod - easy enough - but also ensuring the availability of a compatible version of prometheus so that tmpnet could start it.

Maybe coordinate with Elvis to see what the timeline is for getting ARC online? Other than as a learning exercise, I'm less convinced of the wisdom of supporting pod-based workloads if doing so requires not just publishing an image but also that image being complex to build. CI-launched tests won't need to publish images, and don't need extra work to support workload monitoring. Local iteration would likely be easier to support by enabling external access to nodes via a proxy instead of forwarding.

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

Labels: lifecycle/stale, testing (This primarily focuses on testing)

2 participants