Skip to content

[tmpnet] Enable externally accessible URIs for kube-hosted nodes #4016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 21, 2025

Conversation

maru-ava
Copy link
Contributor

@maru-ava maru-ava commented Jun 17, 2025

Why this should be merged

Previously, access from outside of a kube cluster was enabled by port-forwarding through the kube API. This approach turned out incompatible with load testing because it greatly limited the rate at which transactions could be sent.

The port-forwarding is replaced with nginx+ingress to ensure minimum overhead when running outside of a kube cluster.

How this works

  • deploy kind cluster with a static NodePort
  • deploy the nginx ingress operator with helm and configure to use the NodePort
  • update kube runtime to create a service and ingress for each node to configure external accessibility via nginx
  • replace GetLocalURI with simplified GetAccessibleURI
  • enable the configuration of ingress host and secret via the tmpnet defaults configmap

How this was tested

CI

Need to be documented in RELEASES.md?

N/A

TODO

@maru-ava maru-ava self-assigned this Jun 17, 2025
@Copilot Copilot AI review requested due to automatic review settings June 17, 2025 07:34
@maru-ava maru-ava added testing This primarily focuses on testing tooling Build, test and development tooling labels Jun 17, 2025
@maru-ava maru-ava marked this pull request as draft June 17, 2025 07:35
@maru-ava maru-ava moved this to In Progress 🏗️ in avalanchego Jun 17, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces port-forwarding with an externally accessible URI approach for kube-hosted nodes in order to improve load testing performance. Key changes include refactoring functions from GetLocalURI to GetAccessibleURI (and staking address equivalents), adding new logic to create Kubernetes Service and Ingress resources, and updating tests to use the new accessible URI methods.

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/fixture/tmpnet/utils.go Updated error handling in CheckNodeHealth and simplified Node URI retrieval
tests/fixture/tmpnet/start_kind_cluster.go Added functions to deploy and verify the nginx ingress controller via Helm
tests/fixture/tmpnet/process_runtime.go Replaced GetLocalURI with GetAccessibleURI and updated staking address method
tests/fixture/tmpnet/node.go Updated Node interface methods to use the accessible URI versions
tests/fixture/tmpnet/network.go Updated network URI retrieval to use GetAccessibleURI
tests/fixture/tmpnet/kube_runtime.go Refactored URI retrieval for nodes to support external access and added Service/Ingress
tests/fixture/tmpnet/flags/kube_runtime.go Introduced new flag for base accessible URI configuration
tests/e2e/* Updated tests to use GetAccessibleURI (and staking address) throughout
scripts/kind-with-registry.sh Added a new script to set up kind with a local registry
flake.nix Removed kind-with-registry derivation and updated PATH configuration

@maru-ava maru-ava moved this from In Progress 🏗️ to Ready 🚦 in avalanchego Jun 25, 2025
@maru-ava maru-ava marked this pull request as ready for review June 25, 2025 03:45
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from 3814de2 to 30501bc Compare June 25, 2025 03:46
Copy link
Contributor

@RodrigoVillar RodrigoVillar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran this locally with the load test passing.

I tried using this against the remote cluster with no success (this is expected). However, I believe you mentioned that this PR should refuse deployment to a context other than kind-kind. Do you think we should add the refuse logic to this PR?

@maru-ava
Copy link
Contributor Author

maru-ava commented Jun 26, 2025

Ran this locally with the load test passing.

I tried using this against the remote cluster with no success (this is expected). However, I believe you mentioned that this PR should refuse deployment to a context other than kind-kind. Do you think we should add the refuse logic to this PR?

I meant to say that the test should only target kind-kind, not that tmpnet should refuse to deploy to a context other than kind-kind.

Was the failure something like failed to wait for Ingress...? It may make sense to log a warning if the ingress status suggests that the reason is the lack of an ingress controller.

@RodrigoVillar
Copy link
Contributor

@maru-ava I'm getting the following error:

[06-26|09:29:52.291] ERROR avalanchego-load-test tests/simple_test_context.go:33
        Error Trace:    /Users/rodrigo.villar/go/src/github.com/ava-labs/avalanchego/tests/fixture/e2e/helpers.go:304
                                                /Users/rodrigo.villar/go/src/github.com/ava-labs/avalanchego/tests/fixture/e2e/env.go:185
                                                /Users/rodrigo.villar/go/src/github.com/ava-labs/avalanchego/tests/load/c/main/main.go:63
                                                /Users/rodrigo.villar/.goenv/versions/1.23.9/src/runtime/proc.go:272
                                                /Users/rodrigo.villar/.goenv/versions/1.23.9/src/runtime/asm_arm64.s:1223
        Error:          Received unexpected error:
                        failed to query node health: failed to issue request: Post "http://localhost:30791/networks/cb4ff442-bea0-401b-b5b5-5d7c38fc1e9c/NodeID-PWaVaDZqkNCfsYBtdCsXBSBkdas6Uj3Fu/ext/health": dial tcp 127.0.0.1:30791: connect: connection refused
        Messages:       failed to bootstrap network

Going back to the original question, logging a warning would be great here.

@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from 30501bc to c2453bb Compare June 26, 2025 14:28
@maru-ava
Copy link
Contributor Author

Rebased

@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch 4 times, most recently from 0d49b43 to 407219d Compare June 27, 2025 01:47
@maru-ava
Copy link
Contributor Author

maru-ava commented Jun 27, 2025

Going back to the original question, logging a warning would be great here.

I added more checks, but when I was testing this, I realized that our test cluster actually has an ingress controller so the checks didn't actually help.

Instead, I've changed how the base accessible URI is configured so that it is no longer defaulted to the kind-kind value. Now if you try to deploy outside a kube cluster and the target cluster is not the kind cluster we deploy, it will complain that --base-accessible-uri must be provided and exit without deploying anything.

Edit: Updated to source the ingress configuration from a configmap in the cluster. That way it can be zero-config for the end-user.

@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch 2 times, most recently from 30e1313 to 6a0c887 Compare July 1, 2025 03:30
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from 8869e8a to fec55a8 Compare July 3, 2025 00:51
@maru-ava
Copy link
Contributor Author

maru-ava commented Jul 3, 2025

Dropped basicauth protection - the guid and node id should be sufficient.

@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch 2 times, most recently from 2a0ca73 to 51ac8dc Compare July 8, 2025 04:43
@maru-ava maru-ava changed the base branch from master to tmpnet-configmap July 8, 2025 04:43
@maru-ava maru-ava marked this pull request as draft July 8, 2025 04:43
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch 3 times, most recently from 0db8da1 to e2231cd Compare July 8, 2025 05:49
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from e2231cd to 7991525 Compare July 18, 2025 05:18
Base automatically changed from tmpnet-configmap to master July 18, 2025 15:15
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from 7991525 to b9f3df7 Compare July 18, 2025 16:41
@maru-ava maru-ava marked this pull request as ready for review July 18, 2025 16:41
Copy link
Contributor

@Elvis339 Elvis339 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of transaction rates are we hitting the wall with using port-forwarding? And what are we seeing now with nginx ingress? Just curious about the actual numbers that drove this change.

I see from @RodrigoVillar's testing that this doesn't work against remote clusters. Makes sense since it's tied to the kind setup. But what would it take to make this work for remote clusters too? Could we deploy a similar nginx ingress setup to a remote cluster and reuse the same accessible URI logic, or are there other blockers?

Trying to understand the scope and whether this could be extended beyond kind.

maru-ava added 7 commits July 21, 2025 14:43
Previously, access from outside of a kube cluster was enabled by
port-forwarding through the kube API. This approach turned out
incompatible with load testing because it greatly limited the rate at
which transactions could be sent.

The port-forwarding is replaced with nginx+ingress to ensure minimum
overhead when running outside of a kube cluster.
@maru-ava
Copy link
Contributor Author

What kind of transaction rates are we hitting the wall with using port-forwarding? And what are we seeing now with nginx ingress? Just curious about the actual numbers that drove this change.

My understanding is that port-forwarding through the kube API limited throughput below what would be seen for a process-based network, and that with nginx removed that limitation.

I see from @RodrigoVillar's testing that this doesn't work against remote clusters. Makes sense since it's tied to the kind setup. But what would it take to make this work for remote clusters too? Could we deploy a similar nginx ingress setup to a remote cluster and reuse the same accessible URI logic, or are there other blockers?

Rodrigo's testing is outdated. This works for any cluster that a) has an nginx ingress controller deployed b) provides the tmpnet-defaults configmap that provides the necessary ingress configuration.

@maru-ava maru-ava added this pull request to the merge queue Jul 21, 2025
Merged via the queue into master with commit 28d5491 Jul 21, 2025
29 checks passed
@maru-ava maru-ava deleted the tmpnet-nginx-ingress branch July 21, 2025 16:24
@github-project-automation github-project-automation bot moved this from Ready 🚦 to Done 🎉 in avalanchego Jul 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing This primarily focuses on testing tooling Build, test and development tooling
Projects
Status: Done 🎉
Development

Successfully merging this pull request may close these issues.

3 participants