Add support for running E2E tests with remote Armada server#83
Conversation
These changes are to allow running an armada-spark client against a remote (i.e. non-localhost) Armada cluster server. Signed-off-by: Rich Scott <richscott@sent.com>
Also, fixes in TestOrchestrator for running against a remote Armada instance, and run the tests directly, instead of using a Docker container on the client. Signed-off-by: Rich Scott <richscott@sent.com>
It will soon be used by ArmadaClientApplication. Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
…artup tasks Signed-off-by: Rich Scott <richscott@sent.com>
…rver. Signed-off-by: Rich Scott <richscott@sent.com>
```
of `kubectl config view`. These files can be left in this directory.
```
Should `e2e/*.crt` be in `.gitignore`?
I expect that most of the time the certs will be used when someone tests against an external Armada cluster that is not just a Kind-based cluster the dev/user set up on their own. In that case they have to get the certs from the Armada cluster administrator, and may want to track those in their armada-spark repo (at least while testing) for safekeeping/diff-tracking purposes. Probably 95+% of the time, e2e will be run against a local Kind-based cluster started via the armada-operator, so these won't be generated anyway.
```scala
Seq(armadactlCmd) ++ subCommand.split(" ") ++ Seq("--armadaUrl", armadaUrl)
// armadactl command expects the server address to be of the form
// <hostname-or-IP>:<port> with no pseudo-protocol prefix
var armadactlUrl = armadaUrl.replaceFirst("^armada://", "")
```
This code has changed substantially, and now is:

```scala
private def buildCommand(subCommand: String): Seq[String] = {
  val armadactlCmd = resolveArmadactlPath.getOrElse {
    throw new RuntimeException("armadactl not found in system properties or PATH")
  }
  // armadactl command expects the server address to be of the form
  // <hostname-or-IP>:<port> with no pseudo-protocol prefix
  val pattern = """.*armada://(.+)""".r
  var armadactlUrl = "undefined-armadactl-url"

  armadaUrl match {
    case pattern(hostPort) => armadactlUrl = hostPort // e.g. "localhost:30002"
    case _ =>
      throw new RuntimeException(
        s"could not extract valid armadactl URL from armada URL ${armadaUrl}"
      )
  }

  Seq(armadactlCmd) ++ subCommand.split(" ") ++ Seq("--armadaUrl", armadactlUrl)
}
```
I declare `armadactlUrl` with `var` and set its initial value to "undefined-armadactl-url", which is overwritten on a successful pattern match, so the default value will make it apparent to the user that something did not get set correctly.
I don't think the user will ever see the default value "undefined-armadactl-url" after the pattern match: it will either match or throw.
So I think getting rid of the `var` and writing the `val` like so is more idiomatic:

```scala
val armadactlUrl = armadaUrl match {
  case pattern(hostPort) => hostPort
```
👍🏼 Code now changed to use this strategy in commit c5c02d5.
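Written out fully, the suggested `val`-based form might look like this self-contained sketch (the `object` wrapper and method name are for illustration only; the `pattern` and error message mirror the thread):

```scala
object ArmadaUrlExtract {
  // armadactl expects <hostname-or-IP>:<port> with no pseudo-protocol prefix.
  private val pattern = """.*armada://(.+)""".r

  def toArmadactlUrl(armadaUrl: String): String = armadaUrl match {
    case pattern(hostPort) => hostPort // e.g. "localhost:30002"
    case _ =>
      throw new RuntimeException(
        s"could not extract valid armadactl URL from armada URL $armadaUrl"
      )
  }
}
```

With the `val` form there is no partially-initialized state to observe: the match either yields the host:port or throws before the value is ever used.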
Running with … no longer seems to work.
Get the first external interface IP address and use it in the Kind configuration to allow external K8S/Armada API access. Add protective quotes around TLS vars in dev-e2e.sh, per shellcheck. Use `realpath` for getting reliable full pathnames. Signed-off-by: Rich Scott <richscott@sent.com>
Use better pattern checks for verifying if Armada master is localhost; remove quote wrapper around tls_args, so Maven doesn't error on a "" target. Signed-off-by: Rich Scott <richscott@sent.com>
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
scripts/dev-e2e.sh:198
- When `armadactl get queues` fails with "connection refused", the script unconditionally calls `start-armada`, even if `ARMADA_MASTER` points at a remote server. This contradicts the intended remote-server mode and can cause unexpected local cluster creation. Guard this branch so `start-armada` is only attempted when using a local/localhost Armada target; otherwise fail fast with a clear message.
```shell
echo "Checking to see if Armada cluster is available ..."
if ! "$scripts"/armadactl get queues > "$STATUSFILE" 2>&1 ; then
  if grep -q 'connection refused' "$STATUSFILE"; then
    start-armada 2>&1 | log_group "Using armada-operator to start Armada; this may take up to 5 minutes"
    sleep 10
    armadactl-retry get queues
```
sudiptob2 left a comment
Thanks for addressing the previous reviews. Looking good. I gave it a re-review, and here are a few small comments I have.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.
```shell
  -Darmada.master="$ARMADA_MASTER" \
  -Darmada.lookout.url="$ARMADA_LOOKOUT_URL" \
  -Darmadactl.path="$scripts/armadactl" \
  ${tls_args[@]:-} 2>&1 | tee e2e-test.log
```
`tls_args` expansion is unquoted (`${tls_args[@]:-}`), so file paths containing spaces will be split into multiple arguments. Use quoted array expansion when passing these through to Maven (e.g., `"${tls_args[@]}"`) to preserve argument boundaries.

```diff
-  ${tls_args[@]:-} 2>&1 | tee e2e-test.log
+  "${tls_args[@]}" 2>&1 | tee e2e-test.log
```
```shell
CONTEXT=$(kubectl config current-context)
E2E_DIR=$(realpath "$0" | xargs dirname)
cd "$E2E_DIR" || (echo "Error: could not cd to $E2E_DIR"; exit 1)

# What These Files Are
# - client.crt: Your user (client) certificate
# - client.key: The private key associated with the certificate
# - ca.crt: The CA certificate used by the Kubernetes API server (for verifying client and server certs)

# Extract the client certificate
kubectl config view --raw -o json | jq -r \
  ".users[] | select(.name == \"${CONTEXT}\") | .user.[\"client-certificate-data\"]" | base64 -d > client.crt

# Extract the client key
kubectl config view --raw -o json | jq -r \
  ".users[] | select(.name == \"${CONTEXT}\") | .user.[\"client-key-data\"]" | base64 -d > client.key

# Extract the cluster CA certificate
kubectl config view --raw -o json | jq -r \
  ".clusters[] | select(.name == \"${CONTEXT}\") | .cluster.[\"certificate-authority-data\"]" | base64 -d > ca.crt
```
This script assumes the current context name equals both the kubeconfig user name and cluster name, but in many kubeconfigs the context name differs from `.users[].name` and `.clusters[].name`. As written, it may extract empty values or the wrong credentials. Consider resolving the context entry first, then using its referenced `.context.user` and `.context.cluster` names to fetch the right user/cluster data. Also add `set -euo pipefail` (and explicit checks for `jq`/`base64`) so failures don't silently produce empty/invalid cert files.
Suggested change:

```shell
set -euo pipefail

command -v kubectl >/dev/null 2>&1 || { echo "Error: kubectl is required but not installed or not in PATH"; exit 1; }
command -v jq >/dev/null 2>&1 || { echo "Error: jq is required but not installed or not in PATH"; exit 1; }
command -v base64 >/dev/null 2>&1 || { echo "Error: base64 is required but not installed or not in PATH"; exit 1; }

CONTEXT=$(kubectl config current-context)
E2E_DIR=$(realpath "$0" | xargs dirname)
cd "$E2E_DIR" || (echo "Error: could not cd to $E2E_DIR"; exit 1)

KUBECONFIG_JSON=$(kubectl config view --raw -o json)
USER_NAME=$(printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg context "$CONTEXT" '.contexts[] | select(.name == $context) | .context.user')
CLUSTER_NAME=$(printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg context "$CONTEXT" '.contexts[] | select(.name == $context) | .context.cluster')
if [ -z "$USER_NAME" ] || [ "$USER_NAME" = "null" ]; then
  echo "Error: could not resolve user for context $CONTEXT"
  exit 1
fi
if [ -z "$CLUSTER_NAME" ] || [ "$CLUSTER_NAME" = "null" ]; then
  echo "Error: could not resolve cluster for context $CONTEXT"
  exit 1
fi

# What These Files Are
# - client.crt: Your user (client) certificate
# - client.key: The private key associated with the certificate
# - ca.crt: The CA certificate used by the Kubernetes API server (for verifying client and server certs)

# Extract the client certificate
printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg user "$USER_NAME" \
  '.users[] | select(.name == $user) | .user["client-certificate-data"]' | base64 -d > client.crt

# Extract the client key
printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg user "$USER_NAME" \
  '.users[] | select(.name == $user) | .user["client-key-data"]' | base64 -d > client.key

# Extract the cluster CA certificate
printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg cluster "$CLUSTER_NAME" \
  '.clusters[] | select(.name == $cluster) | .cluster["certificate-authority-data"]' | base64 -d > ca.crt
```
```scala
val props = loadProperties()

val armadaApiUrl = props.getProperty("armada.master", "armada://localhost:30002")
armadaClient = new ArmadaClient(armadaApiUrl)
k8sClient = new K8sClient(props)
orch = new TestOrchestrator(armadaClient, k8sClient)
```
`loadProperties()` does not propagate the TLS-related system properties/env vars (e.g., `client_cert_file`, `client_key_file`, `cluster_ca_file`). As a result, `new K8sClient(props)` will always see empty values and never enable TLS, even when `scripts/dev-e2e.sh` passes `-Dclient_cert_file=...` etc. Add these properties to `loadProperties()` (and map from env vars like `CLIENT_CERT_FILE`), or have `K8sClient` read from `sys.props`/`sys.env` directly when not present in `props`.
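One possible shape for that fallback, as a sketch only (the helper name `withTlsProps` is hypothetical; the property and env-var names follow the comment above):

```scala
import java.util.Properties

// Copy TLS settings from JVM system properties or environment variables into
// props when absent, so K8sClient sees them even if loadProperties() did not
// include them. Env vars use the upper-cased key (e.g. CLIENT_CERT_FILE).
def withTlsProps(props: Properties): Properties = {
  Seq("client_cert_file", "client_key_file", "cluster_ca_file").foreach { key =>
    if (props.getProperty(key, "").isEmpty) {
      val fallback = sys.props.getOrElse(key, sys.env.getOrElse(key.toUpperCase, ""))
      if (fallback.nonEmpty) props.setProperty(key, fallback)
    }
  }
  props
}
```

Calling this on the result of `loadProperties()` would let the `-D` flags passed by `scripts/dev-e2e.sh` reach `K8sClient` without changing its constructor.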
```scala
val cf = CertificateFactory.getInstance("X.509")
val cert = cf.generateCertificate(new FileInputStream(clientCertFile))
algo = cert.getPublicKey.getAlgorithm // "RSA" or "EC"
}
```
The `FileInputStream` used to read the client certificate isn't closed, which can leak file descriptors on repeated test runs. Use a managed resource (e.g., `scala.util.Using`) or ensure the stream is closed in a `finally` block.
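A minimal sketch of the `scala.util.Using` variant (Scala 2.13+; the helper name `certKeyAlgo` is hypothetical):

```scala
import java.io.FileInputStream
import java.security.cert.CertificateFactory
import scala.util.Using

// Read the public-key algorithm ("RSA" or "EC") from an X.509 certificate file,
// closing the stream even if certificate parsing throws.
def certKeyAlgo(clientCertFile: String): String =
  Using.resource(new FileInputStream(clientCertFile)) { in =>
    CertificateFactory.getInstance("X.509").generateCertificate(in).getPublicKey.getAlgorithm
  }
```

`Using.resource` closes the stream on both the success and failure paths, which a bare `FileInputStream` in the builder code does not.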
```scala
val clientCertFile: String = props.getProperty("client_cert_file", "")
val clientKeyFile: String = props.getProperty("client_key_file", "")
val clusterCaFile: String = props.getProperty("cluster_ca_file", "")
var algo: String = ""

var cb: ConfigBuilder = new ConfigBuilder()
  .withMasterUrl(k8sApiURL)
  .withNamespace("default")

if (clusterCaFile.nonEmpty) {
  cb = cb.withCaCertFile(clusterCaFile)
}

if (clientCertFile.nonEmpty) {
  cb = cb.withClientCertFile(clientCertFile)

  val cf = CertificateFactory.getInstance("X.509")
  val cert = cf.generateCertificate(new FileInputStream(clientCertFile))
  algo = cert.getPublicKey.getAlgorithm // "RSA" or "EC"
}

if (clientKeyFile.nonEmpty) {
  cb = cb.withClientKeyFile(clientKeyFile)
}

val cfg =
  if (clientCertFile.nonEmpty) cb.withClientKeyAlgo(algo).build() else cb.build()
```
TLS configuration allows partial input (e.g., `client_key_file` set but `client_cert_file` empty). This can create a `KubernetesClient` configuration that fails in non-obvious ways. Consider validating that cert and key are provided together (and that referenced files exist and are readable), and fail fast with a clear error message.
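Sketched as a standalone check (the helper name `validateTlsConfig` is hypothetical; the property names follow the thread):

```scala
import java.nio.file.{Files, Paths}

// Fail fast on partial TLS configuration: a client cert and key must be given
// together, and every referenced file must exist and be readable.
def validateTlsConfig(
    clientCertFile: String,
    clientKeyFile: String,
    clusterCaFile: String
): Unit = {
  if (clientCertFile.nonEmpty != clientKeyFile.nonEmpty)
    throw new IllegalArgumentException(
      "client_cert_file and client_key_file must be set together"
    )
  Seq(clientCertFile, clientKeyFile, clusterCaFile).filter(_.nonEmpty).foreach { f =>
    if (!Files.isReadable(Paths.get(f)))
      throw new IllegalArgumentException(s"TLS file not found or unreadable: $f")
  }
}
```

Running such a check before building the `ConfigBuilder` turns a confusing mid-test TLS handshake failure into an immediate, descriptive error.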
```shell
echo "ERROR: could not determine external network interface address. In order to run Spark"
echo "applications in client deployment mode, you may need to set SPARK_LOCAL_IP in your"
echo "scripts/config.sh to the external IP address of this system, for example:"
echo " export SPARK_LOCAL_IP=\"192.168.1.555\""
```
The example IP `192.168.1.555` is not a valid IPv4 address (octets must be 0-255). Update it to a valid example value to avoid confusion.

```diff
-echo " export SPARK_LOCAL_IP=\"192.168.1.555\""
+echo " export SPARK_LOCAL_IP=\"192.168.1.55\""
```
```shell
IMG_REGEX='^[[:alnum:]_./-]+:[[:alnum:]_-]+$'

if ! (echo "$IMAGE_NAME" | grep -Eq "$IMG_REGEX"); then
  err "IMAGE_NAME is not defined. Please set it in $scripts/config.sh, for example:"
  err "IMAGE_NAME=spark:testing"
  exit 1
fi

if ! (echo "$INIT_CONTAINER_IMAGE" | grep -Eq "$IMG_REGEX"); then
  err "INIT_CONTAINER_IMAGE is not defined. Please set it in $scripts/config.sh, for example:"
  err "INIT_CONTAINER_IMAGE=busybox:latest"
  exit 1
fi
```
`IMG_REGEX` rejects many valid Docker image references (e.g., tags with dots like `busybox:1.36.1`, registries with ports like `localhost:5000/img:tag`, digests, etc.). This will cause false negatives for legitimate configurations. Consider relaxing the regex (at least allow `.` in the tag and `:` in the registry part) or use Docker's own reference parsing instead of a custom regex.
```shell
if [[ ! -d ".spark-$SPARK_VERSION" ]]; then
  echo "Checking out Spark sources for tag v$SPARK_VERSION."
  git clone https://github.com/apache/spark --branch v$SPARK_VERSION --depth 1 --no-tags ".spark-$SPARK_VERSION"
fi

cd ".spark-$SPARK_VERSION"
# Spark 3.3.4 does not compile without this fix
if [[ "$SPARK_VERSION" == "3.3.4" ]]; then
  "${sed_inplace[@]}" -e "s%<scala.version>2.13.8</scala.version>%<scala.version>2.13.6</scala.version>%" pom.xml
  # Fix deprecated openjdk base image - use eclipse-temurin:11-jammy instead.
  spark_dockerfile="resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile"
  if [ -f "$spark_dockerfile" ]; then
    "${sed_inplace[@]}" -e 's|FROM openjdk:|FROM eclipse-temurin:|g' "$spark_dockerfile"
    "${sed_inplace[@]}" -E 's/^ARG java_image_tag=11-jre-slim$/ARG java_image_tag=11-jammy/' "$spark_dockerfile"
  fi
fi
./dev/change-scala-version.sh $SCALA_BIN_VERSION
# by packaging the assembly project specifically, jars of all depending Spark projects are fetched from Maven
# spark-examples jars are not released, so we need to build these from sources
./build/mvn --batch-mode clean
./build/mvn --batch-mode package -pl examples
./build/mvn --batch-mode package -Pkubernetes -Pscala-$SCALA_BIN_VERSION -pl assembly
cd ..
```
`run-test()` always clones and builds Spark from source, which is very slow and redundant given this script already requires `$IMAGE_NAME` to exist locally. Consider only building Spark when the Docker image build actually needs it (as `scripts/createImage.sh` already does), or gate this step behind a flag/condition to avoid unnecessary network and CPU cost on each E2E run.
Suggested change:

```shell
if [[ "${BUILD_SPARK_FROM_SOURCE:-false}" == "true" ]]; then
  if [[ ! -d ".spark-$SPARK_VERSION" ]]; then
    echo "Checking out Spark sources for tag v$SPARK_VERSION."
    git clone https://github.com/apache/spark --branch v$SPARK_VERSION --depth 1 --no-tags ".spark-$SPARK_VERSION"
  fi
  cd ".spark-$SPARK_VERSION"
  # Spark 3.3.4 does not compile without this fix
  if [[ "$SPARK_VERSION" == "3.3.4" ]]; then
    "${sed_inplace[@]}" -e "s%<scala.version>2.13.8</scala.version>%<scala.version>2.13.6</scala.version>%" pom.xml
    # Fix deprecated openjdk base image - use eclipse-temurin:11-jammy instead.
    spark_dockerfile="resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile"
    if [ -f "$spark_dockerfile" ]; then
      "${sed_inplace[@]}" -e 's|FROM openjdk:|FROM eclipse-temurin:|g' "$spark_dockerfile"
      "${sed_inplace[@]}" -E 's/^ARG java_image_tag=11-jre-slim$/ARG java_image_tag=11-jammy/' "$spark_dockerfile"
    fi
  fi
  ./dev/change-scala-version.sh $SCALA_BIN_VERSION
  # by packaging the assembly project specifically, jars of all depending Spark projects are fetched from Maven
  # spark-examples jars are not released, so we need to build these from sources
  ./build/mvn --batch-mode clean
  ./build/mvn --batch-mode package -pl examples
  ./build/mvn --batch-mode package -Pkubernetes -Pscala-$SCALA_BIN_VERSION -pl assembly
  cd ..
else
  log "Skipping Spark source checkout/build; set BUILD_SPARK_FROM_SOURCE=true to enable it."
fi
```
```shell
# into the Kind config so that it binds to a valid address instead
# of the hardcoded placeholder (192.168.12.135).
external_ip=$(ifconfig -a| grep -w 'inet' | grep -v 'inet 127\.0\.0' | grep -v 'inet 172\.' | awk '{print $2}' | sed -ne '1p')

if [ -z "$external_ip" ]; then
  err "Unable to find any IP addresses on an external interface on this system; exiting now"
  exit 1
fi
```
`external_ip` discovery excludes all `172.x.x.x` addresses. Many real networks use the private `172.16.0.0/12` range, so this can incorrectly fail to find an external IP and abort. Prefer selecting by interface (exclude docker/kind interfaces by name) or use a route-based method (e.g., derive the source IP for a default route) rather than excluding whole RFC1918 subranges.
```shell
fi

echo "Extracting TLS client certificate files from Kind cluster"
if ! e2e/extract-kind-cert.sh; then
```
This calls `e2e/extract-kind-cert.sh` using a relative path, which will fail if `scripts/dev-e2e.sh` is invoked from a directory other than the repo root. Use an absolute path based on `$scripts` (e.g., `$scripts/../e2e/extract-kind-cert.sh`) for consistency with other references in the script.

```diff
-if ! e2e/extract-kind-cert.sh; then
+if ! "$scripts/../e2e/extract-kind-cert.sh"; then
```
```shell
if [[ ! -d ".spark-$SPARK_VERSION" ]]; then
  echo "Checking out Spark sources for tag v$SPARK_VERSION."
  git clone https://github.com/apache/spark --branch v$SPARK_VERSION --depth 1 --no-tags ".spark-$SPARK_VERSION"
fi

cd ".spark-$SPARK_VERSION"
# Spark 3.3.4 does not compile without this fix
if [[ "$SPARK_VERSION" == "3.3.4" ]]; then
  "${sed_inplace[@]}" -e "s%<scala.version>2.13.8</scala.version>%<scala.version>2.13.6</scala.version>%" pom.xml
  # Fix deprecated openjdk base image - use eclipse-temurin:11-jammy instead.
  spark_dockerfile="resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile"
  if [ -f "$spark_dockerfile" ]; then
    "${sed_inplace[@]}" -e 's|FROM openjdk:|FROM eclipse-temurin:|g' "$spark_dockerfile"
    "${sed_inplace[@]}" -E 's/^ARG java_image_tag=11-jre-slim$/ARG java_image_tag=11-jammy/' "$spark_dockerfile"
  fi
fi
./dev/change-scala-version.sh $SCALA_BIN_VERSION
# by packaging the assembly project specifically, jars of all depending Spark projects are fetched from Maven
# spark-examples jars are not released, so we need to build these from sources
./build/mvn --batch-mode clean
./build/mvn --batch-mode package -pl examples
./build/mvn --batch-mode package -Pkubernetes -Pscala-$SCALA_BIN_VERSION -pl assembly
cd ..
```
The test runs spark-submit via Docker (see TestOrchestrator.scala). The required JARs are already built and baked into the image by createImage.sh (see `armada-spark/scripts/createImage.sh`, lines 46 to 52 at c3b2733).
Given that, rebuilding or recreating them during run-test appears redundant. It also adds overhead when running e2e tests repeatedly. Let me know if I am missing a scenario where this step is still required.
Thanks - this redundant chunk was removed in commit 6f584cb.
```scala
// armadactl command expects the server address to be of the form
// <hostname-or-IP>:<port> with no pseudo-protocol prefix
val armadactlUrl =
  if (armadaUrl.startsWith("armada://")) {
```
`stripPrefix` is a no-op when the prefix isn't present, so the `if` is unnecessary:

```scala
val armadactlUrl = armadaUrl.stripPrefix("armada://")
```
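A quick illustration of that behavior with plain strings:

```scala
// stripPrefix removes the scheme when present and is a no-op otherwise,
// so both input forms normalize to host:port without any conditional.
val withScheme    = "armada://localhost:30002".stripPrefix("armada://")
val withoutScheme = "localhost:30002".stripPrefix("armada://")
```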
Also remove the superfluous conditional by using String.stripPrefix(). Signed-off-by: Rich Scott <richscott@sent.com>
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
```scala
val clientCertFile: String = props.getProperty("client_cert_file", "")
val clientKeyFile: String = props.getProperty("client_key_file", "")
val clusterCaFile: String = props.getProperty("cluster_ca_file", "")
```
```scala
import org.yaml.snakeyaml.Yaml

import java.io.{FileReader, FileInputStream}
import java.nio.file.{Paths, Files}
import java.security.cert.CertificateFactory
import java.util.concurrent.TimeoutException
import java.util.Properties
import scala.jdk.CollectionConverters._
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future}

/** Kubernetes client implementation using fabric8 Kubernetes client library. */
class K8sClient {
  private val client: KubernetesClient = new DefaultKubernetesClient()
class K8sClient(props: Properties) {
  val armadaMaster: String = props.getProperty("armada.master")
  val pattern = """armada://([^:]+):.*""".r
```
```shell
# addresses, and use the first one that is not the loopback interface, or a
# Docker/K8S virtual interface.
if [ -z "${SPARK_LOCAL_IP:-}" ]; then
  SPARK_LOCAL_IP=$(ifconfig -a | grep -w 'inet' | grep -Ev '(127\.0\.0\.1|\<172)' | awk '{print $2}' | sed -n '1p')
```
```shell
E2E_DIR=$(realpath "$0" | xargs dirname)

cd "$E2E_DIR" || (echo "Error: could not cd to $E2E_DIR"; exit 1)
```
```scala
// On macOS, host.docker.internal resolves inside Docker containers but NOT inside
// Kind pods (which use CoreDNS and don't inherit Docker's host alias). SPARK_LOCAL_IP
// is the host machine's LAN IP, reachable from both Docker containers (via --network host)
// and Kind pods (confirmed: executor pods already connect to this IP in client mode).
// On Linux, 172.18.0.1 is the Docker bridge gateway, accessible from both.
```
```
ARMADA_MASTER=armada://192.168.12.135:30002
```
to the IP address or hostname of your Armada server. You should not need to change the port number.

Also, set the location of the three TLS certificate files by adding/setting:
```
export CLIENT_CERT_FILE=e2e/client.crt
export CLIENT_KEY_FILE=e2e/client.key
export CLUSTER_CA_FILE=e2e/ca.crt
```

- You should now be able to verify the armada-spark configuration by running the E2E tests:
```
$ ./scripts/dev-e2e.sh
```
This will save its output to `e2e-test.log` for further debugging.
These changes add support for running the armada-spark end-to-end test suite against a remote Armada server (as opposed to a `kind`-instantiated K8s cluster on the same system as the E2E test client). This has been developed and tested with a macOS system as the Spark client and an Ubuntu Linux server running Armada (on top of a `kind` K8s cluster), as well as the reverse (armada-spark client on Linux, Armada cluster on macOS). Some changes to fix Linux-macOS command incompatibilities were also made.
See updates to the `README.md` for the steps to use this.
Change Details:
- `README.md` detailing how to get the cert files, define `ARMADA_MASTER` in `scripts/config.sh`, etc.
- … (`.spark-3.5.5`).