
Add support for running E2E tests with remote Armada server#83

Open
richscott wants to merge 74 commits into master from rich/support-remote-server

Conversation

Member

@richscott richscott commented Nov 21, 2025

These changes add support for running the armada-spark end-to-end test suite against a remote Armada server (as opposed to a kind-instantiated K8s cluster on the same system as the E2E test client). This has been developed and tested with a macOS system as the Spark client and an Ubuntu Linux server running Armada (on top of a kind K8s cluster), as well as the reverse (armada-spark client on Linux, Armada cluster on macOS). Some fixes for Linux/macOS command incompatibilities were also made.

See updates to the README.md for the steps to use this.

Change Details:

  • Support for TLS certificates has been added; instructions in README.md detail how to obtain the cert files, define ARMADA_MASTER in scripts/config.sh, etc.
  • dev-e2e.sh now checks for the existence and health of the specified Armada server, but if ARMADA_MASTER is not localhost it will not attempt to start the server via armada-operator.
  • dev-e2e.sh no longer uses Docker to run the tests; it uses the local clone of Spark directly (typically in .spark-3.5.5).
  • The code that acquires a fabric8 K8s client no longer uses the default client (which assumes it is running inside the destination cluster); it now builds the client explicitly and uses the TLS cert files when they are specified in the init.sh/config.sh configuration settings.
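Putting the settings described above together, a minimal `scripts/config.sh` fragment for the remote-server mode might look like the following sketch (the IP address and file paths are placeholders taken from the README examples; substitute the values for your own cluster):

```shell
# Hypothetical scripts/config.sh fragment for a remote Armada server.
# ARMADA_MASTER points at the remote server; the three *_FILE variables
# point at the TLS certs extracted from that cluster (example paths).
export ARMADA_MASTER="armada://192.168.12.135:30002"
export CLIENT_CERT_FILE="e2e/client.crt"
export CLIENT_KEY_FILE="e2e/client.key"
export CLUSTER_CA_FILE="e2e/ca.crt"
```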

These changes are to allow running an armada-spark client against a
remote (i.e. non-localhost) Armada cluster server.

Signed-off-by: Rich Scott <richscott@sent.com>
Also, fixes in TestOrchestrator for running against a remote
Armada instance, and run the tests directly, instead of using
a Docker container on the client.

Signed-off-by: Rich Scott <richscott@sent.com>
It will soon be used by ArmadaClientApplication.

Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
…artup tasks

Signed-off-by: Rich Scott <richscott@sent.com>
…rver.

Signed-off-by: Rich Scott <richscott@sent.com>
@richscott richscott self-assigned this Nov 21, 2025
Comment thread README.md
Comment on lines +102 to +103
of `kubectl config view`. These files can be left in this directory.

Collaborator

Should e2e/*crt be in .gitignore?

Member Author

I expect that most times that the certs will be run is either when someone does testing with an external Armada cluster that is not just a Kind-based cluster that the dev/user set up on their own, so they have to get the certs from the Armada cluster administrator, and may want to track those in their armada-spark repo (at least while testing) for safekeeping/diff-tracking purposes. Probably 95+% of the time that e2e will be run will be with a local Kind-based cluster started up via the armada-operator, so these won't be generated anyway.

Comment thread scripts/dev-e2e.sh Outdated
```scala
    Seq(armadactlCmd) ++ subCommand.split(" ") ++ Seq("--armadaUrl", armadaUrl)
    // armadactl command expects the server address to be of the form
    // <hostname-or-IP>:<port> with no pseudo-protocol prefix
    var armadactlUrl = armadaUrl.replaceFirst("^armada://", "")
```
Collaborator

val

Member Author

This code has changed substantially, and now is:

```scala
private def buildCommand(subCommand: String): Seq[String] = {
  val armadactlCmd = resolveArmadactlPath.getOrElse {
    throw new RuntimeException("armadactl not found in system properties or PATH")
  }
  // armadactl command expects the server address to be of the form
  // <hostname-or-IP>:<port> with no pseudo-protocol prefix
  val pattern      = """.*armada://(.+)""".r
  var armadactlUrl = "undefined-armadactl-url"

  armadaUrl match {
    case pattern(hostPort) => armadactlUrl = hostPort // e.g. "localhost:30002"
    case _ =>
      throw new RuntimeException(
        s"could not extract valid armadactl URL from armada URL ${armadaUrl}"
      )
  }

  Seq(armadactlCmd) ++ subCommand.split(" ") ++ Seq("--armadaUrl", armadactlUrl)
}
```

I declare armadactlUrl as a var with the initial value "undefined-armadactl-url", which is overwritten on a successful pattern match, so the default value makes it apparent to the user when something was not set correctly.

Collaborator

I don't think the user will ever see the default value "undefined-armadactl-url" after the pattern match; it will either match or throw.

So I think getting rid of the var and writing it as a val is more idiomatic (the `case _` arm below is completed from the earlier snippet):

```scala
val armadactlUrl = armadaUrl match {
  case pattern(hostPort) => hostPort
  case _ =>
    throw new RuntimeException(
      s"could not extract valid armadactl URL from armada URL ${armadaUrl}"
    )
}
```

Member Author

👍🏼 Code now changed to use this strategy in commit c5c02d5.

Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/TestAssertions.scala Outdated
Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/TestOrchestrator.scala Outdated
Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/TestOrchestrator.scala Outdated
@GeorgeJahad
Collaborator

Running with:

```
ARMADA_MASTER=armada://localhost:30002
```

no longer seems to work:

```
scripts/submitArmadaSpark.sh 100
Submitting spark job to Armada.
/home/gbj/incoming/armada-spark/scripts/init.sh: line 67: CLIENT_CERT_FILE: unbound variable
```
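The "unbound variable" error comes from bash's `set -u` (nounset) mode combined with referencing a variable that was never exported. A common defensive fix is to give such optional variables a default via `${VAR:-}` expansion; whether init.sh adopts exactly this pattern is an assumption, this is just a standalone sketch of the failure mode and the fix:

```shell
set -u                           # abort on unset variables, as init.sh does
unset CLIENT_CERT_FILE || true   # simulate a config.sh without TLS settings

# Referencing $CLIENT_CERT_FILE bare would abort here with
# "CLIENT_CERT_FILE: unbound variable"; ${VAR:-} substitutes "" instead.
client_cert="${CLIENT_CERT_FILE:-}"

if [ -z "$client_cert" ]; then
  echo "no client cert configured; continuing without TLS"
fi
```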

Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Get the first external interface IP address and use in Kind
configuration for allowing external K8S/Armada API access. Add
protective quotes around TLS vars in dev-e2e.sh, per shellcheck. Use
`realpath` for getting reliable full pathnames.

Signed-off-by: Rich Scott <richscott@sent.com>
Use better pattern checks for verifying if Armada master is localhost;
remove quote wrapper around tls_args, so Maven doesn't error on a ""
target.

Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

scripts/dev-e2e.sh:198

  • When armadactl get queues fails with "connection refused", the script unconditionally calls start-armada, even if ARMADA_MASTER points at a remote server. This contradicts the intended remote-server mode and can cause unexpected local cluster creation. Guard this branch so start-armada is only attempted when using a local/localhost Armada target; otherwise fail fast with a clear message.
```shell
  echo "Checking to see if Armada cluster is available ..."

  if ! "$scripts"/armadactl get queues > "$STATUSFILE" 2>&1 ; then
    if grep -q 'connection refused' "$STATUSFILE"; then
      start-armada 2>&1 | log_group "Using armada-operator to start Armada; this may take up to 5 minutes"
      sleep 10
      armadactl-retry get queues
```

Comment thread scripts/dev-e2e.sh Outdated
Comment thread scripts/dev-e2e.sh Outdated
Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/K8sClient.scala Outdated
Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/K8sClient.scala Outdated
Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/TestOrchestrator.scala Outdated
Comment thread README.md Outdated
Comment thread scripts/init.sh
Comment thread src/main/scala/org/apache/spark/deploy/armada/submit/ArmadaUtils.scala Outdated
Comment thread e2e/extract-kind-cert.sh
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Collaborator

@sudiptob2 sudiptob2 left a comment

Thanks for addressing the previous reviews. Looking good. I gave it a re-review, and here are a few small comments I have.

Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/K8sClient.scala Outdated
Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/K8sClient.scala Outdated
Comment thread src/test/scala/org/apache/spark/deploy/armada/e2e/K8sClient.scala Outdated
Comment thread e2e/extract-kind-cert.sh Outdated
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.



Comment thread scripts/dev-e2e.sh
```shell
    -Darmada.master="$ARMADA_MASTER" \
    -Darmada.lookout.url="$ARMADA_LOOKOUT_URL" \
    -Darmadactl.path="$scripts/armadactl" \
    ${tls_args[@]:-} 2>&1 | tee e2e-test.log
```
Copilot AI Apr 28, 2026

The `tls_args` expansion is unquoted (`${tls_args[@]:-}`), so file paths containing spaces will be split into multiple arguments. Use quoted array expansion when passing these through to Maven (e.g., `"${tls_args[@]}"`) to preserve argument boundaries.

Suggested change

```diff
-${tls_args[@]:-} 2>&1 | tee e2e-test.log
+"${tls_args[@]}" 2>&1 | tee e2e-test.log
```

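The difference is easy to demonstrate in isolation: with an array element containing a space, unquoted expansion word-splits it into two arguments while quoted expansion preserves it as one. A self-contained sketch (the path is hypothetical):

```shell
# Hypothetical tls_args entry whose file path contains a space.
tls_args=(-Dclient_cert_file="/tmp/my certs/client.crt")

count_args() { echo "$#"; }

unquoted=$(count_args ${tls_args[@]:-})  # word-split into 2 arguments
quoted=$(count_args "${tls_args[@]}")    # preserved as 1 argument

echo "unquoted=$unquoted quoted=$quoted"  # prints: unquoted=2 quoted=1
```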
Comment thread e2e/extract-kind-cert.sh
Comment on lines +3 to +24
```shell
CONTEXT=$(kubectl config current-context)

E2E_DIR=$(realpath "$0" | xargs dirname)

cd "$E2E_DIR" || (echo "Error: could not cd to $E2E_DIR"; exit 1)

# What These Files Are
# - client.crt: Your user (client) certificate
# - client.key: The private key associated with the certificate
# - ca.crt: The CA certificate used by the Kubernetes API server (for verifying client and server certs)

# Extract the client certificate
kubectl config view --raw -o json | jq -r \
".users[] | select(.name == \"${CONTEXT}\") | .user.[\"client-certificate-data\"]" | base64 -d > client.crt

# Extract the client key
kubectl config view --raw -o json | jq -r \
".users[] | select(.name == \"${CONTEXT}\") | .user.[\"client-key-data\"]" | base64 -d > client.key

# Extract the cluster CA certificate
kubectl config view --raw -o json | jq -r \
".clusters[] | select(.name == \"${CONTEXT}\") | .cluster.[\"certificate-authority-data\"]" | base64 -d > ca.crt
```
Copilot AI Apr 28, 2026
This script assumes the current context name equals both the kubeconfig user name and cluster name, but in many kubeconfigs the context name differs from `.users[].name` and `.clusters[].name`. As written it may extract empty values or the wrong credentials. Consider resolving the context entry first, then using its referenced `.context.user` and `.context.cluster` names to fetch the right user/cluster data. Also add `set -euo pipefail` (and explicit checks for jq/base64) so failures don't silently produce empty/invalid cert files.

Suggested change

```diff
-CONTEXT=$(kubectl config current-context)
-E2E_DIR=$(realpath "$0" | xargs dirname)
-cd "$E2E_DIR" || (echo "Error: could not cd to $E2E_DIR"; exit 1)
-# What These Files Are
-# - client.crt: Your user (client) certificate
-# - client.key: The private key associated with the certificate
-# - ca.crt: The CA certificate used by the Kubernetes API server (for verifying client and server certs)
-# Extract the client certificate
-kubectl config view --raw -o json | jq -r \
-".users[] | select(.name == \"${CONTEXT}\") | .user.[\"client-certificate-data\"]" | base64 -d > client.crt
-# Extract the client key
-kubectl config view --raw -o json | jq -r \
-".users[] | select(.name == \"${CONTEXT}\") | .user.[\"client-key-data\"]" | base64 -d > client.key
-# Extract the cluster CA certificate
-kubectl config view --raw -o json | jq -r \
-".clusters[] | select(.name == \"${CONTEXT}\") | .cluster.[\"certificate-authority-data\"]" | base64 -d > ca.crt
+set -euo pipefail
+command -v kubectl >/dev/null 2>&1 || { echo "Error: kubectl is required but not installed or not in PATH"; exit 1; }
+command -v jq >/dev/null 2>&1 || { echo "Error: jq is required but not installed or not in PATH"; exit 1; }
+command -v base64 >/dev/null 2>&1 || { echo "Error: base64 is required but not installed or not in PATH"; exit 1; }
+CONTEXT=$(kubectl config current-context)
+E2E_DIR=$(realpath "$0" | xargs dirname)
+cd "$E2E_DIR" || (echo "Error: could not cd to $E2E_DIR"; exit 1)
+KUBECONFIG_JSON=$(kubectl config view --raw -o json)
+USER_NAME=$(printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg context "$CONTEXT" '.contexts[] | select(.name == $context) | .context.user')
+CLUSTER_NAME=$(printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg context "$CONTEXT" '.contexts[] | select(.name == $context) | .context.cluster')
+if [ -z "$USER_NAME" ] || [ "$USER_NAME" = "null" ]; then
+  echo "Error: could not resolve user for context $CONTEXT"
+  exit 1
+fi
+if [ -z "$CLUSTER_NAME" ] || [ "$CLUSTER_NAME" = "null" ]; then
+  echo "Error: could not resolve cluster for context $CONTEXT"
+  exit 1
+fi
+# What These Files Are
+# - client.crt: Your user (client) certificate
+# - client.key: The private key associated with the certificate
+# - ca.crt: The CA certificate used by the Kubernetes API server (for verifying client and server certs)
+# Extract the client certificate
+printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg user "$USER_NAME" \
+  '.users[] | select(.name == $user) | .user["client-certificate-data"]' | base64 -d > client.crt
+# Extract the client key
+printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg user "$USER_NAME" \
+  '.users[] | select(.name == $user) | .user["client-key-data"]' | base64 -d > client.key
+# Extract the cluster CA certificate
+printf '%s\n' "$KUBECONFIG_JSON" | jq -r --arg cluster "$CLUSTER_NAME" \
+  '.clusters[] | select(.name == $cluster) | .cluster["certificate-authority-data"]' | base64 -d > ca.crt
```

Comment on lines 62 to +67
```scala
    val props = loadProperties()

    val armadaApiUrl = props.getProperty("armada.master", "armada://localhost:30002")
    armadaClient = new ArmadaClient(armadaApiUrl)
    k8sClient = new K8sClient(props)
    orch = new TestOrchestrator(armadaClient, k8sClient)
```
Copilot AI Apr 28, 2026

loadProperties() does not propagate the TLS-related system properties/env vars (e.g., client_cert_file, client_key_file, cluster_ca_file). As a result, new K8sClient(props) will always see empty values and never enable TLS even when scripts/dev-e2e.sh passes -Dclient_cert_file=... etc. Add these properties to loadProperties() (and map from env vars like CLIENT_CERT_FILE), or have K8sClient read from sys.props/sys.env directly when not present in props.

Copilot uses AI. Check for mistakes.
Comment on lines +72 to +75
```scala
      val cf   = CertificateFactory.getInstance("X.509")
      val cert = cf.generateCertificate(new FileInputStream(clientCertFile))
      algo = cert.getPublicKey.getAlgorithm // "RSA" or "EC"
    }
```
Copilot AI Apr 28, 2026

The FileInputStream used to read the client certificate isn’t closed, which can leak file descriptors on repeated test runs. Use a managed resource (e.g., scala.util.Using) or ensure the stream is closed in a finally block.

Copilot uses AI. Check for mistakes.
Comment on lines +56 to +83
```scala
    val clientCertFile: String = props.getProperty("client_cert_file", "")
    val clientKeyFile: String  = props.getProperty("client_key_file", "")
    val clusterCaFile: String  = props.getProperty("cluster_ca_file", "")
    var algo: String           = ""

    var cb: ConfigBuilder = new ConfigBuilder()
      .withMasterUrl(k8sApiURL)
      .withNamespace("default")

    if (clusterCaFile.nonEmpty) {
      cb = cb.withCaCertFile(clusterCaFile)
    }

    if (clientCertFile.nonEmpty) {
      cb = cb.withClientCertFile(clientCertFile)

      val cf   = CertificateFactory.getInstance("X.509")
      val cert = cf.generateCertificate(new FileInputStream(clientCertFile))
      algo = cert.getPublicKey.getAlgorithm // "RSA" or "EC"
    }

    if (clientKeyFile.nonEmpty) {
      cb = cb.withClientKeyFile(clientKeyFile)
    }

    val cfg =
      if (clientCertFile.nonEmpty) cb.withClientKeyAlgo(algo).build() else cb.build()
```
Copilot AI Apr 28, 2026

TLS configuration allows partial input (e.g., client_key_file set but client_cert_file empty). This can create a KubernetesClient configuration that fails in non-obvious ways. Consider validating that cert+key are provided together (and that referenced files exist/readable) and fail fast with a clear error message.

Copilot uses AI. Check for mistakes.
Comment thread scripts/init.sh
```shell
echo "ERROR: could not determine external network interface address. In order to run Spark"
echo "applications in client deployment mode, you may need to set SPARK_LOCAL_IP in your"
echo "scripts/config.sh to the external IP address of this system, for example:"
echo " export SPARK_LOCAL_IP=\"192.168.1.555\""
```
Copilot AI Apr 28, 2026

The example IP 192.168.1.555 is not a valid IPv4 address (octets must be 0-255). Update it to a valid example value to avoid confusion.

Suggested change

```diff
-echo " export SPARK_LOCAL_IP=\"192.168.1.555\""
+echo " export SPARK_LOCAL_IP=\"192.168.1.55\""
```

Comment thread scripts/dev-e2e.sh
Comment on lines +128 to +140
```shell
IMG_REGEX='^[[:alnum:]_./-]+:[[:alnum:]_-]+$'

if ! (echo "$IMAGE_NAME" | grep -Eq "$IMG_REGEX"); then
  err "IMAGE_NAME is not defined. Please set it in $scripts/config.sh, for example:"
  err "IMAGE_NAME=spark:testing"
  exit 1
fi

if ! (echo "$INIT_CONTAINER_IMAGE" | grep -Eq "$IMG_REGEX"); then
  err "INIT_CONTAINER_IMAGE is not defined. Please set it in $scripts/config.sh, for example:"
  err "INIT_CONTAINER_IMAGE=busybox:latest"
  exit 1
fi
```
Copilot AI Apr 28, 2026

IMG_REGEX rejects many valid Docker image references (e.g., tags with dots like busybox:1.36.1, registries with ports like localhost:5000/img:tag, digests, etc.). This will cause false negatives for legitimate configurations. Consider relaxing the regex (at least allow . in the tag and : in the registry part) or use Docker’s own reference parsing instead of a custom regex.

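A slightly relaxed pattern along those lines might allow dots in the tag and an optional registry:port prefix. This sketch is still not Docker's full reference grammar (digests and multi-level repository paths are out of scope); it just illustrates the loosening:

```shell
# Relaxed sketch of IMG_REGEX: also allow '.' in the tag and an optional
# registry:port/ prefix. Not a full implementation of Docker's grammar.
IMG_REGEX='^([[:alnum:]_.-]+(:[0-9]+)?/)?[[:alnum:]_./-]+:[[:alnum:]_.-]+$'

image_ok() { echo "$1" | grep -Eq "$IMG_REGEX"; }

image_ok "spark:testing"          && echo "spark:testing ok"
image_ok "busybox:1.36.1"         && echo "busybox:1.36.1 ok"
image_ok "localhost:5000/img:tag" && echo "localhost:5000/img:tag ok"
image_ok "no tag here"            || echo "rejected"
```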
Comment thread scripts/dev-e2e.sh Outdated
Comment on lines +235 to +257
```shell
if [[ ! -d ".spark-$SPARK_VERSION" ]]; then
  echo "Checking out Spark sources for tag v$SPARK_VERSION."
  git clone https://github.com/apache/spark --branch v$SPARK_VERSION --depth 1 --no-tags ".spark-$SPARK_VERSION"
fi

cd ".spark-$SPARK_VERSION"
# Spark 3.3.4 does not compile without this fix
if [[ "$SPARK_VERSION" == "3.3.4" ]]; then
  "${sed_inplace[@]}" -e "s%<scala.version>2.13.8</scala.version>%<scala.version>2.13.6</scala.version>%" pom.xml
  # Fix deprecated openjdk base image - use eclipse-temurin:11-jammy instead.
  spark_dockerfile="resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile"
  if [ -f "$spark_dockerfile" ]; then
    "${sed_inplace[@]}" -e 's|FROM openjdk:|FROM eclipse-temurin:|g' "$spark_dockerfile"
    "${sed_inplace[@]}" -E 's/^ARG java_image_tag=11-jre-slim$/ARG java_image_tag=11-jammy/' "$spark_dockerfile"
  fi
fi
./dev/change-scala-version.sh $SCALA_BIN_VERSION
# by packaging the assembly project specifically, jars of all depending Spark projects are fetch from Maven
# spark-examples jars are not released, so we need to build these from sources
./build/mvn --batch-mode clean
./build/mvn --batch-mode package -pl examples
./build/mvn --batch-mode package -Pkubernetes -Pscala-$SCALA_BIN_VERSION -pl assembly
cd ..
```
Copilot AI Apr 28, 2026

run-test() always clones and builds Spark from source, which is very slow and redundant given this script already requires $IMAGE_NAME to exist locally. Consider only building Spark when the Docker image build actually needs it (as scripts/createImage.sh already does), or gate this step behind a flag/condition to avoid unnecessary network + CPU cost on each E2E run.

Suggested change

```diff
-if [[ ! -d ".spark-$SPARK_VERSION" ]]; then
-echo "Checking out Spark sources for tag v$SPARK_VERSION."
-git clone https://github.com/apache/spark --branch v$SPARK_VERSION --depth 1 --no-tags ".spark-$SPARK_VERSION"
-fi
-cd ".spark-$SPARK_VERSION"
-# Spark 3.3.4 does not compile without this fix
-if [[ "$SPARK_VERSION" == "3.3.4" ]]; then
-"${sed_inplace[@]}" -e "s%<scala.version>2.13.8</scala.version>%<scala.version>2.13.6</scala.version>%" pom.xml
-# Fix deprecated openjdk base image - use eclipse-temurin:11-jammy instead.
-spark_dockerfile="resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile"
-if [ -f "$spark_dockerfile" ]; then
-"${sed_inplace[@]}" -e 's|FROM openjdk:|FROM eclipse-temurin:|g' "$spark_dockerfile"
-"${sed_inplace[@]}" -E 's/^ARG java_image_tag=11-jre-slim$/ARG java_image_tag=11-jammy/' "$spark_dockerfile"
-fi
-fi
-./dev/change-scala-version.sh $SCALA_BIN_VERSION
-# by packaging the assembly project specifically, jars of all depending Spark projects are fetch from Maven
-# spark-examples jars are not released, so we need to build these from sources
-./build/mvn --batch-mode clean
-./build/mvn --batch-mode package -pl examples
-./build/mvn --batch-mode package -Pkubernetes -Pscala-$SCALA_BIN_VERSION -pl assembly
-cd ..
+if [[ "${BUILD_SPARK_FROM_SOURCE:-false}" == "true" ]]; then
+  if [[ ! -d ".spark-$SPARK_VERSION" ]]; then
+    echo "Checking out Spark sources for tag v$SPARK_VERSION."
+    git clone https://github.com/apache/spark --branch v$SPARK_VERSION --depth 1 --no-tags ".spark-$SPARK_VERSION"
+  fi
+  cd ".spark-$SPARK_VERSION"
+  # Spark 3.3.4 does not compile without this fix
+  if [[ "$SPARK_VERSION" == "3.3.4" ]]; then
+    "${sed_inplace[@]}" -e "s%<scala.version>2.13.8</scala.version>%<scala.version>2.13.6</scala.version>%" pom.xml
+    # Fix deprecated openjdk base image - use eclipse-temurin:11-jammy instead.
+    spark_dockerfile="resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile"
+    if [ -f "$spark_dockerfile" ]; then
+      "${sed_inplace[@]}" -e 's|FROM openjdk:|FROM eclipse-temurin:|g' "$spark_dockerfile"
+      "${sed_inplace[@]}" -E 's/^ARG java_image_tag=11-jre-slim$/ARG java_image_tag=11-jammy/' "$spark_dockerfile"
+    fi
+  fi
+  ./dev/change-scala-version.sh $SCALA_BIN_VERSION
+  # by packaging the assembly project specifically, jars of all depending Spark projects are fetch from Maven
+  # spark-examples jars are not released, so we need to build these from sources
+  ./build/mvn --batch-mode clean
+  ./build/mvn --batch-mode package -pl examples
+  ./build/mvn --batch-mode package -Pkubernetes -Pscala-$SCALA_BIN_VERSION -pl assembly
+  cd ..
+else
+  log "Skipping Spark source checkout/build; set BUILD_SPARK_FROM_SOURCE=true to enable it."
+fi
```

Comment thread scripts/dev-e2e.sh
Comment on lines +99 to +106
```shell
# into the Kind config so that it binds to a valid address instead
# of the hardcoded placeholder (192.168.12.135).
external_ip=$(ifconfig -a | grep -w 'inet' | grep -v 'inet 127\.0\.0' | grep -v 'inet 172\.' | awk '{print $2}' | sed -ne '1p')

if [ -z "$external_ip" ]; then
  err "Unable to find any IP addresses on an external interface on this system; exiting now"
  exit 1
fi
```
Copilot AI Apr 28, 2026

external_ip discovery excludes all 172.x.x.x addresses. Many real networks use the private 172.16.0.0/12 range, so this can incorrectly fail to find an external IP and abort. Prefer selecting by interface (exclude docker/kind interfaces by name) or use a route-based method (e.g., derive the source IP for a default route) rather than excluding whole RFC1918 subranges.

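A route-based alternative on Linux asks the kernel which source address it would use to reach an external destination, instead of filtering interface addresses. The parsing below is a sketch; the exact output format of `ip route get` varies slightly between iproute2 versions, so the demonstration runs the same awk logic over a fixed sample line:

```shell
# Route-based discovery sketch: pull the 'src' field from `ip route get`.
get_external_ip() {
  ip -4 route get 8.8.8.8 2>/dev/null \
    | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i + 1)}' \
    | head -n1
}

# Deterministic demonstration of the parsing on a sample output line:
sample='8.8.8.8 via 192.168.1.1 dev eth0 src 192.168.1.55 uid 1000'
parsed=$(echo "$sample" | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i + 1)}')
echo "$parsed"  # prints: 192.168.1.55
```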
Comment thread scripts/dev-e2e.sh
```shell
fi

echo "Extracting TLS client certificate files from Kind cluster"
if ! e2e/extract-kind-cert.sh; then
```
Copilot AI Apr 28, 2026

This calls e2e/extract-kind-cert.sh using a relative path, which will fail if scripts/dev-e2e.sh is invoked from a directory other than the repo root. Use an absolute path based on $scripts (e.g., $scripts/../e2e/extract-kind-cert.sh) for consistency with other references in the script.

Suggested change

```diff
-if ! e2e/extract-kind-cert.sh; then
+if ! "$scripts/../e2e/extract-kind-cert.sh"; then
```

Comment thread scripts/dev-e2e.sh Outdated
Comment on lines +235 to +258
```shell
if [[ ! -d ".spark-$SPARK_VERSION" ]]; then
  echo "Checking out Spark sources for tag v$SPARK_VERSION."
  git clone https://github.com/apache/spark --branch v$SPARK_VERSION --depth 1 --no-tags ".spark-$SPARK_VERSION"
fi

cd ".spark-$SPARK_VERSION"
# Spark 3.3.4 does not compile without this fix
if [[ "$SPARK_VERSION" == "3.3.4" ]]; then
  "${sed_inplace[@]}" -e "s%<scala.version>2.13.8</scala.version>%<scala.version>2.13.6</scala.version>%" pom.xml
  # Fix deprecated openjdk base image - use eclipse-temurin:11-jammy instead.
  spark_dockerfile="resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile"
  if [ -f "$spark_dockerfile" ]; then
    "${sed_inplace[@]}" -e 's|FROM openjdk:|FROM eclipse-temurin:|g' "$spark_dockerfile"
    "${sed_inplace[@]}" -E 's/^ARG java_image_tag=11-jre-slim$/ARG java_image_tag=11-jammy/' "$spark_dockerfile"
  fi
fi
./dev/change-scala-version.sh $SCALA_BIN_VERSION
# by packaging the assembly project specifically, jars of all depending Spark projects are fetch from Maven
# spark-examples jars are not released, so we need to build these from sources
./build/mvn --batch-mode clean
./build/mvn --batch-mode package -pl examples
./build/mvn --batch-mode package -Pkubernetes -Pscala-$SCALA_BIN_VERSION -pl assembly
cd ..
```

Collaborator

The test runs spark-submit via Docker (see TestOrchestrator.scala). The required JARs are already built and baked into the image by createImage.sh.

```shell
./dev/change-scala-version.sh $SCALA_BIN_VERSION
# by packaging the assembly project specifically, jars of all depending Spark projects are fetch from Maven
# spark-examples jars are not released, so we need to build these from sources
./build/mvn --batch-mode clean
./build/mvn --batch-mode package -pl examples
./build/mvn --batch-mode package -Pkubernetes -Pscala-$SCALA_BIN_VERSION -pl assembly
./bin/docker-image-tool.sh -t "$image_tag" $extra_build_params build
```

Given that, rebuilding or recreating them during run-test appears redundant, and it adds overhead when running the e2e tests repeatedly. Am I missing a scenario where this step is still required?

Member Author

Thanks - this redundant chunk removed in commit 6f584cb.

```scala
    // armadactl command expects the server address to be of the form
    // <hostname-or-IP>:<port> with no pseudo-protocol prefix
    val armadactlUrl =
      if (armadaUrl.startsWith("armada://")) {
```
Collaborator

`stripPrefix` is a no-op when the prefix isn't present, so the `if` is unnecessary:

```scala
val armadactlUrl = armadaUrl.stripPrefix("armada://")
```

Member Author

Good catch - fixed in commit 6f584cb.

richscott added 2 commits May 4, 2026 16:21
Also remove superfluous conditional for using String.stripPrefix().

Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.



Comment on lines +56 to +58
```scala
    val clientCertFile: String = props.getProperty("client_cert_file", "")
    val clientKeyFile: String  = props.getProperty("client_key_file", "")
    val clusterCaFile: String  = props.getProperty("cluster_ca_file", "")
```
Comment on lines +30 to +45
```scala
import org.yaml.snakeyaml.Yaml

import java.io.{FileReader, FileInputStream}
import java.nio.file.{Paths, Files}
import java.security.cert.CertificateFactory
import java.util.concurrent.TimeoutException
import java.util.Properties
import scala.jdk.CollectionConverters._
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future}

/** Kubernetes client implementation using fabric8 Kubernetes client library. */
class K8sClient(props: Properties) {
  val armadaMaster: String = props.getProperty("armada.master")
  val pattern              = """armada://([^:]+):.*""".r
```

Comment thread scripts/init.sh
```shell
# addresses, and use the first one that is not the loopback interface, or a
# Docker/K8S virtual interface.
if [ -z "${SPARK_LOCAL_IP:-}" ]; then
  SPARK_LOCAL_IP=$(ifconfig -a | grep -w 'inet' | grep -Ev '(127\.0\.0\.1|\<172)' | awk '{print $2}' | sed -n '1p')
```
Comment thread e2e/extract-kind-cert.sh

```shell
E2E_DIR=$(realpath "$0" | xargs dirname)

cd "$E2E_DIR" || (echo "Error: could not cd to $E2E_DIR"; exit 1)
```
Comment on lines +33 to +37
```scala
// On macOS, host.docker.internal resolves inside Docker containers but NOT inside
// Kind pods (which use CoreDNS and don't inherit Docker's host alias). SPARK_LOCAL_IP
// is the host machine's LAN IP, reachable from both Docker containers (via --network host)
// and Kind pods (confirmed: executor pods already connect to this IP in client mode).
// On Linux, 172.18.0.1 is the Docker bridge gateway, accessible from both.
```
Comment thread README.md
Comment on lines +114 to +130
```
ARMADA_MASTER=armada://192.168.12.135:30002
```
to the IP address or hostname of your Armada server. You should not need to change the port number.

Also, set the location of the three TLS certificate files by adding/setting:
```
export CLIENT_CERT_FILE=e2e/client.crt
export CLIENT_KEY_FILE=e2e/client.key
export CLUSTER_CA_FILE=e2e/ca.crt
```

- You should be able to now verify the armada-spark configuration by running the E2E tests:
```
$ ./scripts/dev-e2e.sh
```
This will save its output to `e2e-test.log` for further debugging.
