
ci(nightly-e2e): wire credential-sanitization and telegram-injection E2E into nightly #2567

@jyaunches


Summary

Two security-focused E2E test scripts exist in test/e2e/ and are wired to the manual e2e-brev.yaml workflow (dispatch-only), but have never been wired into the nightly. They were created March 30 and last maintained April 23.

| Script | Lines | What It Tests |
| --- | --- | --- |
| test-credential-sanitization.sh | 809 | 24 tests: credential stripping from migration snapshots, auth-profiles.json deletion, blueprint digest verification, symlink traversal protection, runtime sandbox credential checks |
| test-telegram-injection.sh | 475 | 18 tests: command injection prevention, covering $(cmd), backticks, quote breakout, ${VAR} expansion, process table leak checks, SANDBOX_NAME validation |
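To make the SANDBOX_NAME validation case concrete, here is a minimal sketch of an allowlist check that would reject the injection payloads listed above. The function name and the exact rule (lowercase letters, digits, and hyphens, at most 63 characters, no leading hyphen) are assumptions for illustration; the issue does not state what rule NemoClaw actually enforces.

```shell
#!/usr/bin/env bash
# Hypothetical allowlist validator for a sandbox name.
# The rule below is an assumption, not NemoClaw's documented behavior.
is_valid_sandbox_name() {
  local name="$1"
  # Reject empty names and names longer than 63 characters.
  [ -n "$name" ] && [ "${#name}" -le 63 ] || return 1
  case "$name" in
    -*) return 1 ;;            # must not start with a hyphen
    *[!a-z0-9-]*) return 1 ;;  # any character outside the allowlist
  esac
  return 0
}
```

Because the check is an allowlist rather than a blocklist, payloads such as `$(cmd)`, backticks, quotes, and `${VAR}` fail simply because they contain characters outside the permitted set.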

Key constraint: requires a pre-existing running sandbox

Unlike the self-contained scripts in #2566, these two scripts do not install or onboard NemoClaw themselves. Their headers say:

NemoClaw installed and sandbox running (test-full-e2e.sh Phase 0-3)

In e2e-brev.yaml, the full test suite runs first (creating the sandbox), and then credential-sanitization and telegram-injection run against it via the all suite option. The all suite explicitly does NOT run full because full destroys the sandbox at cleanup.

Options to wire into nightly

Option A: Dependent jobs after cloud-e2e

Run these as jobs that declare needs: [cloud-e2e] and reuse the sandbox that cloud-e2e created. Challenge: cloud-e2e runs test-full-e2e.sh, which tears down the sandbox at the end. This would require either:

  • Add a RUN_E2E_SKIP_FINAL_CLEANUP=1 variant of cloud-e2e that leaves the sandbox alive
  • Or create a shared setup job that installs + onboards, then fan out to cloud-e2e, credential-sanitization-e2e, and telegram-injection-e2e in parallel

Option B: Add self-contained setup to each script

Add a Phase 0 to each script that runs install.sh + nemoclaw onboard (mirroring test-full-e2e.sh Phases 0–3). This makes them independent but adds ~10 min of install time per job.
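A Phase 0 of this kind could be sketched as a guard that only installs when no sandbox is running. SANDBOX_CHECK_CMD and SANDBOX_SETUP_CMD are hypothetical placeholders introduced here so the logic can be shown without guessing at NemoClaw's real CLI; the default setup command just chains the install.sh and nemoclaw onboard steps named above, and the path to install.sh is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical Phase 0: bootstrap a sandbox only when none is running.
# SANDBOX_CHECK_CMD and SANDBOX_SETUP_CMD are illustrative placeholders,
# not real NemoClaw interfaces.
phase0_ensure_sandbox() {
  local check_cmd="${SANDBOX_CHECK_CMD:-false}"
  local setup_cmd="${SANDBOX_SETUP_CMD:-bash install.sh && nemoclaw onboard}"

  # If the check command succeeds, assume a usable sandbox already exists.
  if bash -c "$check_cmd" >/dev/null 2>&1; then
    echo "phase0: sandbox already running, skipping install"
    return 0
  fi

  echo "phase0: no running sandbox, installing and onboarding"
  bash -c "$setup_cmd"
}
```

With a guard like this, the same script could run unchanged under e2e-brev.yaml (where the sandbox already exists) and pay the install cost only in the nightly.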

Option C: Composite job

Create a single security-e2e nightly job that:

  1. Installs NemoClaw + onboards a sandbox
  2. Runs test-credential-sanitization.sh
  3. Runs test-telegram-injection.sh
  4. Cleans up

This is closest to how e2e-brev.yaml all works today.
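The four steps above could also be driven from a single wrapper script. The sketch below covers steps 2 through 4 and assumes the sandbox from step 1 already exists. CLEANUP_CMD is a hypothetical placeholder for whatever teardown command the project uses. Unlike sequential workflow steps, this variant keeps going after a failed suite so that both suites report results in one run; that is a design choice of the sketch, not something the issue prescribes.

```shell
#!/usr/bin/env bash
# Sketch of the composite flow: run each security script in order,
# continue after a failure, and always attempt cleanup at the end.
# CLEANUP_CMD is a hypothetical placeholder for the real teardown.
run_security_suites() {
  local failed=0 script
  for script in "$@"; do
    echo "=== running ${script} ==="
    if ! bash "$script"; then
      echo "FAILED: ${script}"
      failed=1
    fi
  done
  # Cleanup runs whether or not a suite failed; its own failure is logged only.
  bash -c "${CLEANUP_CMD:-true}" || echo "cleanup failed (ignored)"
  return "$failed"
}
```

A call might look like `run_security_suites test/e2e/test-credential-sanitization.sh test/e2e/test-telegram-injection.sh`, with the function's exit status deciding whether the job fails.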

Recommendation

Option C is the most pragmatic: one install, both scripts, one cleanup. The Brev workflow already demonstrates that this pattern works. The nightly job would look like:

  security-e2e:
    if: github.repository == 'NVIDIA/NemoClaw'
    runs-on: ubuntu-latest
    timeout-minutes: 45
    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Install and onboard
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_NON_INTERACTIVE: "1"
          NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
          NEMOCLAW_RECREATE_SANDBOX: "1"
        run: bash test/e2e/test-full-e2e.sh
        # OR: a minimal install+onboard script that skips the
        # inference/CLI phases and just leaves a running sandbox

      - name: Run credential sanitization tests
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
        run: bash test/e2e/test-credential-sanitization.sh

      - name: Run telegram injection tests
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
        run: bash test/e2e/test-telegram-injection.sh

      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: security-e2e-logs
          path: /tmp/nemoclaw-e2e-*.log
          if-no-files-found: ignore

Note: The setup step needs careful handling — test-full-e2e.sh destroys the sandbox at cleanup. Either use RUN_E2E_SKIP_FINAL_CLEANUP=1 (if the script supports it), or write a minimal install+onboard helper that skips Phase 4+ and cleanup.
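If test-full-e2e.sh does not already honor RUN_E2E_SKIP_FINAL_CLEANUP, the guard could be as small as the sketch below. Both function names are hypothetical, and destroy_sandbox is a stand-in because the issue does not show the script's real teardown command.

```shell
#!/usr/bin/env bash
# Sketch of a cleanup guard keyed on RUN_E2E_SKIP_FINAL_CLEANUP.
# Whether test-full-e2e.sh has an equivalent is not confirmed by the issue.
destroy_sandbox() {
  # Placeholder: the real teardown command is not shown in the issue.
  echo "destroy_sandbox called for ${NEMOCLAW_SANDBOX_NAME:-unnamed}"
}

final_cleanup() {
  if [ "${RUN_E2E_SKIP_FINAL_CLEANUP:-0}" = "1" ]; then
    echo "cleanup: skipped, sandbox left running"
    return 0
  fi
  echo "cleanup: destroying sandbox"
  destroy_sandbox
}
```

Defaulting the variable to 0 keeps the existing behavior (teardown) for every caller that does not opt in, so cloud-e2e would be unaffected.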

Context

These are security regression tests covering credential leakage and command injection. They run only when someone manually dispatches e2e-brev.yaml, which means security regressions can ship to users without being caught. In the last 3 weeks, the nightly E2E catch rate for externally reported bugs was ~17%. Wiring security tests into the nightly is part of closing that gap.

The e2e-brev.yaml workflow validates these scripts work end-to-end on real Brev instances. The nightly would run them on ubuntu-latest with the same NVIDIA_API_KEY secret available to all other nightly jobs.

Acceptance criteria

  • Both scripts run as part of the nightly (either as a composite security-e2e job or individually)
  • Job is in the notify-on-failure needs: list
  • Sandbox setup does not conflict with other nightly jobs (use a distinct sandbox name like e2e-security)
  • First nightly run shows the job(s) executing

Labels

  • 04-25-regression: Issues raised from the Apr 25 weekend regression analysis
  • CI/CD: Use this label to identify issues with NemoClaw CI/CD pipeline or GitHub Actions.
  • E2E: End-to-end testing (Brev infrastructure, test cases, nightly failures, and coverage gaps)
