Summary
Two security-focused E2E test scripts exist in test/e2e/ and are wired to the manual e2e-brev.yaml workflow (dispatch-only), but have never been wired into the nightly. They were created March 30 and last maintained April 23.
| Script | Lines | What It Tests |
|---|---|---|
| test-credential-sanitization.sh | 809 | 24 tests: credential stripping from migration snapshots, auth-profiles.json deletion, blueprint digest verification, symlink traversal protection, runtime sandbox credential checks |
| test-telegram-injection.sh | 475 | 18 tests: command injection prevention for $(cmd), backticks, quote breakout, and ${VAR} expansion; process table leak checks; SANDBOX_NAME validation |
Key constraint: requires a pre-existing running sandbox
Unlike the self-contained scripts in #2566, these two scripts do not install or onboard NemoClaw themselves. Their headers say:
NemoClaw installed and sandbox running (test-full-e2e.sh Phase 0-3)
In e2e-brev.yaml, the full test suite runs first (creating the sandbox), and then credential-sanitization and telegram-injection run against it via the all suite option. The all suite explicitly does NOT run full because full destroys the sandbox at cleanup.
Options to wire into nightly
Option A: Dependent jobs after cloud-e2e
Run these as jobs that declare needs: [cloud-e2e] and reuse the sandbox that cloud-e2e created. Challenge: cloud-e2e runs test-full-e2e.sh, which tears down the sandbox at the end. This would require either:
- Adding a RUN_E2E_SKIP_FINAL_CLEANUP=1 variant of cloud-e2e that leaves the sandbox alive
- Or creating a shared setup job that installs and onboards, then fans out to cloud-e2e, credential-sanitization-e2e, and telegram-injection-e2e in parallel
Option B: Add self-contained setup to each script
Add a Phase 0 to each script that runs install.sh + nemoclaw onboard (mirroring test-full-e2e.sh Phases 0–3). This makes them independent but adds ~10 min of install time per job.
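A rough sketch of what such a Phase 0 could look like is shown here. It assumes the two commands named above (install.sh and nemoclaw onboard) can run unattended with NEMOCLAW_NON_INTERACTIVE=1; the E2E_SKIP_SETUP opt-out and the INSTALL_CMD / ONBOARD_CMD overrides are hypothetical names introduced only for illustration and do not exist in the scripts today.

```shell
# Phase 0 sketch: install and onboard unless the caller already has a sandbox.
set -eu

# Overridable so the phase can be dry-run. The defaults are the two commands
# named in Option B; their exact paths and flags are NOT confirmed here.
INSTALL_CMD="${INSTALL_CMD:-bash install.sh}"
ONBOARD_CMD="${ONBOARD_CMD:-nemoclaw onboard}"

phase0_setup() {
  # E2E_SKIP_SETUP is a hypothetical opt-out for runs that reuse a sandbox
  # (for example, the existing e2e-brev.yaml flow).
  if [ "${E2E_SKIP_SETUP:-0}" = "1" ]; then
    echo "Phase 0: skipped, reusing existing sandbox"
    return 0
  fi
  echo "Phase 0: install"
  # Intentionally unquoted so "bash install.sh" splits into command + argument.
  $INSTALL_CMD
  echo "Phase 0: onboard"
  NEMOCLAW_NON_INTERACTIVE=1 $ONBOARD_CMD
}
```

Each script would call phase0_setup once before its first test. Keeping the opt-out means the manual Brev workflow could continue to hand the scripts a pre-existing sandbox without paying the install cost twice.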
Option C: Composite job
Create a single security-e2e nightly job that:
- Installs NemoClaw + onboards a sandbox
- Runs test-credential-sanitization.sh
- Runs test-telegram-injection.sh
- Cleans up
This is closest to how e2e-brev.yaml all works today.
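For trying the composite sequence locally, the "run both scripts against one sandbox" part could be sketched as a small driver. This is only an illustration: run_suite is a hypothetical helper, and the choice to keep going after the first failure (so one broken script does not hide results from the other) is an assumption, not something the existing workflows are known to do.

```shell
# Sketch: run each given script in order and report a combined status.
set -u

run_suite() {
  failed=0
  for script in "$@"; do
    echo "==> $script"
    # Keep going after a failure so every script still gets a chance to run.
    if ! bash "$script"; then
      echo "FAILED: $script"
      failed=1
    fi
  done
  return "$failed"
}

# Example invocation against an already-onboarded sandbox:
#   run_suite test/e2e/test-credential-sanitization.sh \
#             test/e2e/test-telegram-injection.sh
```

The function returns non-zero if any script failed, so a caller can still fail the job after both scripts have produced their logs.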
Recommendation
Option C is the most pragmatic — one install, both scripts, one cleanup. The Brev workflow already validates this pattern works. The nightly job would look like:
security-e2e:
  if: github.repository == 'NVIDIA/NemoClaw'
  runs-on: ubuntu-latest
  timeout-minutes: 45
  steps:
    - name: Checkout
      uses: actions/checkout@v6
    - name: Install and onboard
      env:
        NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
        NEMOCLAW_NON_INTERACTIVE: "1"
        NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
        NEMOCLAW_SANDBOX_NAME: "e2e-security"
        NEMOCLAW_RECREATE_SANDBOX: "1"
      run: bash test/e2e/test-full-e2e.sh
      # OR: a minimal install+onboard script that skips the
      # inference/CLI phases and just leaves a running sandbox
    - name: Run credential sanitization tests
      env:
        NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
        NEMOCLAW_SANDBOX_NAME: "e2e-security"
      run: bash test/e2e/test-credential-sanitization.sh
    - name: Run telegram injection tests
      env:
        NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
        NEMOCLAW_SANDBOX_NAME: "e2e-security"
      run: bash test/e2e/test-telegram-injection.sh
    - name: Upload logs on failure
      if: failure()
      uses: actions/upload-artifact@v4
      with:
        name: security-e2e-logs
        path: /tmp/nemoclaw-e2e-*.log
        if-no-files-found: ignore
Note: The setup step needs careful handling — test-full-e2e.sh destroys the sandbox at cleanup. Either use RUN_E2E_SKIP_FINAL_CLEANUP=1 (if the script supports it), or write a minimal install+onboard helper that skips Phase 4+ and cleanup.
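If test-full-e2e.sh turns out not to honor that flag, the guard it would need is small. The sketch below shows one possible shape, assuming the teardown lives in a single function; destroy_sandbox is a hypothetical stand-in that only echoes, since the real teardown commands are not shown in this issue.

```shell
# Sketch: make the final cleanup conditional on RUN_E2E_SKIP_FINAL_CLEANUP.
set -eu

# Hypothetical placeholder for whatever teardown the real script performs.
destroy_sandbox() {
  echo "destroying sandbox ${NEMOCLAW_SANDBOX_NAME:-e2e-security}"
}

final_cleanup() {
  # Leave the sandbox alive when the caller asks for it, so later steps
  # (the two security scripts) can run against it.
  if [ "${RUN_E2E_SKIP_FINAL_CLEANUP:-0}" = "1" ]; then
    echo "skipping final cleanup; sandbox left running"
    return 0
  fi
  destroy_sandbox
}

# In a real script this would typically be registered once near the top:
#   trap final_cleanup EXIT
```

With a guard like this, the "Install and onboard" step would set RUN_E2E_SKIP_FINAL_CLEANUP: "1" in its env block, and the job would need its own explicit cleanup step after the security scripts finish.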
Context
These are security regression tests — credential leakage and command injection. They run only when someone manually dispatches e2e-brev.yaml, which means security regressions can ship to users without being caught. In the last 3 weeks, the nightly E2E catch rate for externally-reported bugs was ~17%. Wiring security tests into the nightly is part of closing that gap.
The e2e-brev.yaml workflow validates these scripts work end-to-end on real Brev instances. The nightly would run them on ubuntu-latest with the same NVIDIA_API_KEY secret available to all other nightly jobs.
Acceptance criteria
- [...] security-e2e job or individually)
- [...] notify-on-failure needs: list
- [...] e2e-security)