ci(nightly-e2e): wire credential-sanitization and telegram-injection E2E into nightly

## Summary

Two security-focused E2E test scripts exist in `test/e2e/` and are wired to the **manual** `e2e-brev.yaml` workflow (dispatch-only), but have **never been wired into the nightly**. They were created March 30 and last maintained April 23.

| Script | Lines | What It Tests |
|--------|-------|--------------|
| `test-credential-sanitization.sh` | 809 | 24 tests: credential stripping from migration snapshots, `auth-profiles.json` deletion, blueprint digest verification, symlink traversal protection, runtime sandbox credential checks |
| `test-telegram-injection.sh` | 475 | 18 tests: command injection prevention — `$(cmd)`, backticks, quote breakout, `${VAR}` expansion, process table leak checks, `SANDBOX_NAME` validation |

## Key constraint: requires a pre-existing running sandbox

Unlike the self-contained scripts in #2566, these two scripts **do not install or onboard NemoClaw themselves**. Their headers say:

> NemoClaw installed and sandbox running (test-full-e2e.sh Phase 0-3)

In `e2e-brev.yaml`, the `full` test suite runs first (creating the sandbox), and then `credential-sanitization` and `telegram-injection` run against it via the `all` suite option. The `all` suite explicitly does NOT run `full` because `full` destroys the sandbox at cleanup.

### Options to wire into nightly

**Option A: Dependent jobs after `cloud-e2e`**

Run these as jobs that `needs: [cloud-e2e]` and reuse the sandbox that `cloud-e2e` created. The challenge is that `cloud-e2e` runs `test-full-e2e.sh`, which tears down the sandbox at the end. This would require one of:
- Adding a `RUN_E2E_SKIP_FINAL_CLEANUP=1` variant of `cloud-e2e` that leaves the sandbox alive
- Creating a shared setup job that installs + onboards, then fans out to `cloud-e2e`, `credential-sanitization-e2e`, and `telegram-injection-e2e` in parallel

Either way, if these jobs run on GitHub-hosted runners, each job starts on a fresh VM, so a sandbox created in one job is not visible to a dependent job unless they share a self-hosted runner.

**Option B: Add self-contained setup to each script**

Add a Phase 0 to each script that runs `install.sh` + `nemoclaw onboard` (mirroring `test-full-e2e.sh` Phases 0–3). This makes them independent but adds ~10 min of install time per job.
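Because Option B removes the shared-sandbox dependency, the two scripts could run as parallel legs of a matrix job. The fragment below is a rough sketch only: it assumes each script has gained its own Phase 0 install + onboard, and that this Phase 0 reads the same `NEMOCLAW_*` variables the recommended job passes to its setup step. Neither is true of the scripts today.

```yaml
  security-e2e:
    if: github.repository == 'NVIDIA/NemoClaw'
    runs-on: ubuntu-latest
    timeout-minutes: 45
    strategy:
      # Keep the other leg running if one script fails.
      fail-fast: false
      matrix:
        script:
          - test-credential-sanitization.sh
          - test-telegram-injection.sh
    steps:
      - name: Checkout
        uses: actions/checkout@v6

      # Assumption: the script performs its own install + onboard (new Phase 0).
      # Each matrix leg gets its own runner, so reusing the sandbox name is safe.
      - name: Run ${{ matrix.script }}
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_NON_INTERACTIVE: "1"
          NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
        run: bash "test/e2e/${{ matrix.script }}"
```

The trade-off against Option C is visible here: wall-clock time roughly matches one leg, but the ~10 min install is paid once per script.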

**Option C: Composite job**

Create a single `security-e2e` nightly job that:
1. Installs NemoClaw + onboards a sandbox
2. Runs `test-credential-sanitization.sh`
3. Runs `test-telegram-injection.sh`
4. Cleans up

This most closely mirrors how the `all` suite in `e2e-brev.yaml` works today.

### Recommendation

**Option C** is the most pragmatic: one install, both scripts, one cleanup. The Brev workflow already shows that this pattern works. The nightly job would look like this:

```yaml
  security-e2e:
    if: github.repository == 'NVIDIA/NemoClaw'
    runs-on: ubuntu-latest
    timeout-minutes: 45
    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Install and onboard
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_NON_INTERACTIVE: "1"
          NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
          NEMOCLAW_RECREATE_SANDBOX: "1"
        run: bash test/e2e/test-full-e2e.sh
        # OR: a minimal install+onboard script that skips the
        # inference/CLI phases and just leaves a running sandbox

      - name: Run credential sanitization tests
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
        run: bash test/e2e/test-credential-sanitization.sh

      - name: Run telegram injection tests
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
        run: bash test/e2e/test-telegram-injection.sh

      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: security-e2e-logs
          path: /tmp/nemoclaw-e2e-*.log
          if-no-files-found: ignore
```

**Note:** The setup step needs careful handling, because `test-full-e2e.sh` destroys the sandbox during cleanup. Either set `RUN_E2E_SKIP_FINAL_CLEANUP=1` (if the script supports it), or write a minimal install+onboard helper that stops after Phase 3 and skips cleanup.
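A sketch of the first route, assuming (unverified) that `test-full-e2e.sh` honors `RUN_E2E_SKIP_FINAL_CLEANUP=1`. It also gives the setup step a hypothetical `id: setup` and gates both test steps on that step's outcome with `!cancelled()`. Without an explicit status function, GitHub Actions applies an implicit `success()`, so a failure in the credential suite would skip the telegram suite. With this gate, a single nightly run reports both results.

```yaml
      - name: Install and onboard
        id: setup
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_NON_INTERACTIVE: "1"
          NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
          NEMOCLAW_RECREATE_SANDBOX: "1"
          # Assumption: the script leaves the sandbox running when this is set.
          RUN_E2E_SKIP_FINAL_CLEANUP: "1"
        run: bash test/e2e/test-full-e2e.sh

      - name: Run credential sanitization tests
        # Run whenever setup succeeded and the run was not cancelled.
        if: ${{ !cancelled() && steps.setup.outcome == 'success' }}
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
        run: bash test/e2e/test-credential-sanitization.sh

      - name: Run telegram injection tests
        # Same gate, so a credential-suite failure does not skip this step.
        if: ${{ !cancelled() && steps.setup.outcome == 'success' }}
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_SANDBOX_NAME: "e2e-security"
        run: bash test/e2e/test-telegram-injection.sh
```

The existing `Upload logs on failure` step can stay as is, since `failure()` is still true if any earlier step failed.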

## Context

These are **security regression tests** covering credential leakage and command injection. They run only when someone manually dispatches `e2e-brev.yaml`, so security regressions can ship to users without being caught. Over the last 3 weeks, the nightly E2E catch rate for externally reported bugs was ~17%. Wiring the security tests into the nightly is part of closing that gap.

The `e2e-brev.yaml` workflow validates these scripts work end-to-end on real Brev instances. The nightly would run them on `ubuntu-latest` with the same `NVIDIA_API_KEY` secret available to all other nightly jobs.

## Acceptance criteria

- [ ] Both scripts run as part of the nightly (either as a composite `security-e2e` job or individually)
- [ ] Job is in the `notify-on-failure` `needs:` list
- [ ] Sandbox setup does not conflict with other nightly jobs (use a distinct sandbox name like `e2e-security`)
- [ ] First nightly run shows the job(s) executing

ci(nightly-e2e): wire credential-sanitization and telegram-injection E2E into nightly #2567