99 changes: 99 additions & 0 deletions skills/ui-test/ci/README.md
@@ -0,0 +1,99 @@
# ui-test CI Integration

Run adversarial UI testing automatically on every PR that touches frontend files.

## Architecture

```
PR opened → preview deploys (Vercel/Netlify) → GitHub Action triggers
→ Claude Code (headless, --print mode) reads diff, plans tests
→ browse CLI → Browserbase cloud browser tests the preview URL
→ results posted as PR comment + HTML report uploaded as artifact
```

## Setup

### 1. Copy the workflow

```bash
cp skills/ui-test/ci/ui-test.yml .github/workflows/ui-test.yml
```

### 2. Add secrets

In your repo settings → Secrets and variables → Actions:

| Secret | Required | Description |
|--------|----------|-------------|
| `ANTHROPIC_API_KEY` | Yes | Claude API key |
| `BROWSERBASE_API_KEY` | Yes | Browserbase API key for cloud browsers |

### 3. Configure preview deploy detection

The workflow defaults to **Vercel** preview detection. Edit the `wait-for-preview` job in `ui-test.yml` if you use Netlify, Cloudflare Pages, or a custom preview system. See the commented alternatives in the file.

### 4. (Optional) Configure variables

In repo settings → Secrets and variables → Actions → Variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `UI_TEST_MODE` | `light` | `light` = 2 agents, 20 steps each. `full` = 4 agents, 40 steps each |
| `UI_TEST_MAX_TOKENS` | `100000` | Max token budget per run |
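
These variables surface in the workflow through the `vars` context. A hedged sketch of how they might be forwarded to the test step (the step name and the `$PREVIEW_URL` variable are illustrative, not taken from `ui-test.yml`):

```yaml
# Sketch: forward repo variables to the test step, falling back to the
# documented defaults when a variable is unset.
- name: Run UI tests
  env:
    UI_TEST_MODE: ${{ vars.UI_TEST_MODE || 'light' }}
    UI_TEST_MAX_TOKENS: ${{ vars.UI_TEST_MAX_TOKENS || '100000' }}
  run: |
    skills/ui-test/ci/run-ui-test.sh --url "$PREVIEW_URL" --mode "$UI_TEST_MODE"
```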

## How it works

1. **Gate** — `paths-filter` checks if the PR touches UI files (`.tsx`, `.css`, etc.). Skips entirely if no UI changes.
2. **Wait** — Waits for the preview deployment to be ready (up to 5 minutes).
3. **Test** — `run-ui-test.sh` invokes Claude Code in `--print` mode with:
- The git diff of changed UI files
- The preview URL
- Mode-specific instructions (light vs full)
4. **Report** — Posts a summary comment on the PR and uploads the HTML report as a GitHub Actions artifact.
5. **Gate** — Exits non-zero if any test failed, so you can make it a required check.
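
The final gate (step 5) reduces to reading the exit-code file the test run writes. A sketch of the check a CI step could run, using the same `.context/ui-test-exit-code` path as the script (the simulated write is only for the demo):

```shell
# Simulate a failed run, then apply the gate check.
mkdir -p .context
echo "1" > .context/ui-test-exit-code

# Missing file counts as failure, matching the script's defaulting behavior.
CODE=$(cat .context/ui-test-exit-code 2>/dev/null || echo 1)
if [[ "$CODE" != "0" ]]; then
  echo "UI tests failed (exit code $CODE)"
fi
```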

## Local testing

Test the CI flow locally before deploying to GitHub Actions:

```bash
skills/ui-test/ci/run-ui-test.sh \
--url http://localhost:3000 \
--local \
--mode light
```

The `--local` flag skips the diff gate (no PR needed) and uses `browse env local` instead of `browse env remote`. Results go to `.context/ui-test-summary.md`.

## Cost estimate

| Mode | Agents | Steps/agent | Estimated cost |
|------|--------|-------------|----------------|
| `light` | 2 | 20 | ~$0.50–$2 per run |
| `full` | 4 | 40 | ~$2–$5 per run |

These are rough estimates. Actual cost depends on diff size and number of pages tested.

## Customization

### Only run on labeled PRs

Add a condition to the workflow:

```yaml
on:
pull_request:
types: [labeled]

jobs:
check-ui-changes:
if: contains(github.event.pull_request.labels.*.name, 'ui-test')
```

### Adjust file filters

Edit the `paths-filter` step in `ui-test.yml` to match your project structure.

### Fail threshold

By default, any STEP_FAIL causes a non-zero exit. To allow a pass rate threshold instead, modify the exit code logic in `run-ui-test.sh`.
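
A sketch of such a threshold check (the `PASSED`/`TOTAL` counts would have to be parsed out of the summary; the 90% threshold is an arbitrary example, not a value from `run-ui-test.sh`):

```shell
PASSED=18     # would be parsed from .context/ui-test-summary.md
TOTAL=20
THRESHOLD=90  # minimum pass rate in percent

# Integer math is fine here: we only need whole-percent precision.
RATE=$(( PASSED * 100 / TOTAL ))
mkdir -p .context
if (( RATE >= THRESHOLD )); then
  echo "0" > .context/ui-test-exit-code
else
  echo "1" > .context/ui-test-exit-code
fi
echo "Pass rate: ${RATE}% (threshold ${THRESHOLD}%)"
```
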
154 changes: 154 additions & 0 deletions skills/ui-test/ci/run-ui-test.sh
@@ -0,0 +1,154 @@
#!/usr/bin/env bash
set -euo pipefail
> **Cursor Bugbot review (commit 734b241), Medium severity: pipefail skips cleanup when the Claude command fails**
>
> With `set -euo pipefail`, if the `claude --print` command exits non-zero (e.g., bad API key, network failure, timeout), the entire script aborts at the pipeline on line 126. This skips the entire post-run section: browse session cleanup (`browse stop`, `pkill`), default exit-code file creation, and default summary file creation. Without the exit-code file, the workflow's "Check pass rate" step silently exits 0, masking the failure.

# ── Parse arguments ──────────────────────────────────────────────────────
PREVIEW_URL=""
MODE="light"
PR_NUMBER=""
REPO=""
LOCAL=false

while [[ $# -gt 0 ]]; do
case $1 in
--url) PREVIEW_URL="$2"; shift 2 ;;
--mode) MODE="$2"; shift 2 ;;
--pr) PR_NUMBER="$2"; shift 2 ;;
--repo) REPO="$2"; shift 2 ;;
--local) LOCAL=true; shift ;;
*) echo "Unknown arg: $1"; exit 1 ;;
esac
done

if [[ -z "$PREVIEW_URL" ]]; then
echo "Error: --url is required"
exit 1
fi

# ── Verify preview is reachable ──────────────────────────────────────────
echo "Checking preview URL: $PREVIEW_URL"
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$PREVIEW_URL" 2>/dev/null || echo "000")
if [[ "$HTTP_STATUS" == "000" ]]; then
echo "Error: Preview URL is not reachable"
exit 1
fi
echo "Preview is up (HTTP $HTTP_STATUS)"

# ── Build the prompt ─────────────────────────────────────────────────────
if [[ "$LOCAL" == true ]]; then
# Local mode: skip diff gate, test the full app
UI_FILES="(local mode — no diff filter, testing full app)"
BROWSE_ENV="browse env local"
DIFF_CONTEXT="No diff available (local mode). Explore the app and test what you find."
else
DIFF_FILES=$(git diff --name-only origin/main...HEAD 2>/dev/null || git diff --name-only HEAD~1)

# Filter to UI-relevant files only
UI_FILES=$(echo "$DIFF_FILES" | grep -E '\.(tsx|jsx|vue|svelte|css|scss)$' || true)

if [[ -z "$UI_FILES" ]]; then
echo "No UI files changed. Skipping tests."
mkdir -p .context
echo "No UI files changed in this PR." > .context/ui-test-summary.md
echo "0" > .context/ui-test-exit-code
exit 0
fi
BROWSE_ENV="browse env remote"
DIFF_CONTEXT="Full diff of changed files (for context on what specifically changed):
$(git diff origin/main...HEAD -- $UI_FILES 2>/dev/null | head -500 || echo "Could not generate diff")"
fi

echo "UI files changed:"
echo "$UI_FILES"

# Build mode-specific instructions
if [[ "$MODE" == "light" ]]; then
MODE_INSTRUCTIONS="Run in CI-light mode:
- Use at most 2 sub-agents
- Budget each sub-agent at 20 browse steps max
- Focus on: functional correctness of changed components, basic accessibility (axe-core), and console errors
- Skip: exploratory testing, visual/design consistency, UX heuristics
- Skip: HTML report generation (the summary is enough for CI)"
else
MODE_INSTRUCTIONS="Run in full mode:
- Use up to 4 sub-agents
- Budget each sub-agent at 40 browse steps max
- Cover: functional, adversarial, accessibility, responsive, console health
- Generate the HTML report"
fi

PR_CONTEXT=""
if [[ -n "$PR_NUMBER" && -n "$REPO" ]]; then
PR_CONTEXT="This is PR #${PR_NUMBER} on ${REPO}."
fi

PROMPT=$(cat <<PROMPT_EOF
You are running as a CI check. Test the UI at the target URL.

${PR_CONTEXT}

Target URL: ${PREVIEW_URL}

Changed UI files:
${UI_FILES}

${DIFF_CONTEXT}

## Instructions

1. Use \`${BROWSE_ENV}\` to set up the browser environment.
2. Analyze the context above to understand what to test.
3. Run the planning rounds (functional, adversarial, coverage gaps) then execute.
4. ${MODE_INSTRUCTIONS}
5. After all tests complete, write a markdown summary to .context/ui-test-summary.md with:
- Total tests, passed, failed, skipped counts
- A table of all test results (step-id, status, one-line evidence)
- For failures: reproduction steps and suggested fix
6. Write the exit code to .context/ui-test-exit-code:
- "0" if all tests passed
- "1" if any test failed

Important CI constraints:
- Do NOT open any files in an editor or attempt interactive operations
- Do NOT ask for user input — make all decisions autonomously
- Keep total execution under 10 minutes
- Always run \`browse stop\` when done (and stop all named sessions)
PROMPT_EOF
)

# ── Setup ────────────────────────────────────────────────────────────────
mkdir -p .context/ui-test-screenshots

# ── Run Claude Code ──────────────────────────────────────────────────────
echo "Starting UI test run..."
echo "$PROMPT" | claude --print \
--dangerously-skip-permissions \
--allowed-tools "Bash(browse:*)" "Bash(BROWSE_SESSION=*)" "Bash(mkdir:*)" "Bash(curl:*)" "Bash(git:*)" "Read" "Glob" "Grep" "Agent" "Write" \
2>&1 | tee .context/ui-test-output.log

# ── Post-run ─────────────────────────────────────────────────────────────
# Ensure browse sessions are cleaned up
browse stop 2>/dev/null || true
pkill -f "browse.*daemon" 2>/dev/null || true

# Default exit code if Claude didn't write one
if [[ ! -f .context/ui-test-exit-code ]]; then
echo "1" > .context/ui-test-exit-code
echo "Warning: Claude did not write an exit code. Defaulting to failure."
fi

# Default summary if Claude didn't write one
if [[ ! -f .context/ui-test-summary.md ]]; then
cat > .context/ui-test-summary.md <<'EOF'
UI test run completed but did not produce a structured summary.

Check the full output log for details.
EOF
fi

echo ""
echo "======================================="
echo "UI Test Complete"
echo "======================================="
cat .context/ui-test-summary.md
echo ""
echo "Exit code: $(cat .context/ui-test-exit-code)"
> **Cursor Bugbot review (commit 734b241), High severity: script never exits with the test result code**
>
> The script writes the test exit code to `.context/ui-test-exit-code` and prints it, but never actually calls `exit` with that value. The last command is an `echo`, so the script always exits 0 when Claude runs to completion, even if tests failed. For local usage (documented in the README), this silently hides failures. In CI, the separate "Check pass rate" step partially compensates, but the "Run UI tests" step itself will misleadingly show as green.