# Add CI integration for ui-test skill #70
# ui-test CI Integration

Run adversarial UI testing automatically on every PR that touches frontend files.

## Architecture

```
PR opened → preview deploys (Vercel/Netlify) → GitHub Action triggers
  → Claude Code (headless, --print mode) reads diff, plans tests
  → browse CLI → Browserbase cloud browser tests the preview URL
  → results posted as PR comment + HTML report uploaded as artifact
```
## Setup

### 1. Copy the workflow

```bash
cp skills/ui-test/ci/ui-test.yml .github/workflows/ui-test.yml
```
### 2. Add secrets

In your repo settings → Secrets and variables → Actions:

| Secret | Required | Description |
|--------|----------|-------------|
| `ANTHROPIC_API_KEY` | Yes | Claude API key |
| `BROWSERBASE_API_KEY` | Yes | Browserbase API key for cloud browsers |
### 3. Configure preview deploy detection

The workflow defaults to **Vercel** preview detection. Edit the `wait-for-preview` job in `ui-test.yml` if you use Netlify, Cloudflare Pages, or a custom preview system. See the commented alternatives in the file.
### 4. (Optional) Configure variables

In repo settings → Secrets and variables → Actions → Variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `UI_TEST_MODE` | `light` | `light` = 2 agents, 20 steps each. `full` = 4 agents, 40 steps each |
| `UI_TEST_MAX_TOKENS` | `100000` | Max token budget per run |
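These variables reach the run as ordinary environment variables, so a step can read them with the table's defaults as fallbacks. A minimal sketch (the variable names match the table; everything else is illustrative):

```shell
# Fall back to the documented defaults when the repo variables are unset.
MODE="${UI_TEST_MODE:-light}"
MAX_TOKENS="${UI_TEST_MAX_TOKENS:-100000}"
echo "mode=$MODE max_tokens=$MAX_TOKENS"
```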
## How it works

1. **Gate** — `paths-filter` checks if the PR touches UI files (`.tsx`, `.css`, etc.). Skips entirely if no UI changes.
2. **Wait** — Waits for the preview deployment to be ready (up to 5 minutes).
3. **Test** — `run-ui-test.sh` invokes Claude Code in `--print` mode with:
   - The git diff of changed UI files
   - The preview URL
   - Mode-specific instructions (light vs full)
4. **Report** — Posts a summary comment on the PR and uploads the HTML report as a GitHub Actions artifact.
5. **Gate** — Exits non-zero if any test failed, so you can make it a required check.
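The gate in step 1 boils down to an extension filter over the changed-file list, the same filter `run-ui-test.sh` applies with `grep`. A self-contained sketch (the file names are invented):

```shell
# Simulated changed-file list; only UI extensions should survive the filter.
CHANGED_FILES="src/components/Button.tsx
src/api/client.ts
styles/theme.css
README.md"
UI_FILES=$(echo "$CHANGED_FILES" | grep -E '\.(tsx|jsx|vue|svelte|css|scss)$' || true)
echo "$UI_FILES"
```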
## Local testing

Test the CI flow locally before deploying to GitHub Actions:

```bash
skills/ui-test/ci/run-ui-test.sh \
  --url http://localhost:3000 \
  --local \
  --mode light
```

The `--local` flag skips the diff gate (no PR needed) and uses `browse env local` instead of remote. Results go to `.context/ui-test-summary.md`.
## Cost estimate

| Mode | Agents | Steps/agent | Estimated cost |
|------|--------|-------------|----------------|
| `light` | 2 | 20 | ~$0.50–$2 per run |
| `full` | 4 | 40 | ~$2–$5 per run |

These are rough estimates. Actual cost depends on diff size and number of pages tested.
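The per-mode browse budgets multiply out directly from the table, which is where most of the cost difference comes from:

```shell
# Worst-case browse steps per run, using the agent and step counts above.
LIGHT_STEPS=$((2 * 20))
FULL_STEPS=$((4 * 40))
echo "light=$LIGHT_STEPS full=$FULL_STEPS"
```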
## Customization

### Only run on labeled PRs

Add a condition to the workflow:

```yaml
on:
  pull_request:
    types: [labeled]

jobs:
  check-ui-changes:
    if: contains(github.event.pull_request.labels.*.name, 'ui-test')
```
### Adjust file filters

Edit the `paths-filter` step in `ui-test.yml` to match your project structure.
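If the workflow uses the common `dorny/paths-filter` action for this step (an assumption — check what `ui-test.yml` actually ships with), the filter block might look like this illustrative fragment:

```yaml
# Illustrative filter config; adjust the globs to your project layout.
- uses: dorny/paths-filter@v2
  id: filter
  with:
    filters: |
      ui:
        - 'src/**/*.tsx'
        - 'src/**/*.css'
        - 'components/**'
```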
### Fail threshold

By default, any STEP_FAIL causes a non-zero exit. To allow a pass rate threshold instead, modify the exit code logic in `run-ui-test.sh`.
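A pass-rate variant of that exit logic could look like the following sketch. The counts and the 80% threshold are illustrative; in practice the counts would come from parsing the test summary:

```shell
# Hypothetical threshold gate: pass if at least 80% of tests succeeded.
TOTAL=20
PASSED=17
THRESHOLD=80
RATE=$(( PASSED * 100 / TOTAL ))
if [ "$RATE" -ge "$THRESHOLD" ]; then EXIT_CODE=0; else EXIT_CODE=1; fi
echo "pass rate ${RATE}% -> exit $EXIT_CODE"
```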
---

`skills/ui-test/ci/run-ui-test.sh`:
#!/usr/bin/env bash
set -euo pipefail

# ── Parse arguments ──────────────────────────────────────────────────────
PREVIEW_URL=""
MODE="light"
PR_NUMBER=""
REPO=""
LOCAL=false

while [[ $# -gt 0 ]]; do
  case $1 in
    --url) PREVIEW_URL="$2"; shift 2 ;;
    --mode) MODE="$2"; shift 2 ;;
    --pr) PR_NUMBER="$2"; shift 2 ;;
    --repo) REPO="$2"; shift 2 ;;
    --local) LOCAL=true; shift ;;
    *) echo "Unknown arg: $1"; exit 1 ;;
  esac
done

if [[ -z "$PREVIEW_URL" ]]; then
  echo "Error: --url is required"
  exit 1
fi

# ── Verify preview is reachable ──────────────────────────────────────────
echo "Checking preview URL: $PREVIEW_URL"
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$PREVIEW_URL" 2>/dev/null || echo "000")
if [[ "$HTTP_STATUS" == "000" ]]; then
  echo "Error: Preview URL is not reachable"
  exit 1
fi
echo "Preview is up (HTTP $HTTP_STATUS)"
# ── Build the prompt ─────────────────────────────────────────────────────
if [[ "$LOCAL" == true ]]; then
  # Local mode: skip diff gate, test the full app
  UI_FILES="(local mode — no diff filter, testing full app)"
  BROWSE_ENV="browse env local"
  DIFF_CONTEXT="No diff available (local mode). Explore the app and test what you find."
else
  DIFF_FILES=$(git diff --name-only origin/main...HEAD 2>/dev/null || git diff --name-only HEAD~1)

  # Filter to UI-relevant files only
  UI_FILES=$(echo "$DIFF_FILES" | grep -E '\.(tsx|jsx|vue|svelte|css|scss)$' || true)

  if [[ -z "$UI_FILES" ]]; then
    echo "No UI files changed. Skipping tests."
    mkdir -p .context
    echo "No UI files changed in this PR." > .context/ui-test-summary.md
    echo "0" > .context/ui-test-exit-code
    exit 0
  fi
  BROWSE_ENV="browse env remote"
  DIFF_CONTEXT="Full diff of changed files (for context on what specifically changed):
$(git diff origin/main...HEAD -- $UI_FILES 2>/dev/null | head -500 || echo "Could not generate diff")"
fi

echo "UI files changed:"
echo "$UI_FILES"
# Build mode-specific instructions
if [[ "$MODE" == "light" ]]; then
  MODE_INSTRUCTIONS="Run in CI-light mode:
- Use at most 2 sub-agents
- Budget each sub-agent at 20 browse steps max
- Focus on: functional correctness of changed components, basic accessibility (axe-core), and console errors
- Skip: exploratory testing, visual/design consistency, UX heuristics
- Skip: HTML report generation (the summary is enough for CI)"
else
  MODE_INSTRUCTIONS="Run in full mode:
- Use up to 4 sub-agents
- Budget each sub-agent at 40 browse steps max
- Cover: functional, adversarial, accessibility, responsive, console health
- Generate the HTML report"
fi

PR_CONTEXT=""
if [[ -n "$PR_NUMBER" && -n "$REPO" ]]; then
  PR_CONTEXT="This is PR #${PR_NUMBER} on ${REPO}."
fi
PROMPT=$(cat <<PROMPT_EOF
You are running as a CI check. Test the UI at the target URL.

${PR_CONTEXT}

Target URL: ${PREVIEW_URL}

Changed UI files:
${UI_FILES}

${DIFF_CONTEXT}

## Instructions

1. Use \`${BROWSE_ENV}\` to set up the browser environment.
2. Analyze the context above to understand what to test.
3. Run the planning rounds (functional, adversarial, coverage gaps) then execute.
4. ${MODE_INSTRUCTIONS}
5. After all tests complete, write a markdown summary to .context/ui-test-summary.md with:
   - Total tests, passed, failed, skipped counts
   - A table of all test results (step-id, status, one-line evidence)
   - For failures: reproduction steps and suggested fix
6. Write the exit code to .context/ui-test-exit-code:
   - "0" if all tests passed
   - "1" if any test failed

Important CI constraints:
- Do NOT open any files in an editor or attempt interactive operations
- Do NOT ask for user input — make all decisions autonomously
- Keep total execution under 10 minutes
- Always run \`browse stop\` when done (and stop all named sessions)
PROMPT_EOF
)
# ── Setup ────────────────────────────────────────────────────────────────
mkdir -p .context/ui-test-screenshots

# ── Run Claude Code ──────────────────────────────────────────────────────
echo "Starting UI test run..."
# `|| true` keeps `set -euo pipefail` from aborting the script if claude
# exits non-zero (bad API key, network failure, timeout), so the post-run
# cleanup and default-file fallbacks below still run.
echo "$PROMPT" | claude --print \
  --dangerously-skip-permissions \
  --allowed-tools "Bash(browse:*)" "Bash(BROWSE_SESSION=*)" "Bash(mkdir:*)" "Bash(curl:*)" "Bash(git:*)" "Read" "Glob" "Grep" "Agent" "Write" \
  2>&1 | tee .context/ui-test-output.log || true
# ── Post-run ─────────────────────────────────────────────────────────────
# Ensure browse sessions are cleaned up
browse stop 2>/dev/null || true
pkill -f "browse.*daemon" 2>/dev/null || true

# Default exit code if Claude didn't write one
if [[ ! -f .context/ui-test-exit-code ]]; then
  echo "1" > .context/ui-test-exit-code
  echo "Warning: Claude did not write an exit code. Defaulting to failure."
fi

# Default summary if Claude didn't write one
if [[ ! -f .context/ui-test-summary.md ]]; then
  cat > .context/ui-test-summary.md <<'EOF'
UI test run completed but did not produce a structured summary.

Check the full output log for details.
EOF
fi

echo ""
echo "======================================="
echo "UI Test Complete"
echo "======================================="
cat .context/ui-test-summary.md
echo ""
echo "Exit code: $(cat .context/ui-test-exit-code)"

# Propagate the recorded result so CI can use this script as a status check.
exit "$(cat .context/ui-test-exit-code)"
---

**Review comments (Cursor Bugbot, commit 734b241):**

- **High severity: script never exits with the test result code.** The script writes the test exit code to `.context/ui-test-exit-code` but never exits with that value itself.
- **Medium severity: `pipefail` skips cleanup when the Claude command fails.** With `set -euo pipefail`, if the `claude --print` command exits non-zero (e.g., bad API key, network failure, timeout), the entire script aborts at the pipeline on line 126. This skips the entire post-run section: browse session cleanup (`browse stop`, `pkill`), default exit-code file creation, and default summary file creation. Without the exit-code file, the workflow's "Check pass rate" step silently exits 0, masking the failure. Additional locations: `skills/ui-test/ci/run-ui-test.sh#L122-L126`, `skills/ui-test/ci/run-ui-test.sh#L127-L146`.