Skip to content

fix(test): surface error lines for generic runners#2427

Open
axelray-dev wants to merge 3 commits into
rtk-ai:developfrom
axelray-dev:fix/2420-test-output-truncation
Open

fix(test): surface error lines for generic runners#2427
axelray-dev wants to merge 3 commits into
rtk-ai:developfrom
axelray-dev:fix/2420-test-output-truncation

Conversation

@axelray-dev

@axelray-dev axelray-dev commented Jun 13, 2026

Copy link
Copy Markdown

What

Fixes #2420

For unrecognized test runners (anything besides cargo/pytest/jest/go), extract_test_summary() previously showed only the last 5 lines of output. If failures appeared earlier in the output, they were hidden.

Now it scans all lines against the existing ERROR_PATTERNS regexes (errors, failures, panics, tracebacks, etc.) and surfaces matching lines under a [FAIL] ERRORS: header. When no error lines are found, the original "last 5 lines" fallback is preserved.

Follow-up: error-pattern scanning is now gated on non-zero exit code. ERROR_PATTERNS are substring-based and can match benign summary lines (e.g. "Summary: 0 failed, 10 passed") in passing runners. Passing runners (exit 0) always get the tail fallback regardless of keyword matches.

Changes

  • src/cmds/rust/runner.rs: In extract_test_summary(), the generic runner fallback now applies ERROR_PATTERNS matching before falling back to tail output, but only when the exit code is non-zero. Switched run_test from run_filtered to run_filtered_with_exit to receive the exit code. Added 3 tests covering error surfacing, no-error fallback, and the false-positive case.

Tests

cargo fmt --all && cargo clippy --all-targets && cargo test --all

All 2153 tests pass. The three new tests verify:

  1. Generic runner output with error lines at the top surfaces those errors instead of the last 5 lines
  2. Generic runner output with no errors preserves the existing last-5-lines behavior
  3. Passing runner (exit 0) with "failed" keyword in output shows last 5 lines, not [FAIL] ERRORS:

Notes

  • Only the generic runner fallback path is changed. Cargo, pytest, jest, and go runners are unaffected.
  • Uses the existing ERROR_PATTERNS lazy_static, no new patterns added.
  • DCO signed.

Signed-off-by: axelray-dev <110029405+axelray-dev@users.noreply.github.com>
@axelray-dev axelray-dev marked this pull request as ready for review June 13, 2026 14:24

@hgunduzoglu hgunduzoglu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked this out and tested locally (macOS arm64). The core behavior is right and the truncation hint (... +N more) is a nice touch:

generic runner output before this PR
error at top, then 6 passing lines last 5 lines (error hidden) [FAIL] ERRORS: surfaces the error ✓
no error lines last 5 lines last 5 lines ✓ (fallback preserved)

cargo fmt/clippy/test clean, 16 runner tests pass. Scoping it to is_generic (leaving cargo/pytest/jest/go untouched) is the right call.

One real edge case worth handling: false positives on passing runs. ERROR_PATTERNS is substring-based and broad — r"(?i)^.*failed.*$", .*failure.*, .*warning.*, .*exception.* match those words anywhere on a line, including benign summary lines. So a runner that passes (exit 0) but prints a summary surfaces a misleading FAIL header:

$ rtk test ./runner.sh     # script prints "Summary: 0 failed, 10 passed", exit 0
[FAIL] ERRORS:
  Summary: 0 failed, 10 passed

An agent reading [FAIL] ERRORS: … 0 failed, 10 passed would conclude the suite failed when it passed — the opposite of the accurate-signal goal. Before this PR that same run showed a neutral OUTPUT (last 5 lines):, so it's a behavior change specifically in the passing case.

Since #2420 is really about surfacing failures when the runner fails, gating the error-surfacing on a non-zero exit code would fix both the original issue and this false positive. extract_test_summary doesn't currently receive the exit code, but runner.rs already has a RunMode::FilteredWithExit variant, so it can be threaded through without much churn — then: non-zero exit → [FAIL] ERRORS: (with the pattern scan), exit 0 → the existing tail fallback.

Core fix is solid; this is the one thing I'd tighten before merge. Could also add a test for the exit-0-with-"failed"-in-summary case to lock it down.

ERROR_PATTERNS are substring-based and can match benign summary lines
(e.g. 'Summary: 0 failed, 10 passed') in passing runners. Only scan for
error patterns when the exit code is non-zero to avoid false-positive
[FAIL] ERRORS: headers on successful runs.
@axelray-dev

Copy link
Copy Markdown
Author

Good catch, thanks. Pushed a fix that gates the error-pattern scan on non-zero exit code. Now uses run_filtered_with_exit so the exit code is threaded through to extract_test_summary -- passing runners (exit 0) always get the tail fallback regardless of keyword matches in the output.

Added a test for the exact false-positive case you described (exit 0 with "0 failed, 10 passed" in output). All 2153 tests pass.

@luccinmasirika

luccinmasirika commented Jun 18, 2026

Copy link
Copy Markdown

Thanks @axelray-dev for picking this one up, and good call gating the scan on the exit code.

I pulled the branch and ran the exact repro from the issue (script prints FAIL: test_login broke, then 50 PASS: lines, exits 1). It still comes back as:

OUTPUT (last 5 lines):
  PASS: case_46
  ...
  PASS: case_50

So the FAIL: line is still hidden. The scan itself works, but ERROR_PATTERNS only has failed and failure, no bare fail/FAIL, so a line like FAIL: test_login broke never matches and falls through to the tail. Easy to confirm: changing the repro to print FAILED: surfaces it under [FAIL] ERRORS:, but FAIL: doesn't.

Adding one pattern covers it:

Regex::new(r"(?i)^.*\bfail\b.*$").unwrap(),

With that, the repro surfaces FAIL: test_login broke on exit 1, and since the scan is still gated on a non-zero exit, the passing 0 failed, 10 passed case stays on the tail fallback. Runner tests stay green. Might be worth a test using the literal repro line so this stays locked down.

Comment thread src/cmds/rust/runner.rs
Regex::new(r"(?i)^.*\berr\b.*$").unwrap(),
Regex::new(r"(?i)^.*warning[\s:\[].*$").unwrap(),
Regex::new(r"(?i)^.*\bwarn\b.*$").unwrap(),
Regex::new(r"(?i)^.*failed.*$").unwrap(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concretely, this is the one-line addition that covers it. Putting it just above
the failed pattern keeps that one intact:

Suggested change
Regex::new(r"(?i)^.*\bfail\b.*$").unwrap(),
Regex::new(r"(?i)^.*failed.*$").unwrap(),

Add Regex pattern for 'fail' as a whole word before the existing
'failed' pattern to catch more error cases while keeping the
existing pattern intact.
@axelray-dev

Copy link
Copy Markdown
Author

Pushed the word-boundary pattern as suggested. Added a regression test using the exact repro line ("FAIL: something went wrong" with exit 1) to lock it down. All runner tests pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rtk test shows only the last 5 output lines (not failures) for unrecognized runners

3 participants