feat(coverage): Add code coverage collection and export #216

Merged
sohil-kshirsagar merged 38 commits into main from feat/code-coverage-tracking-poc on Apr 8, 2026

@sohil-kshirsagar sohil-kshirsagar commented Apr 3, 2026

Add code coverage collection to the Tusk Drift CLI. Collects per-test and aggregate code coverage during test replay, with LCOV/JSON export, config-driven activation for CI, and backend upload.

Coverage activation

Two modes:

  • Config-driven (CI): coverage.enabled: true in .tusk/config.yaml → automatically collects during validation runs on the default branch. Silent (no console output). No CI pipeline changes needed.
  • Flag-driven (local): --show-coverage displays coverage, --coverage-output writes LCOV/JSON.

Architecture

  1. CLI sets TUSK_COVERAGE=true + NODE_V8_COVERAGE=<dir> env vars
  2. After the service is ready, CLI takes a baseline snapshot (all coverable lines, for the denominator)
  3. After each test, CLI takes per-test snapshot (delta since last reset)
  4. Per-test coverage uploaded to backend via TraceTestCoverageData on TraceTestResult
  5. Baseline + startup coverage uploaded via CoverageBaseline on UpdateDriftRunCIStatusRequest
  6. Backend computes aggregate from IN_SUITE tests post-promotion
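One way to picture the baseline/per-test split is the minimal Go sketch below. It is illustrative only and assumes coverage is held as file → line → hit count; `LineCounts` and `AggregateLineCoverage` are hypothetical names, not the PR's types. The baseline contributes every coverable line (uncovered ones at count 0) and each per-test delta marks the lines it hit, so the aggregate is a union over snapshots.

```go
package main

// LineCounts maps file path -> line number -> hit count.
// Hypothetical representation, for illustration only.
type LineCounts map[string]map[int]int

// AggregateLineCoverage unions a baseline snapshot (every coverable line,
// uncovered ones at count 0) with per-test delta snapshots and returns the
// number of covered lines and the number of coverable lines.
func AggregateLineCoverage(baseline LineCounts, perTest []LineCounts) (covered, total int) {
	merged := make(map[string]map[int]bool) // file -> line -> covered?

	add := func(snap LineCounts) {
		for file, lines := range snap {
			m, ok := merged[file]
			if !ok {
				m = make(map[int]bool)
				merged[file] = m
			}
			for line, count := range lines {
				// Once any snapshot hits a line it stays covered; lines seen
				// only with count 0 still count toward the denominator.
				m[line] = m[line] || count > 0
			}
		}
	}

	add(baseline) // startup-covered lines keep their original counts
	for _, snap := range perTest {
		add(snap)
	}

	for _, lines := range merged {
		for _, isCovered := range lines {
			total++
			if isCovered {
				covered++
			}
		}
	}
	return covered, total
}
```

Because each per-test snapshot is already a delta (V8 resets its counters on every snapshot), a plain "covered if any snapshot hit it" union is enough here; no diffing is involved.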

Config options

```yaml
coverage:
  enabled: true                    # auto-collect during validation runs
  include: ["backend/src/**"]      # restrict to service code (monorepos)
  exclude: ["**/migrations/**"]    # exclude noise
  strip_path_prefix: "/app"        # Docker: strip container mount point
```

Key features

  • Baseline + per-test snapshot architecture with V8 counter auto-reset
  • Branch coverage with union semantics (sum covered, clamp to total)
  • LCOV and JSON export formats
  • Include/exclude glob filtering with doublestar
  • In-suite-only filtering for validation runs (draft tests excluded from output)
  • TUI integration: baseline capture, aggregate in service logs, sorted per-test display
  • Docker Compose support via strip_path_prefix
  • Backend upload: per-test coverage piggybacked on test result upload, baseline on CI status update
  • Startup coverage included in aggregate (matches tools such as c8 and nyc, which also count code executed at module load)
  • CLI and backend produce identical coverage numbers
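LCOV is a line-oriented tracefile format: each source file gets an `SF:` record, one `DA:<line>,<count>` entry per coverable line, `LF` (lines found) and `LH` (lines hit) totals, and an `end_of_record` terminator. The sketch below is a hypothetical `FormatLCOV` writer that covers line data only (branch records are omitted); it sorts files and lines so the output is deterministic regardless of Go's map iteration order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FormatLCOV renders file -> line -> hit count as an LCOV tracefile.
// Hypothetical helper: line records only, no BRDA/BRF/BRH branch records.
func FormatLCOV(cov map[string]map[int]int) string {
	files := make([]string, 0, len(cov))
	for f := range cov {
		files = append(files, f)
	}
	sort.Strings(files) // deterministic output regardless of map order

	var b strings.Builder
	for _, f := range files {
		lines := make([]int, 0, len(cov[f]))
		for l := range cov[f] {
			lines = append(lines, l)
		}
		sort.Ints(lines)

		fmt.Fprintf(&b, "TN:\nSF:%s\n", f)
		hit := 0
		for _, l := range lines {
			count := cov[f][l]
			if count > 0 {
				hit++
			}
			fmt.Fprintf(&b, "DA:%d,%d\n", l, count)
		}
		// LF = lines found (coverable), LH = lines hit.
		fmt.Fprintf(&b, "LF:%d\nLH:%d\nend_of_record\n", len(lines), hit)
	}
	return b.String()
}
```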

TODOs before merge

Edge cases / gotchas

  • NODE_V8_COVERAGE is inherited by all child processes. SDK quick-scans to skip non-server files.
  • Stale dist/ artifacts from tsc cause import errors. Use rm -rf dist before build.
  • Docker paths are container-absolute. strip_path_prefix strips the mount point.
  • Coverage callback runs BEFORE upload callback so per-test data is available in the proto.
  • GetCoverageBaselineForUpload merges baseline + per-test records for complete denominator.
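The path handling described in these notes (stripping a container mount point, and making host paths relative to the git root while leaving outside paths absolute) can be sketched as one pure function. `NormalizeCoveragePath` is hypothetical; it also converts backslashes to forward slashes so results are stable across operating systems.

```go
package main

import (
	"path"
	"strings"
)

// NormalizeCoveragePath turns a raw runtime path into a forward-slash,
// repo-relative path. Hypothetical helper:
//   - stripPrefix: container mount point such as "/app" (may be empty)
//   - gitRoot: host repo root, e.g. from `git rev-parse --show-toplevel` (may be empty)
//
// Paths under neither prefix are returned unchanged apart from separator
// normalization, mirroring "paths outside the git root are kept absolute".
func NormalizeCoveragePath(p, stripPrefix, gitRoot string) string {
	toSlash := func(s string) string { return path.Clean(strings.ReplaceAll(s, `\`, "/")) }
	p = toSlash(p)

	for _, prefix := range []string{stripPrefix, gitRoot} {
		if prefix == "" {
			continue
		}
		pre := toSlash(prefix) + "/"
		if strings.HasPrefix(p, pre) {
			return strings.TrimPrefix(p, pre)
		}
	}
	return p
}
```

In this sketch, `strip_path_prefix: "/app"` turns `/app/src/server.js` into `src/server.js`, while `/application/x.js` is left alone because only whole path segments are stripped.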

Adds code coverage collection during test execution:

- --coverage flag on `tusk run` enables coverage mode
- CLI injects NODE_V8_COVERAGE and TUSK_COVERAGE_PORT env vars
- Takes V8 coverage snapshots between tests via SDK HTTP endpoint
- Processes raw V8 files with embedded Node.js helper script
- Runs c8 report for aggregate Istanbul JSON
- Diffs consecutive V8 snapshots for per-test coverage (marginal/new lines)
- Outputs per-file and per-test coverage summary
- Shows coverage sub-lines in --print mode
- Shows per-file coverage breakdown in TUI test log panel
- Cleans up raw V8 files after processing (~40KB output vs ~24MB raw)

Known limitation: per-test coverage shows marginal (new) lines only due to
V8's binary best-effort coverage mode. Will be addressed by switching to
V8 Inspector precise coverage with reset between tests.

V8 Inspector precise coverage doesn't work for per-test tracking because
takePreciseCoverage only returns scripts loaded AFTER startPreciseCoverage
is called. Since user code loads at startup before SDK init, the Inspector
misses all user scripts.

NODE_V8_COVERAGE best-effort coverage is binary (0/1 counts), so per-test
diffing gives marginal coverage (newly covered lines only).
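The marginal-only limitation follows directly from diffing cumulative binary counts. The hypothetical helper below illustrates it: once a line has count 1, a later test that executes it again leaves the count unchanged, so only newly reached lines appear in that test's diff.

```go
package main

import "sort"

// MarginalLines returns the lines covered in `after` that were not covered
// in `before`. With binary (0/1) best-effort counts and cumulative snapshots,
// a line hit by an earlier test is already 1, so a later test that runs it
// again contributes nothing to the diff.
func MarginalLines(before, after map[int]int) []int {
	var out []int
	for line, c := range after {
		if c > 0 && before[line] == 0 {
			out = append(out, line)
		}
	}
	sort.Ints(out)
	return out
}
```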

Changes:
- Reverted to v8.takeCoverage() + Node.js helper for per-test processing
- Added per-file coverage breakdown in TUI test log panel
- Added coverage sub-lines in --print mode
- Stored per-test coverage diffs on executor for TUI access

Major simplification of coverage implementation:
- Remove NYC command wrapping (resolveNpmScript, etc.)
- Just set NODE_V8_COVERAGE + TUSK_COVERAGE_PORT env vars
- v8.takeCoverage() auto-resets counters -> each snapshot is clean per-test data
- No diffing needed (was needed for NYC cumulative approach)
- Aggregate computed by merging all per-test snapshots
- Remove process-v8-coverage.js helper script (SDK handles V8 processing)
- Remove NYC-specific code (command resolution, Istanbul JSON parsing)
- Works with any start command (npm, yarn, docker, shell scripts)

CLI calls /snapshot?baseline=true at startup to capture all coverable lines
(including uncovered at count=0). This baseline is used as the starting
point for aggregate coverage, providing an accurate denominator.

Added TakeCoverageBaseline() and mergeWithBaseline() to separate
baseline (denominator) from per-test (delta) snapshots.

- Replace fmt.Sscanf with strconv.Atoi (idiomatic Go)
- Language-agnostic comments (remove Node-specific references)
- Add 22 unit tests for coverage pure functions:
  sanitizeFileName, dedup, LinecountsToCoverageDetail, mergeWithBaseline

The baseline snapshot (?baseline=true) was only taken in the single-env
fallback path but tests run through ReplayTestsByEnvironment. Added
baseline capture after StartEnvironment() in environment_replay.go.

Also removed debug prints from coverage.go.

- Force concurrency=1 when --coverage is enabled (per-test snapshots
  require serial execution)
- Replace hardcoded 500ms sleep with retry loop (15 attempts, 200ms apart)
  for baseline snapshot - handles slow SDK startup
- Normalize absolute file paths to repo-relative (using cwd as base)
  so paths are consistent across machines for backend storage
- Accumulate baselines across environment groups (merge, don't overwrite)
  so coverage is correct when service restarts between groups
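Accumulating baselines across environment groups amounts to a merge that never drops data from either side. A minimal sketch with hypothetical `Snapshot`/`Branch` types: lines keep the larger count (and are recorded even at 0, so they stay in the denominator), while branch summaries prefer the larger total and, when totals tie, the larger covered count. That tie-break is the case cubic later flags in `executor.go`.

```go
package main

// Hypothetical types for illustration; not the PR's definitions.
type Branch struct{ Total, Covered int }

type Snapshot struct {
	Lines    map[string]map[int]int    // file -> line -> hit count
	Branches map[string]map[int]Branch // file -> line -> branch summary
}

// MergeBaselines folds `next` into `acc` (mutating and returning acc) so a
// service restart between environment groups adds to, rather than replaces,
// the accumulated baseline.
func MergeBaselines(acc, next Snapshot) Snapshot {
	if acc.Lines == nil {
		acc.Lines = make(map[string]map[int]int)
	}
	if acc.Branches == nil {
		acc.Branches = make(map[string]map[int]Branch)
	}
	for file, lines := range next.Lines {
		dst := acc.Lines[file]
		if dst == nil {
			dst = make(map[int]int)
			acc.Lines[file] = dst
		}
		for line, c := range lines {
			// Record the line even at count 0 so it stays in the denominator.
			if old, ok := dst[line]; !ok || c > old {
				dst[line] = c
			}
		}
	}
	for file, branches := range next.Branches {
		dst := acc.Branches[file]
		if dst == nil {
			dst = make(map[int]Branch)
			acc.Branches[file] = dst
		}
		for line, b := range branches {
			old, ok := dst[line]
			// Larger total wins; on equal totals, keep the higher covered count.
			if !ok || b.Total > old.Total || (b.Total == old.Total && b.Covered > old.Covered) {
				dst[line] = b
			}
		}
	}
	return acc
}
```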
Use git rev-parse --show-toplevel as the base for relative paths instead
of cwd. This correctly handles monorepo files outside the service
directory (e.g., ../shared/utils.js becomes shared/utils.js).
Falls back to cwd if not in a git repo.
Paths outside the git root are kept absolute.

Move git root detection from onboard-cloud/helpers.go to utils/filesystem.go
as GetGitRootDir(). Reuse in coverage.go for file path normalization.
Removes duplicate exec.Command("git", "rev-parse", "--show-toplevel") calls.

- Remove per-test coverage.json file writing
- Remove summary.json file writing
- Remove coverageOutputDir from executor (no longer needed)
- Use os.TempDir() for NODE_V8_COVERAGE instead of .tusk/coverage-*/.v8-raw/
- No files left in user's project after coverage run
- All data stays in memory: printed to console, ready for backend upload

- New types: BranchInfo, FileCoverageData, CoverageSnapshot
- Parse branch data from SDK response (totalBranches, coveredBranches, per-line detail)
- Display branch coverage in aggregate: "85.5% lines, 93.3% branches"
- Merge branch data across baseline and per-test snapshots
- Update CoverageFileDiff, CoverageFileSummary, CoverageAggregate with branch fields
- Update all tests for new type structure

- Fix shallow copy bug in mergeWithBaseline: baseline branches map was
  shared, causing mutation of baseline data during merge
- Fix branch merge to use UNION semantics (sum covered, clamp to total)
  instead of max: if test A covers path 1 and test B covers path 2, both paths count as covered.
- Fix shallow copy in SnapshotToCoverageDetail branches
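A sketch of the two branch fixes above, using a hypothetical `BranchCounts` type rather than the PR's `BranchInfo`: entries are copied out of the baseline before merging so the baseline is never mutated, and covered counts are summed and clamped to `[0, Total]`. Note that sum-and-clamp only approximates a true union: without per-path identities, two tests that cover the same path are counted twice until the clamp applies, which is the over-counting concern behind the later max vs. sum+clamp back-and-forth.

```go
package main

// BranchCounts is a simplified, hypothetical per-line branch summary.
type BranchCounts struct{ Total, Covered int }

// MergeBranches returns a NEW map combining baseline and per-test branch data.
// Entries are copied out of `baseline` first, so the baseline is never
// mutated. Covered counts are summed and clamped to [0, Total].
func MergeBranches(baseline map[int]BranchCounts, perTest ...map[int]BranchCounts) map[int]BranchCounts {
	out := make(map[int]BranchCounts, len(baseline))
	for line, b := range baseline {
		out[line] = b // struct value: copied, not shared
	}
	for _, snap := range perTest {
		for line, b := range snap {
			cur := out[line] // zero value if the line is new
			if b.Total > cur.Total {
				cur.Total = b.Total
			}
			cur.Covered += b.Covered
			if cur.Covered > cur.Total {
				cur.Covered = cur.Total // clamp: can't cover more paths than exist
			}
			if cur.Covered < 0 {
				cur.Covered = 0 // guard against bad input
			}
			out[line] = cur
		}
	}
	return out
}
```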
- Extract retry constants (coverageBaselineMaxRetries, coverageBaselineRetryDelay)
- Remove dead code: sanitizeFileName() and its tests (from file-output era)
- Extract ComputeCoverageSummary() as pure testable function (no I/O)
- printCoverageSummary() now just formats and prints
- Add tests for ComputeCoverageSummary (empty, percentages, per-file,
  branches, per-test)
- Add tests for branch union semantics and baseline immutability
- Add tests for normalizeCoveragePaths edge cases
- Remove unused os/filepath import from print function
- Add overflow guard in branch coverage accumulation (clamp + negative check)
- Log debug warning for invalid line numbers instead of silent skip

Add docs/drift/coverage.md with CLI flags, config options, output formats,
Docker Compose setup, and limitations. Update configuration.md with coverage
section (enabled, include, exclude, strip_path_prefix).

Coverage activation:
- coverage.enabled config option for automatic collection during validation
- --show-coverage flag replaces --coverage (display only, for local dev)
- Config-driven mode is silent (no console output), for CI upload
- coverage.strip_path_prefix for Docker container path normalization

Code quality:
- Add coverageBaselineMu mutex to prevent races
- Recompute branch totals in SetCoverageBaseline from merged per-line data
- Clean up V8 temp directory in StopService
- Return map copy from GetTestCoverageDetail (prevent concurrent access)
- Deduplicate formatCoverageSummary (shared by print and TUI)
- Simplify printCoverageSummary (no error return)
- Remove redundant CoverageFileExport type (use FileCoverageData with JSON tags)
- Filter per-test data by include/exclude in JSON export
- Filter to in-suite records for aggregate (drafts excluded from output)
- Use doublestar for glob matching (replace hand-rolled implementation)
- Fix TUI path shortening (only Rel() on absolute paths)
- Sort files alphabetically, use DisplayName for GraphQL tests
- Remove duplicate coverageRecords (single source in executor)
- Add strip_path_prefix tests
- Filter draft tests from JSON export per-test data (not just aggregate)
- Add filterInSuiteRecords to FormatCoverageSummaryLines (TUI consistency)
- Call ProcessCoverage in TUI mode so --coverage-output writes the file
- Suppress double coverage display in TUI (FormatCoverageSummaryLines + ProcessCoverage)
- Show "Coverage written to" message in TUI service logs
- Upload per-test coverage via TraceTestCoverageData on TraceTestResult
- Upload baseline via CoverageBaseline on UpdateDriftRunCIStatusRequest
- Send startup-covered lines for consistent aggregate computation
- Callback reorder: snapshot before upload so data is available
- GetCoverageBaselineForUpload merges baseline + per-test for full denominator
- Use LineRange proto for compact range representation
- Increase coverage snapshot timeout to 60s
- Baseline counts set to original values (startup lines count as covered)
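The `LineRange` change compresses runs of consecutive line numbers into inclusive ranges. A hypothetical Go equivalent follows; the `LineRange` struct here only stands in for the proto message, whose exact fields aren't shown in this PR page.

```go
package main

import "sort"

// LineRange is a hypothetical stand-in for the proto message: an inclusive
// run of consecutive line numbers.
type LineRange struct{ Start, End int }

// ToLineRanges compresses line numbers into sorted, inclusive,
// non-overlapping ranges. Duplicates are ignored; the input is not modified.
func ToLineRanges(lines []int) []LineRange {
	if len(lines) == 0 {
		return nil
	}
	sorted := append([]int(nil), lines...) // copy before sorting
	sort.Ints(sorted)

	ranges := []LineRange{{Start: sorted[0], End: sorted[0]}}
	for _, l := range sorted[1:] {
		last := &ranges[len(ranges)-1]
		switch {
		case l <= last.End:
			// duplicate line: already inside the current range
		case l == last.End+1:
			last.End = l // extend the current run
		default:
			ranges = append(ranges, LineRange{Start: l, End: l})
		}
	}
	return ranges
}
```

For example, lines `1, 2, 3, 5, 6, 7, 10` become three ranges instead of seven entries, which is where the size saving comes from on large, mostly contiguous files.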
…acking-poc

# Conflicts:
#	cmd/run.go
#	go.mod
#	go.sum

…HA, deduplicate aggregate

- Branch merging uses max(covered) instead of sum+clamp to prevent
  inflating coverage when multiple tests cover the same branches
- Populate commitSha variable for validation runs so coverage baseline
  proto gets the correct commit SHA
- FormatCoverageSummaryLines returns computed aggregate for reuse by
  ProcessCoverageWithAggregate, avoiding redundant computation in TUI
- Revert branch merging from max to sum+clamp (matches Istanbul/NYC approach)
- Add WriteCoverageLCOV tests (format validation, empty, sorted files)
- Add WriteCoverageJSON tests (structure validation, empty)

…load

GetCoverageBaselineForUpload now returns both the merged snapshot (for
coverable lines denominator) and the original baseline (for startup
coverage). buildCoverageBaselineProto uses each for its proper purpose,
so StartupCoveredLinesByFile only contains lines covered during module
loading, not lines covered by test execution.
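The split described in this commit can be sketched as follows. `BaselineUpload` and `BuildBaselineUpload` are hypothetical: coverable lines come from the merged snapshot (baseline plus per-test), while startup-covered lines come only from lines with a nonzero count in the original baseline, so a line first hit by a test never shows up as startup coverage.

```go
package main

import "sort"

// BaselineUpload is a hypothetical payload shape, not the PR's proto.
type BaselineUpload struct {
	CoverableLinesByFile      map[string][]int // denominator: merged snapshot
	StartupCoveredLinesByFile map[string][]int // original baseline only
}

// BuildBaselineUpload takes the merged snapshot and the original baseline
// (both file -> line -> hit count) and uses each for its own purpose.
func BuildBaselineUpload(merged, baseline map[string]map[int]int) BaselineUpload {
	up := BaselineUpload{
		CoverableLinesByFile:      map[string][]int{},
		StartupCoveredLinesByFile: map[string][]int{},
	}
	for file, lines := range merged {
		for line := range lines {
			up.CoverableLinesByFile[file] = append(up.CoverableLinesByFile[file], line)
		}
		sort.Ints(up.CoverableLinesByFile[file])
	}
	for file, lines := range baseline {
		for line, count := range lines {
			if count > 0 { // executed during module loading
				up.StartupCoveredLinesByFile[file] = append(up.StartupCoveredLinesByFile[file], line)
			}
		}
		sort.Ints(up.StartupCoveredLinesByFile[file]) // no-op for files with none
	}
	return up
}
```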
@sohil-kshirsagar sohil-kshirsagar changed the title from "WIP feat(coverage): Add code coverage collection and export" to "feat(coverage): Add code coverage collection and export" Apr 7, 2026
@sohil-kshirsagar sohil-kshirsagar marked this pull request as ready for review April 7, 2026 02:39

@cubic-dev-ai cubic-dev-ai bot left a comment

6 issues found across 16 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/runner/server.go">

<violation number="1" location="internal/runner/server.go:1015">
P1: `SendCoverageSnapshot` registers the pending response after sending the request, which can drop fast SDK responses and cause intermittent timeouts.</violation>
</file>

<file name="internal/runner/coverage.go">

<violation number="1" location="internal/runner/coverage.go:201">
P3: `e.coveragePerTest` is accessed without holding `coveragePerTestMu` here (and similarly in `printCoverageSummary` and `FormatCoverageSummaryLines`), while `SetTestCoverageDetail` and `GetTestCoverageDetail` properly lock the mutex. Currently safe because coverage forces concurrency to 1 and these reads occur after all tests complete, but the inconsistent locking would produce a data race under the Go race detector if concurrency assumptions ever change. Lock (or copy under lock) before passing the map to `WriteCoverageJSON`.</violation>

<violation number="2" location="internal/runner/coverage.go:516">
P2: Include/exclude coverage globs won't match on Windows because `doublestar.Match` expects `/`-separated paths.</violation>

<violation number="3" location="internal/runner/coverage.go:541">
P2: Compute the JSON summary after filtering per-test files; `summary.per_test` can currently disagree with the exported aggregate and `per_test` sections.</violation>
</file>

<file name="internal/tui/test_executor.go">

<violation number="1" location="internal/tui/test_executor.go:768">
P2: Only log "Coverage written" when coverage processing succeeds; currently it logs success even after an error.</violation>
</file>

<file name="internal/runner/executor.go">

<violation number="1" location="internal/runner/executor.go:581">
P2: Branch baseline merge ignores higher covered counts when totals are equal, which can undercount branch coverage across environment-group baseline snapshots.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

…y, Windows paths, TUI error log

- Protect coveragePerTest reads with mutex via GetCoveragePerTestSnapshot()
- Register pending response channel before sending coverage snapshot request
- Compute JSON summary from filtered per-test data (after include/exclude)
- Normalize backslash paths for Windows glob matching
- Only log "Coverage written" on success
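The P1 fix (register the pending response before sending) addresses a classic request/response correlation race. A minimal, hypothetical sketch of the corrected ordering, with `pendingResponses`, `Request`, and `Deliver` standing in for the server's real types:

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// Hypothetical request/response correlator; not the PR's server code.
type pendingResponses struct {
	mu      sync.Mutex
	waiters map[string]chan []byte
	timeout time.Duration
}

func newPendingResponses() *pendingResponses {
	// 60s mirrors the snapshot timeout mentioned in the commit log.
	return &pendingResponses{waiters: make(map[string]chan []byte), timeout: 60 * time.Second}
}

// Deliver is called by the connection's read loop. It reports whether a
// waiter was registered; without one, the response is effectively lost.
func (p *pendingResponses) Deliver(id string, payload []byte) bool {
	p.mu.Lock()
	ch, ok := p.waiters[id]
	p.mu.Unlock()
	if ok {
		select {
		case ch <- payload: // buffered, so this normally succeeds immediately
		default: // duplicate response; drop it
		}
	}
	return ok
}

// Request registers the waiter BEFORE calling send. With the order reversed,
// a response processed while send is still returning would find no waiter,
// and the caller would block until the timeout.
func (p *pendingResponses) Request(id string, send func() error) ([]byte, error) {
	ch := make(chan []byte, 1)
	p.mu.Lock()
	p.waiters[id] = ch
	p.mu.Unlock()
	defer func() {
		p.mu.Lock()
		delete(p.waiters, id)
		p.mu.Unlock()
	}()

	if err := send(); err != nil {
		return nil, err
	}
	select {
	case resp := <-ch:
		return resp, nil
	case <-time.After(p.timeout):
		return nil, errors.New("timed out waiting for response " + id)
	}
}
```

The deferred cleanup also means a late or duplicate reply for a finished request is reported as undelivered instead of reaching a stale channel.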

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 5821c20.

@sohil-kshirsagar sohil-kshirsagar requested a review from jy-tan April 7, 2026 20:24

@jy-tan jy-tan left a comment

should WriteCoverageJSON(), printCoverageSummary(), and FormatCoverageSummaryLines() call a shared helper for consistency?

something like BuildCoverageReportView(rawSession, reportOptions) that applies suite filtering and include/exclude filtering exactly once.

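```go
type CoverageReportView struct {
    Records   []CoverageTestRecord                   // in-suite records only (drafts dropped)
    Aggregate CoverageSnapshot                       // baseline merged with per-test data, include/exclude applied
    PerTest   map[string]map[string]CoverageFileDiff // per test, then per file; same filtering
    Summary   CoverageSummary                        // computed from the filtered data above
}
```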
  • printCoverageSummary() uses view.Summary
  • FormatCoverageSummaryLines() uses view.Summary
  • WriteCoverageJSON() writes view.Aggregate, view.PerTest, and view.Summary
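A rough sketch of the suggested helper's shape, heavily simplified and with hypothetical types (`TestRecord`, `ReportView`, `BuildReportView`): suite filtering, baseline merge, include/exclude filtering, and summary computation each happen once, in that order. `MatchFunc` has the same signature as `doublestar.Match`; the sketch falls back to the standard library's `path.Match`, which does not support `**`.

```go
package main

import (
	"path"
	"strings"
)

// Simplified, hypothetical types; not the PR's actual definitions.
type TestRecord struct {
	TestID  string
	InSuite bool
	Lines   map[string]map[int]int // file -> line -> hit count (per-test delta)
}

type ReportView struct {
	Records   []TestRecord
	Aggregate map[string]map[int]int
	Covered   int
	Total     int
}

// MatchFunc has the same shape as doublestar.Match(pattern, name).
type MatchFunc func(pattern, name string) (bool, error)

// keepFile applies include patterns (if any) and then exclude patterns.
func keepFile(file string, include, exclude []string, match MatchFunc) bool {
	file = strings.ReplaceAll(file, `\`, "/") // glob patterns use '/' separators
	anyMatch := func(patterns []string) bool {
		for _, p := range patterns {
			if ok, err := match(p, file); err == nil && ok {
				return true
			}
		}
		return false
	}
	if len(include) > 0 && !anyMatch(include) {
		return false
	}
	return !anyMatch(exclude)
}

// BuildReportView filters, merges, and summarizes once, so print, TUI, and
// JSON output all read the same numbers.
func BuildReportView(baseline map[string]map[int]int, records []TestRecord,
	include, exclude []string, match MatchFunc) ReportView {
	if match == nil {
		match = path.Match // no "**" support; pass doublestar.Match for that
	}
	var v ReportView
	for _, r := range records {
		if r.InSuite { // drafts are excluded from output
			v.Records = append(v.Records, r)
		}
	}

	v.Aggregate = make(map[string]map[int]int)
	add := func(snap map[string]map[int]int) {
		for file, lines := range snap {
			if !keepFile(file, include, exclude, match) {
				continue
			}
			dst := v.Aggregate[file]
			if dst == nil {
				dst = make(map[int]int)
				v.Aggregate[file] = dst
			}
			for line, c := range lines {
				dst[line] += c // count-0 lines are kept for the denominator
			}
		}
	}
	add(baseline)
	for _, r := range v.Records {
		add(r.Lines)
	}

	for _, lines := range v.Aggregate {
		for _, c := range lines {
			v.Total++
			if c > 0 {
				v.Covered++
			}
		}
	}
	return v
}
```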

…ithout baseline, CoverageReportView

- Use context.WithDeadline for baseline retry instead of manual time check
- Reduce max retries from 15 to 4 (baseline succeeds on first attempt in practice)
- Skip aggregate upload when baseline is nil to avoid misleading 100% coverage
- Add CoverageReportView struct and BuildCoverageReportView builder to centralize
  the filterInSuiteRecords → mergeWithBaseline → filterByPatterns → ComputeSummary chain
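A sketch of the deadline-bounded baseline retry. `TakeBaselineWithin`, `errSDKNotReady`, and the 200ms delay are assumptions for illustration (the delay is carried over from the earlier retry loop); the cap of 4 attempts comes from the commit message. The loop stops on the first success, after the last attempt, or as soon as the `context.WithDeadline` deadline passes, whichever comes first.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical retry settings for illustration.
const (
	baselineMaxRetries = 4
	baselineRetryDelay = 200 * time.Millisecond
)

// errSDKNotReady is a hypothetical error for "coverage endpoint not up yet".
var errSDKNotReady = errors.New("coverage endpoint not ready")

// TakeBaselineWithin retries take up to baselineMaxRetries times, pausing
// between attempts, and gives up early once the overall deadline passes.
func TakeBaselineWithin(budget time.Duration, take func() error) error {
	ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(budget))
	defer cancel()

	var lastErr error
	for attempt := 1; attempt <= baselineMaxRetries; attempt++ {
		if lastErr = take(); lastErr == nil {
			return nil
		}
		if attempt == baselineMaxRetries {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("baseline snapshot: %v: %w", lastErr, ctx.Err())
		case <-time.After(baselineRetryDelay):
		}
	}
	return lastErr
}
```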

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/runner/executor.go">

<violation number="1" location="internal/runner/executor.go:531">
P2: This early return drops valid baseline-only coverage. When baseline exists but no per-test records are present, aggregate coverage should still be uploaded from baseline instead of returning nil.</violation>
</file>

@sohil-kshirsagar sohil-kshirsagar merged commit 523ec80 into main Apr 8, 2026
14 checks passed
@sohil-kshirsagar sohil-kshirsagar deleted the feat/code-coverage-tracking-poc branch April 8, 2026 07:55

4 participants