
CI: emit synthetic JUnit XML when rake task fails before tests run#5502

Draft
p-datadog wants to merge 1 commit into master from ci-synthetic-junit-on-rake-failure

Conversation

@p-datadog (Member) commented Mar 25, 2026

What does this PR do?

Wraps the sh call in run_batch_tests with a rescue block that writes a synthetic
JUnit XML when a rake task fails before RSpec starts. The existing artifact upload step
picks it up — no changes needed to the upload or the dd/junit merge job.

The original exception is re-raised, so the job still fails as before. This only adds
visibility.
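A minimal sketch of the rescue wrapper described above. The method and file names here are illustrative, not the exact ones in the PR; `sh` is assumed to be Rake's shell helper, available inside rake tasks.

```ruby
require 'cgi'
require 'fileutils'
require 'time'

# Hypothetical wrapper; the real method in the PR is run_batch_tests.
# `sh` is assumed to be Rake's FileUtils#sh.
def run_rake_task_with_fallback(task_name)
  sh("bundle exec rake #{task_name}")
rescue StandardError => e
  write_synthetic_junit(task_name, e)
  raise # re-raise so the CI job still fails as before
end

# Write a one-testcase JUnit file carrying the rake error message,
# using the same directory and naming convention as real RSpec output
# so the existing artifact upload picks it up unchanged.
def write_synthetic_junit(task_name, error)
  FileUtils.mkdir_p('tmp/rspec')
  safe_name = task_name.gsub(/[^\w]+/, '-')
  message = CGI.escapeHTML(error.message)
  File.write("tmp/rspec/#{safe_name}.xml", <<~XML)
    <?xml version="1.0" encoding="UTF-8"?>
    <testsuite name="#{task_name}" tests="1" failures="1" errors="0" timestamp="#{Time.now.utc.iso8601}">
      <testcase name="rake #{task_name}" classname="rake">
        <failure message="rake task failed before RSpec ran">#{message}</failure>
      </testcase>
    </testsuite>
  XML
end
```

Escaping the error message (here via `CGI.escapeHTML`) matters because rake errors routinely contain backticks and angle brackets that would otherwise break the XML.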

Motivation:

I noticed that when a rake task in a CI batch fails before RSpec runs (wrong task name,
LoadError, syntax error), the JUnit artifacts from that batch only contain results from
the tasks that succeeded. The failing task produces no XML at all.

What caught my eye is that this makes the failure invisible to anything that relies on
JUnit — test failure analysis, Datadog CI Visibility — since the artifacts exist but
show 0 failures. The only way to find the actual error is log parsing.

Ran into this on PR #5111 where 8 jobs failed with "Don't know how to build task
`spec:di_with_ext`" but JUnit showed all green. The synthetic XML would have surfaced
the rake error directly in the structured data.

Change log entry

None

Additional Notes:

Not yet verified locally — `bundle exec rake standard` requires full gem setup.
CI will validate.

Worth noting: the synthetic XML uses the same directory (`tmp/rspec/`) and naming
convention as real JUnit output, so it flows through the existing pipeline without
any special handling.

How to test the change?

  • Verify synthetic XML is well-formed
  • Verify existing tests still pass (no behavioral change on success path)
  • To test the failure path: temporarily change a Matrixfile entry to a nonexistent
    task, push, and confirm the synthetic XML appears in artifacts
  • Confirm `datadog-ci junit upload` accepts the synthetic XML format
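The first bullet can be checked locally with a quick REXML pass over the artifact directory (the `tmp/rspec/` path is taken from the note above; this is a sanity check, not part of the PR):

```ruby
require 'rexml/document'

# Parse every JUnit file under tmp/rspec/. REXML raises
# REXML::ParseException on malformed XML, so a clean run means
# every file is well-formed.
Dir.glob('tmp/rspec/*.xml').each do |path|
  REXML::Document.new(File.read(path))
  puts "OK: #{path}"
end
```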

When a rake task fails before RSpec starts (wrong task name, missing
gem, syntax error), no JUnit XML is produced for that task. CI
artifacts then contain only results from other tasks in the batch,
making the failure invisible through the normal JUnit pipeline.

Write a synthetic JUnit XML with the rake error message so the failure
shows up in JUnit artifact analysis without log parsing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions bot commented Mar 25, 2026

Thank you for updating Change log entry section 👏

Visited at: 2026-03-25 04:45:24 UTC

@p-datadog added the AI Generated label (Largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos) on Mar 25, 2026

datadog-official bot commented Mar 25, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 95.16% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 44c14b8 | Docs | Datadog PR Page


pr-commenter bot commented Mar 25, 2026

Benchmarks

Benchmark execution time: 2026-03-25 05:20:43

Comparing candidate commit 44c14b8 in PR branch ci-synthetic-junit-on-rake-failure with baseline commit d170f39 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
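The threshold rule above can be restated in a few lines of Ruby. This is a sketch of the rule as described, not the benchmarking platform's actual implementation, and the 1% threshold used in the examples is assumed from the second diagram:

```ruby
# A change is significant only when the whole confidence interval
# lies outside the impact threshold, on either side of zero.
def significant?(ci_lower, ci_upper, threshold)
  ci_lower > threshold || ci_upper < -threshold
end

# The two diagrams above, assuming a 1% threshold:
significant?(-0.006, 0.012, 0.01) # first diagram: CI straddles 0, not significant
significant?(0.013, 0.031, 0.01)  # second diagram: CI entirely above +1%, significant
```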


Labels

AI Generated: Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos.
