Contributor
Copilot AI commented Jan 4, 2026

Summary: Fix Golden Corpus Determinism Test ✅

This PR fixes the Golden Corpus Determinism test in .github/workflows/governance-invariants.yml, which was failing on all active PRs because it hashed pytest's complete output, including non-deterministic timing metadata.

Changes Completed

  • Add pytest-json-report to test dependencies in pyproject.toml
  • Update .github/workflows/governance-invariants.yml to:
    • Use pytest-json-report for deterministic test output
    • Extract only test outcomes and results (strip timing/metadata)
    • Compare hashes of cleaned JSON output instead of raw pytest output
    • Fix YAML syntax by using heredoc instead of multiline quoted strings
  • Test the changes locally to ensure determinism (verified: all 3 runs produce identical hashes)
  • Run code review and address feedback
  • Run security checks (CodeQL: no alerts)
  • Verify all repository tests pass (42 tests passed)

Technical Details

Problem: The workflow was hashing raw pytest stdout/stderr, which included (see the sketch after this list):

  • Test execution times (e.g., 0.06s vs 0.05s) - non-deterministic
  • Pytest timing metadata - non-deterministic
  • Platform/environment info - non-deterministic
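
A minimal sketch of why this breaks (the output lines are hypothetical, but mirror the 0.06s vs 0.05s jitter above): any byte that differs between runs changes the SHA256.

import hashlib

# Hypothetical pytest output: identical test results, different reported timing.
run1 = "tests/test_corpus.py::test_roundtrip PASSED\n1 passed in 0.06s\n"
run2 = "tests/test_corpus.py::test_roundtrip PASSED\n1 passed in 0.05s\n"

print(hashlib.sha256(run1.encode()).hexdigest())
print(hashlib.sha256(run2.encode()).hexdigest())  # differs: 0.01s of jitter changes the hash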

Additional Issue Fixed: the workflow also had a YAML syntax error caused by multiline Python scripts with an unclosed quote (python -c "...). This was fixed by switching to bash heredoc syntax (python << 'EOF' ... EOF).

Solution:

  1. Use pytest-json-report to generate structured JSON output
  2. Extract only deterministic fields: test summary, node IDs, outcomes, and error details (a helper sketch follows this list)
  3. Hash the cleaned JSON instead of raw pytest output
  4. Use heredoc syntax for multiline Python scripts in YAML
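
For local testing, the cleaning in steps 2-3 could be factored into a helper. This is a sketch assuming the top-level summary and tests fields that pytest-json-report writes, mirroring the inline workflow scripts below:

import hashlib
import json

def clean_report(src: str, dst: str) -> str:
    """Strip timing/metadata from a pytest-json-report file, write the cleaned
    JSON to dst, and return the SHA256 of the cleaned output."""
    with open(src) as f:
        data = json.load(f)
    results = {
        'summary': data['summary'],
        'tests': [
            {
                'nodeid': t['nodeid'],
                'outcome': t['outcome'],
                # keep only the failure text, not call durations
                'call': t.get('call', {}).get('longrepr', ''),
            }
            for t in data['tests']
        ],
    }
    # sort_keys makes the serialized form independent of dict insertion order
    cleaned = json.dumps(results, sort_keys=True, indent=2)
    with open(dst, 'w') as f:
        f.write(cleaned)
    return hashlib.sha256(cleaned.encode()).hexdigest()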

Verification:

  • Local testing: three consecutive runs produce the identical SHA256 hash ec7841ee23732277ad2fee1d0ff8fc907060b5c960c6954450c95bf1a87f0ae4 (a local re-check sketch follows this list)
  • All 42 repository tests pass
  • No security vulnerabilities detected
  • Heredoc syntax validated and working correctly
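
The comparison can be reproduced outside CI with a Python equivalent of the workflow's sha256sum step (file paths as used in the workflow):

import hashlib
from pathlib import Path

# Hash the three cleaned report files and require that they all match
hashes = [
    hashlib.sha256(Path(f"/tmp/corpus_run{i}_clean.json").read_bytes()).hexdigest()
    for i in (1, 2, 3)
]
print(*hashes, sep="\n")
assert len(set(hashes)) == 1, "corpus outputs differ across runs"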

Impact

This fix will:

  • Unblock the Golden Corpus Determinism check, which was failing on all active PRs
  • Ensure the invariant verifies corpus validation output rather than pytest timing metadata

Original prompt

This section details the original issue you should resolve.

<issue_title>Fix Golden Corpus Determinism test to hash validation output instead of pytest metadata</issue_title>
<issue_description>## Problem

The Golden Corpus Determinism test in .github/workflows/governance-invariants.yml is currently failing on all active PRs (#30, #31, #32) because it hashes the entire pytest output instead of the actual corpus validation results.

Current Behavior

The workflow runs corpus tests 3 times and compares SHA256 hashes of the complete pytest stdout/stderr:

pytest tests/test_corpus.py -v --tb=short > /tmp/corpus_run1.txt 2>&1

This captures:

  • ✅ Validation results (what we want)
  • ❌ Test execution times (non-deterministic)
  • ❌ Pytest timing metadata (non-deterministic)
  • ❌ Timestamps in logs (non-deterministic)

Evidence

All three runs produce different hashes:

Run 1 hash: eef68dcf1f1d8573f1823e568...
Run 2 hash: c5d9b9d347e10585866789604...
Run 3 hash: c5d9b9d347e10585866789604... (matches Run 2)

Solution

We need to hash only the corpus validation output, not the pytest metadata.

Recommended Approach: Use pytest-json-report

Step 1: Add pytest-json-report to test dependencies in pyproject.toml:

[project.optional-dependencies]
test = [
    "pytest>=8.3.4",
    "pytest-json-report>=1.5.0",  # Add this
]
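
With that extra in place, the plugin installs locally via pip install -e ".[test]" (run from the repository root), after which pytest accepts the --json-report and --json-report-file flags used below.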

Step 2: Update .github/workflows/governance-invariants.yml:

- name: Run corpus tests (iteration 1)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py --json-report --json-report-file=/tmp/corpus_run1.json
    # Extract only test outcomes and results, strip timing/metadata
    python -c "
import json
with open('/tmp/corpus_run1.json') as f:
    data = json.load(f)
# Extract deterministic fields only
results = {
    'summary': data['summary'],
    'tests': [
        {
            'nodeid': t['nodeid'],
            'outcome': t['outcome'],
            'call': t.get('call', {})
        }
        for t in data['tests']
    ]
}
with open('/tmp/corpus_run1_clean.json', 'w') as f:
    json.dump(results, f, sort_keys=True, indent=2)
" | cat

- name: Run corpus tests (iteration 2)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py --json-report --json-report-file=/tmp/corpus_run2.json
    python -c "[same script as above]" | cat

- name: Run corpus tests (iteration 3)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py --json-report --json-report-file=/tmp/corpus_run3.json
    python -c "[same script as above]" | cat

- name: Compare output hashes
  run: |
    HASH1=$(sha256sum /tmp/corpus_run1_clean.json | cut -d' ' -f1)
    HASH2=$(sha256sum /tmp/corpus_run2_clean.json | cut -d' ' -f1)
    HASH3=$(sha256sum /tmp/corpus_run3_clean.json | cut -d' ' -f1)
    
    echo "Run 1 hash: $HASH1"
    echo "Run 2 hash: $HASH2"
    echo "Run 3 hash: $HASH3"
    
    if [ "$HASH1" != "$HASH2" ] || [ "$HASH1" != "$HASH3" ]; then
      echo "❌ FAIL: Corpus outputs differ across runs"
      echo "This violates Invariant 1: Golden Corpus Determinism"
      exit 1
    fi
    
    echo "✅ PASS: All corpus runs produced identical output"
    echo "Golden Corpus Determinism verified."

Alternative Approach (Simpler)

If you prefer not to add pytest-json-report, suppress pytest's timing output:

- name: Run corpus tests (iteration 1)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py -v --tb=line --no-header --no-summary -q > /tmp/corpus_run1.txt 2>&1 || true

However, this is less robust as it still captures some pytest metadata.

Impact

Fixing this will unblock PRs #30, #31, and #32, which currently fail only because of this non-determinism, and make the invariant check corpus validation output rather than pytest metadata.

Comments on the Issue (you are @copilot in this section)




github-actions bot commented Jan 4, 2026

🔍 Governance Classification

Change Class: trivial (Impact Level: 1)

Description: Minor corrections with zero semantic impact (typos, formatting)

Evidence Requirements

  • No evidence required

Review Requirements

  • Minimum Reviewers: 0 external + CODEOWNER approval
  • Audit Update Required: ❌ No

Next Steps

  1. ✅ Ensure documentation is clear and accurate
  2. ✅ Wait for required approvals (0 + CODEOWNER)
  3. ✅ Address any CI check failures

📚 See GOVERNANCE.md for detailed requirements.

Copilot AI and others added 2 commits January 4, 2026 19:14
Copilot AI changed the title from "[WIP] Fix Golden Corpus Determinism test to hash validation output" to "Fix Golden Corpus Determinism test to hash validation output instead of pytest metadata" on Jan 4, 2026
Copilot AI requested a review from hummbl-dev January 4, 2026 19:18
@hummbl-dev
Owner

@copilot The workflow is failing with a YAML syntax error on line 36. The multi-line Python script isn't properly formatted for YAML.

The issue is that you can't have an unclosed quote spanning multiple lines like this:

python -c "
import json

Please fix this by using a bash heredoc inside the run block instead (quoting the 'EOF' delimiter keeps the shell from expanding anything in the script). Replace the Python command sections with this approach:

- name: Run corpus tests (iteration 1)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py --json-report --json-report-file=/tmp/corpus_run1.json
    python << 'EOF'
import json
with open('/tmp/corpus_run1.json') as f:
    data = json.load(f)
results = {
    'summary': data['summary'],
    'tests': [
        {
            'nodeid': t['nodeid'],
            'outcome': t['outcome'],
            'call': t.get('call', {}).get('longrepr', '') if t.get('call') else ''
        }
        for t in data['tests']
    ]
}
with open('/tmp/corpus_run1_clean.json', 'w') as f:
    json.dump(results, f, sort_keys=True, indent=2)
EOF
    echo "Run 1 completed"

Apply the same fix to iterations 2 and 3, changing the file names accordingly (corpus_run2.json, corpus_run3.json, etc.).

Contributor Author

Copilot AI commented Jan 4, 2026

@copilot The workflow is failing with a YAML syntax error on line 36. The multi-line Python script isn't properly formatted for YAML.

The issue is that you can't have an unclosed quote spanning mul...

Fixed in commit 3a83dbe. Replaced python -c "..." with heredoc syntax python << 'EOF'...EOF for all three test iterations. Verified locally that the heredoc approach works correctly and produces deterministic output across multiple runs (all produce hash ec7841ee23732277ad2fee1d0ff8fc907060b5c960c6954450c95bf1a87f0ae4).

@hummbl-dev hummbl-dev marked this pull request as ready for review January 4, 2026 19:28
@hummbl-dev hummbl-dev merged commit 3fbe537 into main Jan 4, 2026
9 checks passed
@hummbl-dev hummbl-dev deleted the copilot/fix-golden-corpus-test branch January 4, 2026 19:28
Copilot AI added a commit that referenced this pull request Jan 4, 2026