Contributor
Copilot AI commented Jan 4, 2026

Summary: Fix Golden Corpus Determinism Test ✅

This PR fixes the Golden Corpus Determinism test in .github/workflows/governance-invariants.yml, which was failing on all active PRs because it hashed pytest's complete output, including non-deterministic timing metadata.

Changes Completed

  • Add pytest-json-report to test dependencies in pyproject.toml
  • Update .github/workflows/governance-invariants.yml to:
    • Use pytest-json-report for deterministic test output
    • Extract only test outcomes and results (strip timing/metadata)
    • Compare hashes of cleaned JSON output instead of raw pytest output
    • Fix YAML syntax by using heredoc instead of multiline quoted strings
  • Test the changes locally to ensure determinism (verified: all 3 runs produce identical hashes)
  • Run code review and address feedback
  • Run security checks (CodeQL: no alerts)
  • Verify all repository tests pass (42 tests passed)

Technical Details

Problem: The workflow was hashing raw pytest stdout/stderr, which included (see the sketch after this list):

  • Test execution times (e.g., 0.06s vs 0.05s) - non-deterministic
  • Pytest timing metadata - non-deterministic
  • Platform/environment info - non-deterministic
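
A minimal sketch of why this breaks (the output lines are hypothetical, but mirror the 0.06s vs 0.05s jitter above): any byte that differs between runs changes the SHA256.

import hashlib

# Hypothetical pytest output: identical test results, different reported timing.
run1 = "tests/test_corpus.py::test_roundtrip PASSED\n1 passed in 0.06s\n"
run2 = "tests/test_corpus.py::test_roundtrip PASSED\n1 passed in 0.05s\n"

print(hashlib.sha256(run1.encode()).hexdigest())
print(hashlib.sha256(run2.encode()).hexdigest())  # differs: 0.01s of jitter changes the hash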

Additional Issue Fixed: the workflow also had a YAML syntax error caused by multiline Python scripts with an unclosed quote (python -c "...). This was fixed by switching to bash heredoc syntax (python << 'EOF' ... EOF).

Solution:

  1. Use pytest-json-report to generate structured JSON output
  2. Extract only deterministic fields: test summary, node IDs, outcomes, and error details (a helper sketch follows this list)
  3. Hash the cleaned JSON instead of raw pytest output
  4. Use heredoc syntax for multiline Python scripts in YAML
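
For local testing, the cleaning in steps 2-3 could be factored into a helper. This is a sketch assuming the top-level summary and tests fields that pytest-json-report writes, mirroring the inline workflow scripts below:

import hashlib
import json

def clean_report(src: str, dst: str) -> str:
    """Strip timing/metadata from a pytest-json-report file, write the cleaned
    JSON to dst, and return the SHA256 of the cleaned output."""
    with open(src) as f:
        data = json.load(f)
    results = {
        'summary': data['summary'],
        'tests': [
            {
                'nodeid': t['nodeid'],
                'outcome': t['outcome'],
                # keep only the failure text, not call durations
                'call': t.get('call', {}).get('longrepr', ''),
            }
            for t in data['tests']
        ],
    }
    # sort_keys makes the serialized form independent of dict insertion order
    cleaned = json.dumps(results, sort_keys=True, indent=2)
    with open(dst, 'w') as f:
        f.write(cleaned)
    return hashlib.sha256(cleaned.encode()).hexdigest()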

Verification:

  • Local testing: three consecutive runs produce the identical SHA256 hash ec7841ee23732277ad2fee1d0ff8fc907060b5c960c6954450c95bf1a87f0ae4 (a local re-check sketch follows this list)
  • All 42 repository tests pass
  • No security vulnerabilities detected
  • Heredoc syntax validated and working correctly
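
The comparison can be reproduced outside CI with a Python equivalent of the workflow's sha256sum step (file paths as used in the workflow):

import hashlib
from pathlib import Path

# Hash the three cleaned report files and require that they all match
hashes = [
    hashlib.sha256(Path(f"/tmp/corpus_run{i}_clean.json").read_bytes()).hexdigest()
    for i in (1, 2, 3)
]
print(*hashes, sep="\n")
assert len(set(hashes)) == 1, "corpus outputs differ across runs"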

Impact

This fix will:

  • Unblock the Golden Corpus Determinism check, which was failing on all active PRs
  • Ensure the invariant verifies corpus validation output rather than pytest timing metadata

Original prompt

This section details the original issue you should resolve.

<issue_title>Fix Golden Corpus Determinism test to hash validation output instead of pytest metadata</issue_title>
<issue_description>## Problem

The Golden Corpus Determinism test in .github/workflows/governance-invariants.yml is currently failing on all active PRs (#30, #31, #32) because it hashes the entire pytest output instead of the actual corpus validation results.

Current Behavior

The workflow runs corpus tests 3 times and compares SHA256 hashes of the complete pytest stdout/stderr:

pytest tests/test_corpus.py -v --tb=short > /tmp/corpus_run1.txt 2>&1

This captures:

  • ✅ Validation results (what we want)
  • ❌ Test execution times (non-deterministic)
  • ❌ Pytest timing metadata (non-deterministic)
  • ❌ Timestamps in logs (non-deterministic)

Evidence

All three runs produce different hashes:

Run 1 hash: eef68dcf1f1d8573f1823e568...
Run 2 hash: c5d9b9d347e10585866789604...
Run 3 hash: c5d9b9d347e10585866789604... (matches Run 2)

Solution

We need to hash only the corpus validation output, not the pytest metadata.

Recommended Approach: Use pytest-json-report

Step 1: Add pytest-json-report to test dependencies in pyproject.toml:

[project.optional-dependencies]
test = [
    "pytest>=8.3.4",
    "pytest-json-report>=1.5.0",  # Add this
]
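
With that extra in place, the plugin installs locally via pip install -e ".[test]" (run from the repository root), after which pytest accepts the --json-report and --json-report-file flags used below.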

Step 2: Update .github/workflows/governance-invariants.yml:

- name: Run corpus tests (iteration 1)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py --json-report --json-report-file=/tmp/corpus_run1.json
    # Extract only test outcomes and results, strip timing/metadata
    python -c "
import json
with open('/tmp/corpus_run1.json') as f:
    data = json.load(f)
# Extract deterministic fields only
results = {
    'summary': data['summary'],
    'tests': [
        {
            'nodeid': t['nodeid'],
            'outcome': t['outcome'],
            'call': t.get('call', {})
        }
        for t in data['tests']
    ]
}
with open('/tmp/corpus_run1_clean.json', 'w') as f:
    json.dump(results, f, sort_keys=True, indent=2)
" | cat

- name: Run corpus tests (iteration 2)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py --json-report --json-report-file=/tmp/corpus_run2.json
    python -c "[same script as above]" | cat

- name: Run corpus tests (iteration 3)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py --json-report --json-report-file=/tmp/corpus_run3.json
    python -c "[same script as above]" | cat

- name: Compare output hashes
  run: |
    HASH1=$(sha256sum /tmp/corpus_run1_clean.json | cut -d' ' -f1)
    HASH2=$(sha256sum /tmp/corpus_run2_clean.json | cut -d' ' -f1)
    HASH3=$(sha256sum /tmp/corpus_run3_clean.json | cut -d' ' -f1)
    
    echo "Run 1 hash: $HASH1"
    echo "Run 2 hash: $HASH2"
    echo "Run 3 hash: $HASH3"
    
    if [ "$HASH1" != "$HASH2" ] || [ "$HASH1" != "$HASH3" ]; then
      echo "❌ FAIL: Corpus outputs differ across runs"
      echo "This violates Invariant 1: Golden Corpus Determinism"
      exit 1
    fi
    
    echo "✅ PASS: All corpus runs produced identical output"
    echo "Golden Corpus Determinism verified."

Alternative Approach (Simpler)

If you prefer not to add pytest-json-report, suppress pytest's timing output:

- name: Run corpus tests (iteration 1)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py -v --tb=line --no-header --no-summary -q > /tmp/corpus_run1.txt 2>&1 || true

However, this is less robust as it still captures some pytest metadata.

Impact

Fixing this will unblock PRs #30, #31, and #32, which currently fail only because of this non-determinism, and make the invariant check corpus validation output rather than pytest metadata.

Comments on the Issue (you are @copilot in this section)




github-actions bot commented Jan 4, 2026

🔍 Governance Classification

Change Class: trivial (Impact Level: 1)

Description: Minor corrections with zero semantic impact (typos, formatting)

Evidence Requirements

  • No evidence required

Review Requirements

  • Minimum Reviewers: 0 external + CODEOWNER approval
  • Audit Update Required: ❌ No

Next Steps

  1. ✅ Ensure documentation is clear and accurate
  2. ✅ Wait for required approvals (0 + CODEOWNER)
  3. ✅ Address any CI check failures

📚 See GOVERNANCE.md for detailed requirements.

Copilot AI and others added 2 commits January 4, 2026 19:14
Copilot AI changed the title from "[WIP] Fix Golden Corpus Determinism test to hash validation output" to "Fix Golden Corpus Determinism test to hash validation output instead of pytest metadata" on Jan 4, 2026
Copilot AI requested a review from hummbl-dev January 4, 2026 19:18
@hummbl-dev
Owner

@copilot The workflow is failing with a YAML syntax error on line 36. The multi-line Python script isn't properly formatted for YAML.

The issue is that you can't have an unclosed quote spanning multiple lines like this:

python -c "
import json

Please fix this by using a bash heredoc inside the run block instead (quoting the 'EOF' delimiter keeps the shell from expanding anything in the script). Replace the Python command sections with this approach:

- name: Run corpus tests (iteration 1)
  env:
    BASE120_FIXED_TIMESTAMP: "2026-01-01T00:00:00.000000Z"
  run: |
    pytest tests/test_corpus.py --json-report --json-report-file=/tmp/corpus_run1.json
    python << 'EOF'
import json
with open('/tmp/corpus_run1.json') as f:
    data = json.load(f)
results = {
    'summary': data['summary'],
    'tests': [
        {
            'nodeid': t['nodeid'],
            'outcome': t['outcome'],
            'call': t.get('call', {}).get('longrepr', '') if t.get('call') else ''
        }
        for t in data['tests']
    ]
}
with open('/tmp/corpus_run1_clean.json', 'w') as f:
    json.dump(results, f, sort_keys=True, indent=2)
EOF
    echo "Run 1 completed"

Apply the same fix to iterations 2 and 3, changing the file names accordingly (corpus_run2.json, corpus_run3.json, etc.).

Contributor Author

Copilot AI commented Jan 4, 2026

@copilot The workflow is failing with a YAML syntax error on line 36. The multi-line Python script isn't properly formatted for YAML.

The issue is that you can't have an unclosed quote spanning mul...

Fixed in commit 3a83dbe. Replaced python -c "..." with heredoc syntax python << 'EOF'...EOF for all three test iterations. Verified locally that the heredoc approach works correctly and produces deterministic output across multiple runs (all produce hash ec7841ee23732277ad2fee1d0ff8fc907060b5c960c6954450c95bf1a87f0ae4).

@hummbl-dev hummbl-dev marked this pull request as ready for review January 4, 2026 19:28
@hummbl-dev hummbl-dev merged commit 3fbe537 into main Jan 4, 2026
9 checks passed
@hummbl-dev hummbl-dev deleted the copilot/fix-golden-corpus-test branch January 4, 2026 19:28
Copilot AI added a commit that referenced this pull request Jan 4, 2026