Skip to content

Proposal: automated PR validation checks for common XDM contribution mistakes #2148

@ejsuncy

Description

@ejsuncy

Context

Companion issue to #2147 (CLAUDE.md). CLAUDE.md documents reviewer-enforced conventions so contributors (human and AI-assisted) can self-correct before opening a PR. This issue proposes programmatic enforcement so contributors don't have to read a doc to get the PR right — CI blocks the PR or the bot comments the fix directly.

CircleCI today runs npm run lint, mocha test, npm run xed-validation, and npm run incompatibility-check. It does not run npm run validate in CI, and most of the repeat reviewer feedback patterns below are not enforced anywhere.

Proposed checks

Ordered roughly by value-to-effort ratio. Each item is independently shippable.

1. Run npm run validate in CI

bin/validate-schemas.js catches example files that reference fields not defined in the schema, enum values not in the enum array, type mismatches, etc. Running it in CI (gated on "zero new failures vs master baseline" — the master baseline has ~58 pre-existing unrelated failures) would catch the single most common PR regression.

Effort: add - run: npm run validate to .circleci/config.yml. The script already exits non-zero on failure. Baseline-diff logic can be added as a follow-up.

2. meta:enum coverage check

Lint rule: for every schema under components/ / schemas/ / extensions/, every value listed in an enum array must appear as a key in the sibling meta:enum object. Missing keys fail CI with a file-and-path pointer.

Covers the #1 reviewer feedback ("forgot the meta:enum key").

Effort: small standalone Node script, callable from npm scripts. ~50 LOC with AJV's schema walker.

3. Example-schema field-drift check

Stricter version of npm test: for each *.example.*.json, assert every xdm:-prefixed property path resolves to a definition somewhere in the merged schema (including inherited allOf chains). Today this is caught implicitly when the schema uses additionalProperties: false; many XDM schemas don't set that, so invented fields in examples pass mocha.

Effort: medium. Reuse bin/validate-schemas.js plumbing; add strict-mode wrapper.

4. meta:status presence check

Every new schema (added in the PR diff) must set meta:status. Today this is reviewer-enforced; a git diff --name-only upstream/master...HEAD | grep '.schema.json$' | xargs jq ... one-liner in CI would flag omissions.

Effort: small. Pure shell + jq in CI.

5. PR body / metadata checks (GitHub Action)

An Action that, on PR open / synchronize:

  • Requires Closes #<n> or Fixes #<n> in the body
  • Requires the needs review label before a reviewer is auto-requested
  • Detects rename/remove of existing properties in a schema diff and posts a "consider deprecating instead" comment with a link to the deprecation pattern in CLAUDE.md
  • Detects conversion of a plain string property to an enum-constrained property and posts a "consider using meta:enum on a soft enum instead" comment

Effort: medium, one reusable workflow.

6. Auto-fix bot (stretch)

For the subset of issues that have a deterministic fix (missing meta:enum keys for values that exist in enum, xdm: prefix missing on a property that the schema defines with the prefix, prettier reformatting), a GitHub Action bot that opens a fix-commit PR branch or leaves suggestions.

Effort: large; only worth doing after 1–5 are in place and we have usage data on the most common violations.

7. Reuse-over-reinvent hint (stretch)

Harder to automate, but a suggest-existing-datatypes.js helper that takes a new datatype file and surfaces the top-N semantically similar existing datatypes (by title + property-name Jaccard similarity) would save design-review round-trips. Could run as a PR comment, not a blocking check.

Effort: medium–large. Likely worth only after 1–5.

Rollout

  • Ship 1–4 behind a CI job that runs but does not block, for a 2-week observation window. Count true-positives vs false-positives.
  • Promote to blocking once the false-positive rate is known and the baseline-diff UX is sound.
  • 5–7 follow later as separate PRs.

Non-goals

  • Changing any existing contributor-facing workflows beyond adding checks.
  • Replacing the XDM team's human review — these checks catch mechanical mistakes so reviewers can spend their time on design.

Happy to take any or all of these on as follow-up PRs once there's signal on which ones are most valuable to the team.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions