Proposal: automated PR validation checks for common XDM contribution mistakes

## Context

Companion issue to #2147 (CLAUDE.md). CLAUDE.md documents reviewer-enforced conventions so contributors (human and AI-assisted) can self-correct before opening a PR. This issue proposes **programmatic** enforcement so contributors don't have to read a doc to get the PR right — CI blocks the PR or the bot comments the fix directly.

CircleCI today runs `npm run lint`, `mocha test`, `npm run xed-validation`, and `npm run incompatibility-check`. It does **not** run `npm run validate` in CI, and most of the repeat reviewer feedback patterns below are not enforced anywhere.

## Proposed checks

Ordered roughly by value-to-effort ratio. Each item is independently shippable.

### 1. Run `npm run validate` in CI

`bin/validate-schemas.js` catches example files that reference fields not defined in the schema, enum values not in the `enum` array, type mismatches, etc. Running it in CI (gated on "zero new failures vs `master` baseline" — the `master` baseline has ~58 pre-existing unrelated failures) would catch the single most common PR regression.

Effort: add `- run: npm run validate` to `.circleci/config.yml`. The script already exits non-zero on failure. Baseline-diff logic can be added as a follow-up.

### 2. `meta:enum` coverage check

Lint rule: for every schema under `components/` / `schemas/` / `extensions/`, every value listed in an `enum` array must appear as a key in the sibling `meta:enum` object. Missing keys fail CI with a file-and-path pointer.

Covers the #1 reviewer feedback ("forgot the `meta:enum` key").

Effort: small standalone Node script, callable from `npm` scripts. ~50 LOC with AJV's schema walker.

### 3. Example-schema field-drift check

Stricter version of `npm test`: for each `*.example.*.json`, assert every `xdm:`-prefixed property path resolves to a definition somewhere in the merged schema (including inherited `allOf` chains). Today this is caught implicitly when the schema uses `additionalProperties: false`; many XDM schemas don't set that, so invented fields in examples pass mocha.

Effort: medium. Reuse `bin/validate-schemas.js` plumbing; add strict-mode wrapper.

### 4. `meta:status` presence check

Every new schema (added in the PR diff) must set `meta:status`. Today this is reviewer-enforced; a `git diff --name-only upstream/master...HEAD | grep '.schema.json$' | xargs jq ...` one-liner in CI would flag omissions.

Effort: small. Pure shell + `jq` in CI.

### 5. PR body / metadata checks (GitHub Action)

An Action that, on PR open / synchronize:
- Requires `Closes #<n>` or `Fixes #<n>` in the body
- Requires the `needs review` label before a reviewer is auto-requested
- Detects rename/remove of existing properties in a schema diff and posts a "consider deprecating instead" comment with a link to the deprecation pattern in CLAUDE.md
- Detects conversion of a plain `string` property to an `enum`-constrained property and posts a "consider using `meta:enum` on a soft enum instead" comment

Effort: medium, one reusable workflow.

### 6. Auto-fix bot (stretch)

For the subset of issues that have a deterministic fix (missing `meta:enum` keys for values that exist in `enum`, `xdm:` prefix missing on a property that the schema defines with the prefix, prettier reformatting), a GitHub Action bot that opens a fix-commit PR branch or leaves suggestions.

Effort: large; only worth doing after 1–5 are in place and we have usage data on the most common violations.

### 7. Reuse-over-reinvent hint (stretch)

Harder to automate, but a `suggest-existing-datatypes.js` helper that takes a new datatype file and surfaces the top-N semantically similar existing datatypes (by title + property-name Jaccard similarity) would save design-review round-trips. Could run as a PR comment, not a blocking check.

Effort: medium–large. Likely worth only after 1–5.

## Rollout

- Ship 1–4 behind a CI job that runs but does not block, for a 2-week observation window. Count true-positives vs false-positives.
- Promote to blocking once the false-positive rate is known and the baseline-diff UX is sound.
- 5–7 follow later as separate PRs.

## Non-goals

- Changing any existing contributor-facing workflows beyond adding checks.
- Replacing the XDM team's human review — these checks catch mechanical mistakes so reviewers can spend their time on design.

Happy to take any or all of these on as follow-up PRs once there's signal on which ones are most valuable to the team.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: automated PR validation checks for common XDM contribution mistakes #2148

Context

Proposed checks

1. Run `npm run validate` in CI

2. `meta:enum` coverage check

3. Example-schema field-drift check

4. `meta:status` presence check

5. PR body / metadata checks (GitHub Action)

6. Auto-fix bot (stretch)

7. Reuse-over-reinvent hint (stretch)

Rollout

Non-goals

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Proposal: automated PR validation checks for common XDM contribution mistakes #2148

Description

Context

Proposed checks

1. Run npm run validate in CI

2. meta:enum coverage check

3. Example-schema field-drift check

4. meta:status presence check

5. PR body / metadata checks (GitHub Action)

6. Auto-fix bot (stretch)

7. Reuse-over-reinvent hint (stretch)

Rollout

Non-goals

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

1. Run `npm run validate` in CI

2. `meta:enum` coverage check

4. `meta:status` presence check