Add source-aware Markdown patching #275

Merged: adamziel merged 1 commit into trunk from adamziel/source-aware-markdown-patches on May 17, 2026

@adamziel (Collaborator) commented on May 16, 2026

What it does

Adds patch_markdown(), which preserves the source bytes of unchanged Markdown instead of reserializing the whole document. If a user changes one paragraph, the surrounding Markdown keeps its original spelling:

Keep __this__ syntax.

Change this sentence.

After editing only the second paragraph:

Keep __this__ syntax.

Change this edited sentence.

That preservation applies to common source details such as reference-style links, setext headings, table padding/alignment, raw HTML blocks, code-fence style, CRLF separators, blank-line trivia, and missing final newlines.
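As a rough illustration of what keeping CRLF separators and final-newline behavior can mean for a replaced block, here is a minimal sketch. The function name sketch_reuse_trivia() is hypothetical and is not part of this PR; it only assumes that the replaced block's line-ending style and trailing line breaks are copied onto freshly serialized Markdown.

```php
<?php
/**
 * Hypothetical helper, not the PR's implementation: carries the line-ending
 * style and trailing line breaks of a replaced source slice over to newly
 * serialized (canonical) Markdown.
 */
function sketch_reuse_trivia( string $old_source, string $canonical ): string {
	// Trailing run of line breaks in the replaced slice; empty when the
	// original file had no final newline.
	preg_match( '/(?:\r?\n)*\z/', $old_source, $match );
	$trailing = $match[0];

	// Drop whatever line breaks the serializer appended.
	$body = rtrim( $canonical, "\r\n" );

	// If the replaced slice used CRLF, normalize the new body to CRLF too.
	if ( false !== strpos( $old_source, "\r\n" ) ) {
		$body = preg_replace( '/\r?\n/', "\r\n", $body );
	}

	return $body . $trailing;
}

// A CRLF paragraph followed by a blank line keeps both after the edit.
$patched = sketch_reuse_trivia( "Old text.\r\n\r\n", "New\ntext.\n" );
```

With these inputs, $patched ends in the same "\r\n\r\n" separator as the slice it replaces, and a slice without a final newline stays without one.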

Usage:

$document = MarkdownSourceDocument::from_markdown( $markdown );
$blocks   = $document->get_block_markup();

// Edit $blocks in a block editor to produce $edited_blocks.

$updated_markdown = $document->patch_markdown( $edited_blocks );

Rationale

MarkdownProducer is allowed to emit canonical Markdown. That is fine for exports, but it is noisy for file-backed editing: a one-word edit should not rewrite __bold__ to **bold**, setext headings to ATX headings, ~~~ fences to backticks, or carefully padded tables.

This PR gives editor integrations a way to keep using the existing MarkdownConsumer and MarkdownProducer while only changing the parts of the Markdown file that correspond to edited blocks.

Implementation

Adds two classes:

  • MarkdownSourceDocument: owns the original Markdown, generated block markup, metadata, source units, and patching algorithm.
  • MarkdownSourceUnit: stores one original Markdown source slice, byte offsets, corresponding block markup, and semantic hash.

MarkdownSourceDocument::from_markdown() parses Markdown two ways:

  1. MarkdownConsumer produces WordPress block markup.
  2. CommonMark/GFM/frontmatter parsing provides top-level Markdown source positions.

When those views map one-to-one, each top-level Markdown block becomes a source unit. When they do not, the document falls back to one conservative whole-document unit; unchanged saves still preserve the original Markdown byte-for-byte.
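The idea of source units that cover the file byte-for-byte can be sketched as follows. This is only an illustration under a strong simplifying assumption: blocks are split on blank lines, whereas the PR takes block positions from the CommonMark/GFM/frontmatter parser. The function name and the array keys are hypothetical and do not reflect the actual MarkdownSourceUnit fields.

```php
<?php
/**
 * Hypothetical helper: splits Markdown into blank-line-delimited slices and
 * records their byte offsets. Constructs with internal blank lines (such as
 * fenced code) would be split incorrectly; a real parser is needed for those.
 */
function sketch_split_source_units( string $markdown ): array {
	$units = array();
	// Each match is a run of content plus the blank-line separator after it,
	// so consecutive matches cover every byte of the input.
	preg_match_all(
		'/.+?(?:(?:\r?\n){2,}|\z)/s',
		$markdown,
		$matches,
		PREG_OFFSET_CAPTURE
	);
	foreach ( $matches[0] as $match ) {
		list( $source, $start ) = $match;
		$units[] = array(
			'source' => $source,                    // Original bytes, untouched.
			'start'  => $start,                     // Byte offset of the first byte.
			'end'    => $start + strlen( $source ), // Byte offset after the last byte.
		);
	}
	return $units;
}

$markdown = "Keep __this__ syntax.\r\n\r\nChange this sentence.";
$units    = sketch_split_source_units( $markdown );

// Concatenating the untouched slices reproduces the file exactly.
$round_trip = implode( '', array_column( $units, 'source' ) );
```

Because every unit keeps its separator and line endings, an unchanged save can be rebuilt by concatenating the slices, which is the property the whole-document fallback also guarantees with a single unit.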

patch_markdown() then:

  1. Parses edited block markup with parse_blocks().
  2. Computes semantic hashes for original source units and edited blocks.
  3. Uses longest common subsequence (LCS) matching to keep the longest in-order set of unchanged blocks, including repeated identical blocks.
  4. Copies unchanged source units verbatim.
  5. Serializes changed/inserted blocks with MarkdownProducer.
  6. Reuses line-oriented trivia from replaced blocks, so CRLFs and final-newline behavior survive edits.
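Steps 2 to 5 above can be sketched roughly as below. This is not the PR's code: the function names, the 'hash', 'source', and 'canonical' keys, and the plain string hashes are all hypothetical, with 'canonical' standing in for what MarkdownProducer would emit. Trivia reuse (step 6) is left out.

```php
<?php
/**
 * Hypothetical sketch: returns [original index, edited index] pairs forming a
 * longest common subsequence of two hash lists (textbook dynamic programming).
 */
function sketch_lcs_pairs( array $original, array $edited ): array {
	$n = count( $original );
	$m = count( $edited );
	// $len[$i][$j] is the LCS length of $original[$i..] and $edited[$j..].
	$len = array_fill( 0, $n + 1, array_fill( 0, $m + 1, 0 ) );
	for ( $i = $n - 1; $i >= 0; $i-- ) {
		for ( $j = $m - 1; $j >= 0; $j-- ) {
			$len[ $i ][ $j ] = $original[ $i ] === $edited[ $j ]
				? $len[ $i + 1 ][ $j + 1 ] + 1
				: max( $len[ $i + 1 ][ $j ], $len[ $i ][ $j + 1 ] );
		}
	}
	// Walk the table from the start to recover one optimal matching.
	$pairs = array();
	$i     = 0;
	$j     = 0;
	while ( $i < $n && $j < $m ) {
		if ( $original[ $i ] === $edited[ $j ] ) {
			$pairs[] = array( $i, $j );
			$i++;
			$j++;
		} elseif ( $len[ $i + 1 ][ $j ] >= $len[ $i ][ $j + 1 ] ) {
			$i++;
		} else {
			$j++;
		}
	}
	return $pairs;
}

/**
 * Hypothetical sketch: copies matched source slices verbatim and falls back to
 * canonical Markdown for changed or inserted blocks.
 */
function sketch_patch( array $units, array $edited ): string {
	$pairs = sketch_lcs_pairs(
		array_column( $units, 'hash' ),
		array_column( $edited, 'hash' )
	);
	$kept = array();
	foreach ( $pairs as $pair ) {
		$kept[ $pair[1] ] = $units[ $pair[0] ]['source'];
	}
	$output = '';
	foreach ( $edited as $index => $block ) {
		$output .= isset( $kept[ $index ] ) ? $kept[ $index ] : $block['canonical'];
	}
	return $output;
}

$units  = array(
	array( 'hash' => 'h1', 'source' => "Keep __this__ syntax.\n\n" ),
	array( 'hash' => 'h2', 'source' => "Change this sentence.\n" ),
);
$edited = array(
	array( 'hash' => 'h1', 'canonical' => "Keep **this** syntax.\n\n" ),
	array( 'hash' => 'h3', 'canonical' => "Change this edited sentence.\n" ),
);
$result = sketch_patch( $units, $edited );
```

In this example the first paragraph keeps its __this__ spelling because its hash is unchanged, while only the second paragraph is taken from the canonical serialization.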

This PR also adjusts two vendored CommonMark cursor caches used by inline code and angle-braced link parsing. They still use WeakReference where available, but fall back to a direct cursor reference on PHP 7.2/7.3 so the current CI matrix can parse those Markdown constructs.
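The general shape of such a fallback might look like the sketch below. The class name SketchCursorCache is hypothetical and does not mirror the vendored CommonMark code; the only assumption is that WeakReference, which was introduced in PHP 7.4, is used when the class exists and a direct (strong) reference is stored otherwise.

```php
<?php
/**
 * Hypothetical sketch of a cache slot that holds a weak reference where the
 * runtime supports it and a direct reference on older PHP versions.
 */
final class SketchCursorCache {
	/** @var object|null Either a WeakReference or the cursor itself. */
	private $ref;

	public function set( $cursor ): void {
		$this->ref = class_exists( 'WeakReference' )
			? WeakReference::create( $cursor )
			: $cursor; // PHP 7.2/7.3: keep a strong reference instead.
	}

	public function get() {
		// instanceof is safe even when the WeakReference class is undefined.
		if ( $this->ref instanceof WeakReference ) {
			return $this->ref->get();
		}
		return $this->ref;
	}
}

$cursor = new stdClass();
$cache  = new SketchCursorCache();
$cache->set( $cursor );
$same = ( $cache->get() === $cursor );
```

The trade-off of the fallback path is that the cached object stays alive for as long as the cache does, which is the behavior a weak reference would otherwise avoid.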

Testing instructions

Focused checks:

vendor/bin/phpunit components/Markdown/Tests/MarkdownSourceDocumentTest.php
vendor/bin/phpunit --filter 'Markdown(Consumer|Producer|SourceDocument)Test' -c phpunit.xml
php vendor/bin/phpcs -d memory_limit=1G . -n

Broader check:

vendor/bin/phpunit --testsuite 'Project Test Suite' -c phpunit.xml
composer lint

Current local results:

  • MarkdownSourceDocumentTest: 101 tests, 508 assertions.
  • Markdown consumer/producer/source filter: 137 tests, 544 assertions.
  • Project test suite: 5,289 tests, 16,514 assertions, 2,027 skipped.

The source-aware tests cover tiny trivia boundaries, pairwise before/after source preservation, large mixed documents, changed/inserted/deleted/reordered blocks, duplicate blocks, CRLF frontmatter, reference-style links, setext headings, fences, indented code, nested blockquotes, ordered-list numbering, raw HTML, thematic breaks, tables, and fallback preservation for currently unmapped constructs.

I also reproduced the PHP 7.2 WeakReference failure in Docker and verified inline code plus angle-braced links parse after the vendored CommonMark cache fix.

@adamziel force-pushed the adamziel/source-aware-markdown-patches branch 5 times, most recently from 48ba55f to b099679, on May 16, 2026 23:25
@adamziel force-pushed the adamziel/source-aware-markdown-patches branch from b099679 to 0f68c8c on May 16, 2026 23:39
@adamziel added the enhancement (New feature or request) label on May 16, 2026
@adamziel merged commit 96a3a7a into trunk on May 17, 2026
79 of 83 checks passed
@adamziel deleted the adamziel/source-aware-markdown-patches branch on May 17, 2026 00:13