Add source-aware Markdown patching #275
Merged
What it does
Adds `patch_markdown()` that preserves unchanged Markdown source bytes instead of reserializing the whole document. If a user changes one paragraph, surrounding Markdown keeps its original spelling:
After editing only the second paragraph:
That preservation applies to common source details such as reference-style links, setext headings, table padding/alignment, raw HTML blocks, code-fence style, CRLF separators, blank-line trivia, and missing final newlines.
Usage:
Rationale
`MarkdownProducer` is allowed to emit canonical Markdown. That is fine for exports, but it is noisy for file-backed editing: a one-word edit should not rewrite `__bold__` to `**bold**`, setext headings to ATX headings, `~~~` fences to backticks, or carefully padded tables.
This PR gives editor integrations a way to keep using the existing `MarkdownConsumer` and `MarkdownProducer` while only changing the parts of the Markdown file that correspond to edited blocks.
Implementation
Adds two classes:
- `MarkdownSourceDocument`: owns the original Markdown, generated block markup, metadata, source units, and patching algorithm.
- `MarkdownSourceUnit`: stores one original Markdown source slice, byte offsets, corresponding block markup, and semantic hash.
`MarkdownSourceDocument::from_markdown()` parses Markdown two ways:
- `MarkdownConsumer` produces WordPress block markup.
When those views map one-to-one, each top-level Markdown block becomes a source unit. When they do not, the document falls back to one conservative whole-document unit; unchanged saves still preserve the original Markdown byte-for-byte.
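To make the source-unit idea concrete, here is a minimal sketch, not the PR's implementation. Every `sketch_*` name is hypothetical. A naive blank-line splitter stands in for the CommonMark-based mapping, so it would wrongly split fenced code that contains blank lines. A trivial normalizer stands in for the block markup that `MarkdownConsumer` generates. It only handles in-place changes and falls back to canonical output when the block count differs.

```php
<?php
// Illustrative sketch only: every sketch_* name is hypothetical, and a naive
// blank-line splitter stands in for the PR's CommonMark-based mapping.

/** Stand-in for the block view: canonical text that ignores source spelling. */
function sketch_to_block( $source ) {
	$text = str_replace( "\r\n", "\n", rtrim( $source, "\r\n" ) );
	return str_replace( '__', '**', $text );
}

/** Split Markdown into units whose 'source' slices concatenate back to the input. */
function sketch_source_units( $markdown ) {
	// Each match is one block plus its trailing blank-line separator (LF or CRLF).
	preg_match_all( '/.+?(?:(?:\r?\n){2,}|\z)/s', $markdown, $matches );
	$units  = array();
	$offset = 0;
	foreach ( $matches[0] as $slice ) {
		$units[] = array(
			'source' => $slice,
			'start'  => $offset,                    // byte offset, inclusive
			'end'    => $offset + strlen( $slice ), // byte offset, exclusive
			'hash'   => sha1( sketch_to_block( $slice ) ),
		);
		$offset += strlen( $slice );
	}
	return $units;
}

/** Reuse original bytes for blocks whose hash is unchanged; re-emit the rest. */
function sketch_patch( $markdown, array $edited_blocks ) {
	$units = sketch_source_units( $markdown );
	if ( count( $units ) !== count( $edited_blocks ) ) {
		// Conservative fallback: no one-to-one mapping, so emit canonical output.
		return implode( "\n\n", $edited_blocks ) . "\n";
	}
	$out = '';
	foreach ( $edited_blocks as $i => $block ) {
		$source = $units[ $i ]['source'];
		if ( sha1( $block ) === $units[ $i ]['hash'] ) {
			$out .= $source; // Unchanged: copy the original slice verbatim.
		} else {
			// Changed: new text, but keep the original trailing separator bytes.
			$out .= $block . substr( $source, strlen( rtrim( $source, "\r\n" ) ) );
		}
	}
	return $out;
}

// Usage: CRLF separators, a setext heading, __bold__, and no final newline.
$original  = "Title\r\n=====\r\n\r\n__bold__ text\r\n\r\nLast line";
$blocks    = array_map( 'sketch_to_block', array_column( sketch_source_units( $original ), 'source' ) );
$blocks[2] = 'Last line, edited';
$patched   = sketch_patch( $original, $blocks );
// $patched === "Title\r\n=====\r\n\r\n__bold__ text\r\n\r\nLast line, edited"
```

Comparing hashes of the block view rather than of the source text is what lets `__bold__` survive: the unit is considered unchanged even though canonical output would spell it `**bold**`.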
`patch_markdown()` then:
- `parse_blocks()`.
- `MarkdownProducer`.
This PR also adjusts two vendored CommonMark cursor caches used by inline code and angle-braced link parsing. They still use `WeakReference` where available, but fall back to a direct cursor reference on PHP 7.2/7.3 so the current CI matrix can parse those Markdown constructs.
Testing instructions
Focused checks:
Broader check:
vendor/bin/phpunit --testsuite 'Project Test Suite' -c phpunit.xml
composer lint
Current local results:
`MarkdownSourceDocumentTest`: 101 tests, 508 assertions.
The source-aware tests cover tiny trivia boundaries, pairwise before/after source preservation, large mixed documents, changed/inserted/deleted/reordered blocks, duplicate blocks, CRLF frontmatter, reference-style links, setext headings, fences, indented code, nested blockquotes, ordered-list numbering, raw HTML, thematic breaks, tables, and fallback preservation for currently unmapped constructs.
I also reproduced the PHP 7.2 `WeakReference` failure in Docker and verified inline code plus angle-braced links parse after the vendored CommonMark cache fix.
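For readers unfamiliar with the `WeakReference` fallback described under Implementation, the pattern could look roughly like the sketch below. `SketchCursorCache` and its methods are hypothetical and are not the vendored CommonMark classes. The only library fact relied on is that `WeakReference` ships with PHP 7.4 and later.

```php
<?php
// Hypothetical sketch; SketchCursorCache is not the vendored CommonMark class.

final class SketchCursorCache {
	/** @var object|null WeakReference when available, otherwise the cursor itself. */
	private $ref;

	/** @var mixed Value cached for the remembered cursor. */
	private $value;

	public function remember( $cursor, $value ) {
		// WeakReference exists from PHP 7.4; older runtimes keep a direct reference.
		$this->ref   = class_exists( 'WeakReference', false )
			? WeakReference::create( $cursor )
			: $cursor;
		$this->value = $value;
	}

	public function lookup( $cursor ) {
		$cached = $this->ref;
		if ( $cached instanceof WeakReference ) {
			$cached = $cached->get(); // null once the cursor has been collected
		}
		// Only answer for the exact cursor object that was remembered.
		return $cached === $cursor ? $this->value : null;
	}
}

// Usage: the cache hits for the remembered cursor and misses for any other.
$cache  = new SketchCursorCache();
$cursor = new stdClass();
$cache->remember( $cursor, 'parsed' );
$hit  = $cache->lookup( $cursor );         // 'parsed'
$miss = $cache->lookup( new stdClass() );  // null
```

The trade-off in a fallback like this is that on PHP 7.2/7.3 the direct reference keeps the cursor alive for as long as the cache entry exists, whereas the weak reference lets it be collected.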