[Data Liberation] wp_rewrite_urls()#1893
Conversation
|
I thought it won't be ready for some more time but I today landed a comfortable enough amount of unit tests to merge this PR as v1 of |
A part of #1894. Follows up on #1893. This PR brings in a few more PHP APIs that were initially explored outside of Playground so that they can be incubated in Playground. See the linked descriptions for more details about each API: * XML Processor from WordPress/wordpress-develop#6713 * Stream chain from adamziel/wxr-normalize#1 * A draft of a WXR URL Rewriter class capable of rewriting URLs in WXR files ## Testing instructions * Confirm the PHPUnit tests pass in CI * Confirm the test suite looks reasonabel * That's it for now! It's all new code that's not actually used anywhere in Playground yet. I just want to merge it to keep iterating and improving.
brandonpayton
left a comment
There was a problem hiding this comment.
Hi, I am catching up on reviewing merged PRs and am leaving comments in case they are valuable.
| while ( true ) { | ||
| $this->block_attributes_iterator->next(); | ||
| if ( ! $this->block_attributes_iterator->valid() ) { | ||
| break; | ||
| } | ||
| return true; | ||
| } | ||
|
|
||
| return false; |
There was a problem hiding this comment.
What is the purpose of this while-true loop? It looks like we might be able to simplify this to:
$this->block_attributes_iterator->next();
if ( $this->block_attributes_iterator->valid() ) {
return true;
}
return false;| * base URL. | ||
| * When a base URL is missing, the string must start with a protocol to | ||
| * be considered a URL. | ||
| */ |
There was a problem hiding this comment.
I like this comment.
Thinking about it led to thinking about subdirectory-based multisites and this question:
Should we have any concern for cases where a subdir multisite is moved to a different base subdir. For example, if http://earth.com/old-multisite/<blog> is moved to http://moon.com/new-multisite/<blog>, would we want to handle rewriting /old-multisite to /new-multisite?
Such URLs may or may not include the hostname.
Maybe if we support path rewriting, it will need to be an optional rewrite feature, and maybe even a separate facility because, as this comment implies, it's conceivable that there may be false-positives.
There was a problem hiding this comment.
Good spot! The good news is path rewriting is supported :) not sure if in this PR, but for sure in trunk. It won't catch everything, e.g. host-less paths in text content, but it will catch a lot.
| $this->did_prepend_protocol = false; | ||
| while ( true ) { | ||
| /** | ||
| * Thick sieve – eagerly match things that look like URLs but turn out to not be URLs in the end. |
Motivation for the change, related issues
A part of #1894.
Prototypes a
wp_rewrite_urls()URL rewriter for block markup to migrate the content from, say,<a href="https://adamadam.blog">to<a href="https://adamziel.com/blog">.Status:
WP_HTML_Tag_ProcessorandWP_HTML_Processorto enable usage outside of WordPress core.Details
This PR consists of a code ported from https://github.com/adamziel/site-transfer-protocol. It uses a cascade of parsers to pierce through the structured data in a WordPress post and replace the URLs matching the requested domain.
The data flow is as follows:
Parse HTML -> Parse block comments -> Parse attributes JSON -> Parse URLs
On a high level, this parsing cascade is handled by the
WP_Block_Markup_Url_Processorclass:Getting more into details, the
WP_Block_Markup_Url_Processorextends theWP_HTML_Tag_Processorclass and walks the block markup token by token. It then drills down into:<a href="">, looking for ones that contain valid URLsThe
next_url()method moves through the stream of tokens, looking for the next match in one of the above contexts, and theset_raw_url()knows how to update each node type, e.g. block attributes updates arejson_encode()-d.Processing tricky inputs
When this code is fed into the migrator:
This actual output is produced:
Remaining work
composer install)Follow-up work
WP_HTML_Tag_Processorin WordPress core, see HTML API: Add set_modifiable_text() for replacing text nodes. wordpress-develop#7007 (comment)WP_HTML_Tag_Processoras a "WordPress polyfill" for standalone usage.Testing Instructions (or ideally a Blueprint)
CI runs the PHP unit tests. To run this on your local machine, do this: