Fix identifier (un)escaping by JanJakes · Pull Request #47 · Automattic/sqlite-database-integration

JanJakes · 2025-05-06T14:46:07Z

The translate_string_literal logic in the SQLite driver that handles "unescaping" (interpreting) of MySQL escape sequences is only applied to a textStringLiteral AST node. It turns out, the same logic is needed in translate_pure_identifier, or, in other words, also for WP_MySQL_Lexer::DOUBLE_QUOTED_TEXT and WP_MySQL_Lexer::BACK_TICK_QUOTED_ID tokens.

Therefore, it's probably better to move this logic to the tokenization phase and make token instances return the correct unquoted token values.

Surfaced in #42.

JanJakes · 2025-05-06T14:49:06Z

+	 *
+	 * @return string The token value.
+	 */
+	public function get_value(): string {


While I like that get_value() is lazy and works nicely for any token type where we need to interpret or normalize values, I'm wondering how to handle the NO_BACKSLASH_ESCAPES SQL mode.

It's a very simple IF, but in the token instance, we just know nothing about SQL modes 🤔 The tokenizer knows it, so it could pass in a flag, or use a different token instance, but that makes it a bit less elegant.

Could it be a constructor argument? The mode is already determined when the token is created. If that was a boolean flag baked into the Token instance, we could still keep the get_value() method argument-less.

👍 Done in 6e5a8f5.
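
For readers following the thread, here is a minimal sketch of the constructor-flag idea. It is not the code from 6e5a8f5: the class name `Sketch_String_Token`, its properties, and the reduced escape map are all assumptions made for illustration, and it only covers single-quoted string literals.

```php
<?php
/**
 * Hypothetical token class; the names are illustrative, not the PR's code.
 */
final class Sketch_String_Token {
	/** @var string Raw token bytes, including the surrounding single quotes. */
	private $bytes;

	/** @var bool Whether NO_BACKSLASH_ESCAPES was active during tokenization. */
	private $no_backslash_escapes;

	public function __construct( string $bytes, bool $no_backslash_escapes = false ) {
		$this->bytes                = $bytes;
		$this->no_backslash_escapes = $no_backslash_escapes;
	}

	/**
	 * Lazily compute the unquoted value. No argument is needed because the
	 * SQL mode was captured when the token was created.
	 */
	public function get_value(): string {
		$backslash = chr( 92 );
		$inner     = substr( $this->bytes, 1, -1 );

		// A doubled quote always stands for one quote character.
		$map = array( "''" => "'" );

		if ( ! $this->no_backslash_escapes ) {
			// Simplified subset of the MySQL escape sequences (assumption).
			$map[ $backslash . $backslash ] = $backslash;
			$map[ $backslash . "'" ]        = "'";
			$map[ $backslash . 'n' ]        = "\n";
		}

		// strtr() scans the input once, so replaced text is never re-scanned.
		return strtr( $inner, $map );
	}
}

$raw = "'a" . chr( 92 ) . "nb'"; // The SQL literal 'a\nb'.

$default = new Sketch_String_Token( $raw );
$strict  = new Sketch_String_Token( $raw, true );

var_dump( $default->get_value() === "a\nb" );                // bool(true)
var_dump( $strict->get_value() === 'a' . chr( 92 ) . 'nb' ); // bool(true)
```

With the flag stored on the instance, callers can keep calling get_value() without knowing anything about SQL modes.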

adamziel · 2025-05-06T15:33:43Z

+				 *   > of pattern-matching contexts, they evaluate to the strings \% and
+				 *   > \_, not to % and _.
+				 */
+				'\%'   => '\\\\%',


What do you think about using '\\%' instead of '\%' just to make sure PHP won't surprise us by treating the backslash as an escape sequence? Or, alternatively, $backslash . '%' or '\x5C%'?

I thought maybe we're good since the PHP docs say this:

To specify a literal single quote, escape it with a backslash (\). To specify a literal backslash, double it (\\). All other instances of backslash will be treated as a literal backslash

But this gives me trust issues:

// This backslash is not treated as a literal backslash:
php > var_dump("\x5C");
string(1) "\"
// Neither is this:
php > var_dump("\n");
string(1) "
"

Ah no, that was just me using the wrong quotes. We are good indeed:

php > var_dump('\x5C');
string(4) "\x5C"
php > var_dump('\n');
string(2) "\n"

We're good here, but it's a good point to be a bit more explicit. I don't like maps like '\n' => "\n", where at a quick glance, the left and right sides look the same.

I followed the $backslash idea in 931c82b, and I think it looks better now.
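
As a rough illustration of the explicit-variable spelling (not the actual change in 931c82b), the snippet below checks that the quoted forms from the diff and the concatenated forms denote the same strings. The `$replacements` excerpt is a hypothetical subset, written only to show the style.

```php
<?php
$backslash = chr( 92 ); // A single backslash character.

// Both spellings of the map entry from the diff denote the same strings.
var_dump( '\%' === $backslash . '%' );                 // bool(true)
var_dump( '\\\\%' === $backslash . $backslash . '%' ); // bool(true)

// The pair that looks identical at a glance is in fact two different strings.
var_dump( strlen( '\n' ) ); // int(2)
var_dump( strlen( "\n" ) ); // int(1)

// Hypothetical excerpt of a map written with the explicit variable.
$replacements = array(
	$backslash . 'n' => "\n",
	$backslash . '%' => $backslash . $backslash . '%',
);
```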

adamziel · 2025-05-06T15:41:04Z

+			/*
+			 * Apply the replacements.
+			 *
+			 * It is important to use "strtr()" and not "str_replace()", because


Such a brilliant find ❤️
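
The diff comment above is cut off before it states its reason, so the following is only a sketch of one well-known difference between the two functions: str_replace() with arrays applies each replacement in turn, so a later pair can act on the output of an earlier one, while strtr() with an array scans the input once and never re-scans replaced text. The two-entry map is an assumption chosen to make the effect visible.

```php
<?php
$backslash = chr( 92 );

// Hypothetical two-entry map: an escaped backslash and an escaped newline.
$map = array(
	$backslash . $backslash => $backslash,
	$backslash . 'n'        => "\n",
);

// Three bytes: backslash, backslash, "n". In MySQL terms this is an escaped
// backslash followed by the plain letter "n".
$input = $backslash . $backslash . 'n';

$with_strtr       = strtr( $input, $map );
$with_str_replace = str_replace( array_keys( $map ), array_values( $map ), $input );

// strtr() yields a backslash followed by "n", as intended.
var_dump( $with_strtr === $backslash . 'n' ); // bool(true)

// str_replace() first collapses the two backslashes, then treats the
// leftover backslash plus "n" as a newline escape.
var_dump( $with_str_replace === "\n" ); // bool(true)
```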

adamziel · 2025-05-06T15:49:02Z

+			 * A backslash with any other character represents the character itself.
+			 * That is, \x evaluates to x, \\ evaluates to \, and \🙂 evaluates to 🙂.
+			 */
+			$value = preg_replace( '/\\\\(.)/u', '$1', $value );


Now I feel nostalgic. The first time I read that a quadruple backslash evaluates to a single backslash in preg_* functions was about 20 years ago in a PHP4 book. I'm old. 👴 Can we either document this or express it in a different way? Perhaps $escaped_backslash = preg_quote("\x5C"); and preg_replace( '/' . $escaped_backslash . '(.)/u', '$1', $value );? Maybe that's overkill. Feel free to make any call here, I just want to make sure this gets brought up.

Good point! I did that together with the other improvements in 931c82b.
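
A small sketch of the preg_quote() suggestion, assuming nothing about how 931c82b actually spells it: it builds the pattern from a named backslash, confirms it equals the quadruple-backslash literal from the diff, and runs it on the three cases named in the code comment.

```php
<?php
$backslash         = chr( 92 );
$escaped_backslash = preg_quote( $backslash, '/' ); // Two backslash characters.

$pattern = '/' . $escaped_backslash . '(.)/u';

// The built pattern is byte-for-byte the same as the literal in the diff.
var_dump( $pattern === '/\\\\(.)/u' ); // bool(true)

// Backslash + x, backslash + backslash, backslash + emoji.
$input  = $backslash . 'x' . $backslash . $backslash . $backslash . '🙂';
$output = preg_replace( $pattern, '$1', $input );

// Each escaped character is replaced by the character itself.
var_dump( $output === 'x' . $backslash . '🙂' ); // bool(true)
```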

adamziel

Great work, thank you Jan! I'm approving provisionally – I'd still like to see a rigorous test case that directly targets the quote_mysql_utf8_string_literal method. I know it's private. Perhaps it's generic and useful enough to be exposed publicly? And if not, there are ways to test private methods, too (although protected ones are easier).

JanJakes · 2025-05-07T15:03:54Z

I'm approving provisionally – I'd still like to see a rigorous test case that directly targets the quote_mysql_utf8_string_literal method.

@adamziel I added a test in 5196a05. Are there any more cases we should cover there?

Otherwise, this should be ready now.

adamziel · 2025-05-07T15:37:22Z

I left a nitpick about a comment, but the substance of the PR looks great. Thank you for additional tests!

JanJakes · 2025-05-07T19:32:02Z

@adamziel I improved the invalid UTF-8 tests and docs: 20f82be

I hope it makes more sense now.

JanJakes added 3 commits May 6, 2025 16:37

Improve MySQL string unquoting and move it to the tokenizer level

d9fa2b2

Fix default value formatting in SHOW CREATE TABLE, improve tests

7e972f1

Support table, column, and index comments, and test encoding

240bbf8

JanJakes commented May 6, 2025

JanJakes requested a review from adamziel May 6, 2025 14:52

adamziel reviewed May 6, 2025

Comment thread wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php Outdated

adamziel approved these changes May 6, 2025

JanJakes added 3 commits May 7, 2025 10:01

Improve escaping clarity and docs

931c82b

Implement support for NO_BACKSLASH_ESCAPES SQL mode

6e5a8f5

Add a test for quote_mysql_utf8_string_literal()

5196a05

JanJakes requested a review from adamziel May 7, 2025 15:03

Improve invalid UTF-8 test cases and their docs

20f82be

JanJakes force-pushed the string-escaping branch from a6b20c5 to 20f82be Compare May 7, 2025 19:31

adamziel merged commit 278d41c into develop May 7, 2025
12 checks passed

JanJakes deleted the string-escaping branch May 8, 2025 05:05
