Skip to content

Document invariants of percent-encode sets better#896

Merged
annevk merged 3 commits intomainfrom
annevk/userinfo-backslash
Mar 31, 2026
Merged

Document invariants of percent-encode sets better#896
annevk merged 3 commits intomainfrom
annevk/userinfo-backslash

Conversation

@annevk
Copy link
Copy Markdown
Member

@annevk annevk commented Jan 24, 2026

Fixes #895.

Copy link
Copy Markdown

@ChALkeR ChALkeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, %1B(J%5C%1B(B matches what I'm getting there
And the diff \ -> %5C matches what I expect as \ should be escaped

Thanks!

@ChALkeR
Copy link
Copy Markdown

ChALkeR commented Jan 24, 2026

Side note: having this as a test in WPT could be helpful to ensure that impls didn't use the wrong test
There are some tests for iso-2022-jp percent-encode, but not with %5C

Upd: ah, there is a full-range test for form/url, it just doesn't cover userinfo, but that's likely unreachable

Also give them a proper type.

Fixes #895.
@annevk annevk force-pushed the annevk/userinfo-backslash branch from 5c50135 to e52054e Compare January 24, 2026 11:19
@annevk annevk changed the title Correct example that did not encode backslash in userinfo Document invariants of percent-encode sets better Jan 24, 2026
@annevk
Copy link
Copy Markdown
Member Author

annevk commented Jan 24, 2026

That's a very good point. Which is probably also why I got the example wrong in the first place. I revised this PR with a more drastic change.

@annevk annevk requested review from ChALkeR, domenic and rmisev and removed request for ChALkeR January 24, 2026 11:23
Comment thread url.bs Outdated
Comment thread url.bs
U+005B ([) to U+005D (]), inclusive, and U+007C (|).
<p>The <dfn oldids=userinfo-encode-set>userinfo percent-encode set</dfn> is a
<a>percent-encode set</a> consisting of the <a>path percent-encode set</a> and U+002F (/),
U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005D (]), inclusive, and U+007C (|).
Copy link
Copy Markdown

@ChALkeR ChALkeR Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could have been explicit "5B, 5C, 5D" instead of "5B to 5D, inclusive"
that would make missing 5C less likely

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to keep this as-is for now as this is pre-existing and I think it's slightly more consistent. I could see making a new rule though that if a a range is 3 or less, you can't use the range syntax.

Comment thread url.bs
<p>The <dfn export>component percent-encode set</dfn> is the <a>userinfo percent-encode set</a> and
U+0024 ($) to U+0026 (&amp;), inclusive, U+002B (+), and U+002C (,).
<p>The <dfn export>component percent-encode set</dfn> is a <a>percent-encode set</a> consisting of
the <a>userinfo percent-encode set</a> and U+0024 ($) to U+0026 (&amp;), inclusive, U+002B (+), and
Copy link
Copy Markdown

@ChALkeR ChALkeR Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same here, could make missing 25 less likely

Comment thread url.bs
inclusive, and U+007E (~).
<p>The <dfn><code>application/x-www-form-urlencoded</code> percent-encode set</dfn> is a
<a>percent-encode set</a> consisting of the <a>component percent-encode set</a> and U+0021 (!),
U+0027 (') to U+0029 RIGHT PARENTHESIS, inclusive, and U+007E (~).
Copy link
Copy Markdown

@ChALkeR ChALkeR Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same here

Copy link
Copy Markdown

@ChALkeR ChALkeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples LGTM, algo changes too

@ChALkeR
Copy link
Copy Markdown

ChALkeR commented Feb 1, 2026

Side note: I added more cross-tests
Firefox, Chromium, Servo seem to pass, WebKit fails on a number of encodings.

@annevk annevk merged commit dc66b3b into main Mar 31, 2026
2 checks passed
@annevk annevk deleted the annevk/userinfo-backslash branch March 31, 2026 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Percent encoding examples seem to be wrong

2 participants