fix(es/ast): Fix unicode lone surrogates handling #10987
CodSpeed Performance Report
Merging #10987 will not alter performance
e32046f to 5f04ddb
Thank you so much! Actually I tried to fix this several times but it was very confusing :(
All alerts resolved. Learn more about Socket for GitHub. This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored. |
4d12831 to df0b9ee
@kdy1, Hi!
EDIT: Added support for |
Do we really need to change the AST?
Actually, changing the AST is not allowed in our case, because for v2 we are going to align the AST with babel or typescript-eslint.
Hi! @kdy1 For example, for
We could probably just change the way it serializes and deserializes to align the AST with them? Adding another layer seems necessary to me, since there are some differences between Rust and JavaScript, as I mentioned above.
I'd appreciate a reference to Oxc, because I assume the code is based on my comment #10978 (comment). It took us a lot of time to understand the problem and then make the right fix.
You probably want to bend the rule here, because the |
Will do! Really appreciate the job y'all have done ;-) |
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
@kdy1 Copilot seems broken with this huge code change ;-(.
Description:
This PR fixes an issue with lone surrogate handling in Rust.
All credit for this fix goes to the Oxc team (#10978 (comment)). I am porting the fix made in Oxc and making it work in SWC.
Problem:
The problem is related to the fundamental difference between how Rust and JavaScript handle Unicode, especially lone surrogates.
JavaScript's Unicode Model
JavaScript uses UTF-16 internally and tolerates invalid Unicode sequences, so a string may contain a lone surrogate such as \uD800.
Rust's Unicode Model
Rust enforces strict Unicode validity: a String or char can never hold a lone surrogate.
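The mismatch between the two models can be sketched with the standard library alone. The snippet below is only an illustration, not code from this PR; the helper names `try_into_rust_string` and `into_rust_string_lossy` are hypothetical. It takes UTF-16 code units, as a JavaScript engine might store them, and shows that a lone surrogate cannot be turned into a Rust `String` without either failing or losing information.

```rust
/// Hypothetical helper (not part of SWC): strict conversion from UTF-16
/// code units. Returns `None` if the input contains a lone surrogate,
/// because a Rust `String` must always be valid Unicode.
fn try_into_rust_string(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Hypothetical helper (not part of SWC): lossy conversion. Every lone
/// surrogate is replaced with U+FFFD, so the original code unit is lost.
fn into_rust_string_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// A surrogate code point is not a valid `char` either.
fn is_valid_char(code_point: u32) -> bool {
    char::from_u32(code_point).is_some()
}
```

For example, the units `[0x61, 0xD800, 0x62]` (the JavaScript string `"a\uD800b"`) are rejected by the strict helper and become `"a\u{FFFD}b"` with the lossy one, which is why an extra escape scheme is needed to round-trip such strings.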
Key Changes:
- Added a `lone_surrogates: bool` field to the `Str` and `TplElement` structs to track when strings contain lone surrogates

TODOs:
- `swc_estree_compat`
- `binding` crates

Breaking changes:
Breaks the AST by adding a `lone_surrogates` field to `Str` and `TplElement`, and changes the meaning of `value` and `cooked` in `Str` and `TplElement` respectively. Both fields use `\u{FFFD}` (the Replacement Character) as an escape marker when `lone_surrogates` is set to `true`.

To consume the real value, first check whether `lone_surrogates` is `true`, then unescape it by removing that character and reconstructing the code unit from the four trailing hex digits (from `\u{FFFD}D800` to `\uD800`).

Related issue (if exists):
closes #10978
closes #10353
Fixed a regression of #7678
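The unescaping step described under "Breaking changes" could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the implementation in this PR: `decode_lone_surrogates` is a hypothetical helper, the result is returned as UTF-16 code units because a Rust `String` cannot hold a lone surrogate, and a `\u{FFFD}` that is not followed by four hex digits is assumed to be kept as a literal U+FFFD.

```rust
/// Hypothetical helper (not part of the SWC API): turns an escaped
/// `value` / `cooked` string back into UTF-16 code units.
fn decode_lone_surrogates(value: &str, lone_surrogates: bool) -> Vec<u16> {
    // Without the flag, U+FFFD has no special meaning.
    if !lone_surrogates {
        return value.encode_utf16().collect();
    }

    let mut units = Vec::with_capacity(value.len());
    let mut rest = value;

    while let Some(pos) = rest.find('\u{FFFD}') {
        // Copy everything before the escape marker unchanged.
        units.extend(rest[..pos].encode_utf16());
        let after = &rest[pos + '\u{FFFD}'.len_utf8()..];

        // Read the four trailing hex digits, e.g. "D800".
        let unit = after
            .get(..4)
            .filter(|hex| hex.bytes().all(|b| b.is_ascii_hexdigit()))
            .and_then(|hex| u16::from_str_radix(hex, 16).ok());

        match unit {
            Some(unit) => {
                units.push(unit);
                rest = &after[4..];
            }
            // Assumption: a marker without four hex digits stays U+FFFD.
            None => {
                units.push(0xFFFD);
                rest = after;
            }
        }
    }

    units.extend(rest.encode_utf16());
    units
}
```

With this sketch, `decode_lone_surrogates("a\u{FFFD}D800b", true)` yields the code units `0x61, 0xD800, 0x62`, which match the JavaScript string `"a\uD800b"`.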