-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
lib: map Latin1 labels to iso-8859-1 instead of Windows-1252 #58890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Latin1 is incorrectly mapped to the Windows-1252 encoding, which defines mappings for bytes 0x80–0x9F, unlike Latin1 (ISO-8859-1), where these bytes are control characters. Fixing this discrepancy can cause unexpected behavior if TextDecoder is called with any latin1 label and attempts to decode bytes in the 0x80–0x9F range, since the decoding will now follow ISO-8859-1 encoding. Fixes: nodejs#56542
Review requested:
|
I do have some slight concerns over whether this may be considered a breaking change. My preference would be to handle it as a bug fix, however. I'd like some feedback from @nodejs/tsc |
I had similar thoughts. I've updated the PR description with a note explaining why this could be considered a breaking change:
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #58890 +/- ##
==========================================
- Coverage 90.10% 90.10% -0.01%
==========================================
Files 640 640
Lines 188431 188427 -4
Branches 36956 36959 +3
==========================================
- Hits 169783 169776 -7
+ Misses 11362 11359 -3
- Partials 7286 7292 +6
🚀 New features to boost your workflow:
|
I think we should land as a bug fix. |
Hi all! Just to confirm - aside from addressing the linting errors ( |
Fixes: #56542
This PR updates all Latin1 labels to point to the
iso-8859-1
encoding instead ofWindows-1252
. Theiso-8859-1
encoding will now use thedecodeLatin1
fast path when calling the decode method. TheWindows-1252
encoding will not trigger thedecodeLatin1
fast path; instead, it will follow the standard path for obtaining the converter from thesimdutf
library.A new test file has been added to verify the decoded Unicode values of bytes
0x7F
-0x9F
when theWindows-1252
encoding is selected.NB: Fixing Latin1 label mappings will cause unexpected behavior if
TextDecoder
is called with any Latin1 label and attempts to decode bytes in the0x80
-0x9F
range, since decoding for any of these labels will now follow theiso-8859-1
encoding.Refs: