lib: map Latin1 labels to iso-8859-1 instead of Windows-1252 #58890

Open: lytovka wants to merge 7 commits into base: main

lytovka

lytovka commented Jun 29, 2025

Fixes: #56542

This PR updates all Latin1 labels to point to the iso-8859-1 encoding instead of Windows-1252. As a result, calling decode() with the iso-8859-1 encoding uses the decodeLatin1 fast path, which maps each byte to the code point of the same value. The Windows-1252 encoding no longer triggers the decodeLatin1 fast path; instead, it follows the standard path for obtaining the converter from the simdutf library, so bytes 0x80-0x9F receive their Windows-1252 mappings.
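A minimal sketch of the routing described above, under stated assumptions: the label table is a small hypothetical subset (the real table in lib/internal/encoding.js is larger and not reproduced here), and `resolveEncoding`, `decodePath`, and the returned path names are illustrative, not Node internals.

```javascript
'use strict';

// Hypothetical, illustrative subset of labels; not Node's actual label table.
const LABEL_TO_ENCODING = new Map([
  // Labels assumed to count as "Latin1 labels" for this sketch.
  ['latin1', 'iso-8859-1'],
  ['iso-8859-1', 'iso-8859-1'],
  ['l1', 'iso-8859-1'],
  // Labels that continue to resolve to Windows-1252.
  ['windows-1252', 'windows-1252'],
  ['cp1252', 'windows-1252'],
]);

// Resolve a user-supplied label, ignoring case and surrounding whitespace.
function resolveEncoding(label) {
  const encoding = LABEL_TO_ENCODING.get(String(label).trim().toLowerCase());
  if (encoding === undefined) {
    throw new RangeError(`Unsupported encoding label: ${label}`);
  }
  return encoding;
}

// Hypothetical dispatch: only iso-8859-1 takes the byte-for-byte
// decodeLatin1 fast path; windows-1252 goes through a converter instead.
function decodePath(encoding) {
  return encoding === 'iso-8859-1' ? 'decodeLatin1' : 'converter';
}

for (const label of ['Latin1', ' cp1252 ']) {
  const encoding = resolveEncoding(label);
  console.log(`${JSON.stringify(label)} -> ${encoding} via ${decodePath(encoding)}`);
}
// "Latin1" -> iso-8859-1 via decodeLatin1
// " cp1252 " -> windows-1252 via converter
```

The point of separating label resolution from dispatch is that the fix only has to change which encoding a label resolves to; the fast-path check then keys off the resolved encoding name alone.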

A new test file verifies the Unicode code points produced when bytes 0x7F-0x9F are decoded with the Windows-1252 encoding.
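The kind of check such a test performs can be sketched as a table comparison. This is not the PR's test file: the expected values come from the WHATWG index-windows-1252, and `mismatchedBytes`, `decodeAsLatin1`, and `decodeAsWindows1252` are hypothetical helpers. It shows that a byte-for-byte Latin1 decode gets 27 of the bytes in 0x7F-0x9F wrong.

```javascript
'use strict';

// Windows-1252 code points for bytes 0x80-0x9F (WHATWG index-windows-1252).
// The five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the C1
// control of equal value.
const WINDOWS_1252_80_9F = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, // 0x80-0x87
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F, // 0x88-0x8F
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, // 0x90-0x97
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178, // 0x98-0x9F
];

// Expected code point for a byte in 0x7F-0x9F (0x7F is ASCII DEL).
function expectedCodePoint(byte) {
  return byte < 0x80 ? byte : WINDOWS_1252_80_9F[byte - 0x80];
}

// Return every byte in 0x7F-0x9F whose decoded code point differs from the
// Windows-1252 expectation. `decode` maps a Uint8Array to a string.
function mismatchedBytes(decode) {
  const mismatched = [];
  for (let byte = 0x7F; byte <= 0x9F; byte++) {
    const decoded = decode(Uint8Array.of(byte));
    if (decoded.codePointAt(0) !== expectedCodePoint(byte)) mismatched.push(byte);
  }
  return mismatched;
}

// Byte-for-byte Latin1 decoding: each byte becomes the code point of equal value.
const decodeAsLatin1 = (bytes) => String.fromCharCode(...bytes);

// Table-driven Windows-1252 decoding; bytes outside 0x80-0x9F pass through.
const decodeAsWindows1252 = (bytes) =>
  String.fromCodePoint(...Array.from(bytes, (b) =>
    (b >= 0x80 && b <= 0x9F ? WINDOWS_1252_80_9F[b - 0x80] : b)));

console.log(mismatchedBytes(decodeAsLatin1).length); // 27
console.log(mismatchedBytes(decodeAsWindows1252).length); // 0
```

To check a real decoder against the same table, pass something like `(bytes) => new TextDecoder('windows-1252').decode(bytes)`; an empty result means it agrees with the table across 0x7F-0x9F.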

NB: Fixing the Latin1 label mappings changes behavior when TextDecoder is called with any Latin1 label to decode bytes in the 0x80-0x9F range. Because decoding for these labels will now follow the iso-8859-1 encoding, those bytes decode to C1 control characters instead of the characters Windows-1252 assigns to them.
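Callers who want to know whether their data is affected by this change could scan it for the bytes where the two encodings disagree. The sketch below uses a hypothetical helper, `affectedOffsets`, which is not part of Node's API. It relies on the fact that ISO-8859-1 and Windows-1252 agree on every byte outside 0x80-0x9F and on the five bytes Windows-1252 leaves unassigned.

```javascript
'use strict';

// Bytes in 0x80-0x9F that Windows-1252 leaves unassigned; they decode to the
// same C1 control under both windows-1252 and iso-8859-1.
const UNASSIGNED_IN_WINDOWS_1252 = new Set([0x81, 0x8D, 0x8F, 0x90, 0x9D]);

// Return the offsets of bytes whose decoded text differs between windows-1252
// and iso-8859-1, i.e. the bytes affected by remapping the Latin1 labels.
function affectedOffsets(bytes) {
  const offsets = [];
  bytes.forEach((byte, offset) => {
    if (byte >= 0x80 && byte <= 0x9F && !UNASSIGNED_IN_WINDOWS_1252.has(byte)) {
      offsets.push(offset);
    }
  });
  return offsets;
}

// "A", 0x92 (RIGHT SINGLE QUOTATION MARK in Windows-1252), an unassigned
// byte, and 0xE9 ("é" in both encodings).
const sample = Uint8Array.of(0x41, 0x92, 0x81, 0xE9);
console.log(affectedOffsets(sample)); // [ 1 ]
```

Across all 256 byte values, exactly 27 are affected, which matches the number of printable characters Windows-1252 places in 0x80-0x9F.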

Refs:

lytovka added 4 commits June 29, 2025 14:21
Latin1 is incorrectly mapped to the Windows-1252 encoding,
which defines mappings for bytes 0x80–0x9F, unlike Latin1 (ISO-8859-1),
where these bytes are control characters. Fixing this discrepancy can
cause unexpected behavior if TextDecoder is called with any
latin1 label and attempts to decode bytes in the 0x80–0x9F range,
since the decoding will now follow ISO-8859-1 encoding.

Fixes: nodejs#56542
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/web-standards

@nodejs-github-bot nodejs-github-bot added encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. needs-ci PRs that need a full CI run. labels Jun 29, 2025
@lytovka lytovka marked this pull request as ready for review June 29, 2025 19:56
@jasnell
Member

jasnell commented Jun 29, 2025

I do have some slight concerns over whether this may be considered a breaking change. My preference would be to handle it as a bug fix, however. I'd like some feedback from @nodejs/tsc

@lytovka
Author

lytovka commented Jun 29, 2025

> I do have some slight concerns over whether this may be considered a breaking change. My preference would be to handle it as a bug fix, however. I'd like some feedback from @nodejs/tsc

I had similar thoughts. I've updated the PR description with a note explaining why this could be considered a breaking change:

> NB: Fixing Latin1 label mappings will cause unexpected behavior if TextDecoder is called with any Latin1 label and attempts to decode bytes in the 0x80-0x9F range, since decoding for any of these labels will now follow the iso-8859-1 encoding.

codecov bot commented Jun 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.10%. Comparing base (4c65776) to head (43fbb04).
Report is 85 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #58890      +/-   ##
==========================================
- Coverage   90.10%   90.10%   -0.01%     
==========================================
  Files         640      640              
  Lines      188431   188427       -4     
  Branches    36956    36959       +3     
==========================================
- Hits       169783   169776       -7     
+ Misses      11362    11359       -3     
- Partials     7286     7292       +6     
Files with missing lines Coverage Δ
lib/internal/encoding.js 98.87% <100.00%> (-0.65%) ⬇️

... and 46 files with indirect coverage changes

@targos
Member

targos commented Jun 30, 2025

I think we should land as a bug fix.

@lytovka
Author

lytovka commented Jun 30, 2025

Hi all! Just to confirm: aside from addressing the linting errors (43fbb04), is there anything else you'd like me to do before this can land? Thanks!

@lytovka
Author

lytovka commented Jul 13, 2025

@jasnell @targos @nodejs/tsc: bumping this up; let me know if anything is missing. Thank you in advance!

@jasnell jasnell added the request-ci Add this label to start a Jenkins CI on a PR. label Jul 13, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jul 15, 2025
@nodejs-github-bot
Collaborator

Development

Successfully merging this pull request may close these issues.

TextDecoder incorrectly decodes 0x92 and several other characters for Windows-1252
5 participants