lib: map Latin1 labels to iso-8859-1 instead of Windows-1252 #58890

lytovka · 2025-06-29T19:54:34Z

This PR updates all Latin1 labels to point to the iso-8859-1 encoding instead of Windows-1252. The iso-8859-1 encoding will now use the decodeLatin1 fast path when decode() is called. The Windows-1252 encoding will no longer trigger the decodeLatin1 fast path; instead, it will follow the standard path of obtaining a converter from ICU.
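A minimal sketch of the label resolution and dispatch described above follows. It is not the code in lib/internal/encoding.js. The label subset and the names `LABELS`, `resolveEncoding`, `usesLatin1FastPath` and `decodeLatin1Sketch` are hypothetical, and the real label table is much larger. It only shows the shape of the change: a Latin1 label now resolves to `iso-8859-1`, and that encoding alone selects a fast path in which each byte becomes the code point with the same value.

```typescript
// Hypothetical subset of labels; the real table has many more entries.
const LABELS: ReadonlyMap<string, string> = new Map([
  ["latin1", "iso-8859-1"],
  ["l1", "iso-8859-1"],
  ["iso-8859-1", "iso-8859-1"],
  ["windows-1252", "windows-1252"],
  ["cp1252", "windows-1252"],
]);

// Map a caller-supplied label to a canonical encoding name.
// This sketch trims surrounding whitespace and ignores case.
function resolveEncoding(label: string): string | undefined {
  return LABELS.get(label.trim().toLowerCase());
}

// After the change, only iso-8859-1 is routed to the Latin1 fast path;
// windows-1252 goes through the general converter path instead.
function usesLatin1FastPath(encoding: string): boolean {
  return encoding === "iso-8859-1";
}

// ISO-8859-1 fast path: byte 0xNN decodes to U+00NN, so no table is needed.
function decodeLatin1Sketch(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}
```

For example, `resolveEncoding("Latin1")` returns `"iso-8859-1"` in this sketch, so byte 0x92 passed through `decodeLatin1Sketch` becomes U+0092 rather than U+2019. Because the fast path is a pure identity mapping, it is only correct for ISO-8859-1.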

A new test file has been added to verify the decoded Unicode values of bytes 0x7F-0x9F when the Windows-1252 encoding is selected.
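The expected values for such a test can be sketched as a small table-driven decoder. This does not reproduce the test file added in the PR. `WINDOWS_1252_HIGH`, `codePointFor` and `decodeWindows1252Sketch` are hypothetical names, and the table assumes the WHATWG index for windows-1252, under which the five bytes Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) decode to the C1 control with the same value. Byte 0x7F (DEL) is the same in both encodings.

```typescript
// Windows-1252 code points for bytes 0x80-0x9F, assuming the WHATWG index.
// Unassigned bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D map to U+0081 and so on.
const WINDOWS_1252_HIGH: readonly number[] = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021, // 0x80-0x87
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f, // 0x88-0x8F
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014, // 0x90-0x97
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178, // 0x98-0x9F
];

// Outside 0x80-0x9F, Windows-1252 agrees with ISO-8859-1 (byte == code point).
function codePointFor(byte: number): number {
  if (byte < 0x80 || byte > 0x9f) return byte;
  // `?? byte` is only a type-level fallback; all 32 entries are populated.
  return WINDOWS_1252_HIGH[byte - 0x80] ?? byte;
}

function decodeWindows1252Sketch(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => String.fromCharCode(codePointFor(b))).join("");
}

// Bytes 0x7F through 0x9F inclusive: 33 bytes, matching the tested range.
const input = Uint8Array.from({ length: 0x9f - 0x7f + 1 }, (_, i) => 0x7f + i);
const decoded = decodeWindows1252Sketch(input);
```

Comparing the output against fixed code points, rather than against another decoder's output, keeps a check like this independent of which internal path produced the string.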

NB: Fixing Latin1 label mappings will cause unexpected behavior if TextDecoder is called with any Latin1 label and attempts to decode bytes in the 0x80-0x9F range, since decoding for any of these labels will now follow the iso-8859-1 encoding.
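To make the compatibility risk concrete, the sketch below decodes a few sample bytes under both encodings. `WIN1252_SAMPLE`, `windows1252Sample`, `decodeBoth` and `hex` are hypothetical, and the three-entry table is an excerpt of the Windows-1252 code page, not a full decoder; bytes in 0x80-0x9F outside the excerpt are rejected rather than guessed.

```typescript
// Excerpt of Windows-1252 mappings in 0x80-0x9F (deliberately incomplete).
const WIN1252_SAMPLE: ReadonlyMap<number, number> = new Map([
  [0x80, 0x20ac], // EURO SIGN
  [0x92, 0x2019], // RIGHT SINGLE QUOTATION MARK
  [0x9f, 0x0178], // LATIN CAPITAL LETTER Y WITH DIAERESIS
]);

function windows1252Sample(byte: number): number {
  const mapped = WIN1252_SAMPLE.get(byte);
  if (mapped !== undefined) return mapped;
  if (byte >= 0x80 && byte <= 0x9f) {
    throw new RangeError(`0x${byte.toString(16)} is not in this excerpt`);
  }
  return byte; // outside 0x80-0x9F both encodings agree
}

interface DecodedByte {
  byte: number;
  iso88591: number; // ISO-8859-1: code point equals the byte value
  windows1252: number;
}

function decodeBoth(bytes: readonly number[]): DecodedByte[] {
  return bytes.map((byte) => ({ byte, iso88591: byte, windows1252: windows1252Sample(byte) }));
}

// Format a BMP code point as U+XXXX.
const hex = (n: number): string => "U+" + ("000" + n.toString(16).toUpperCase()).slice(-4);

for (const row of decodeBoth([0x41, 0x80, 0x92, 0x9f])) {
  console.log(`0x${row.byte.toString(16).toUpperCase()}: ${hex(row.iso88591)} vs ${hex(row.windows1252)}`);
}
// 0x41: U+0041 vs U+0041
// 0x80: U+0080 vs U+20AC
// 0x92: U+0092 vs U+2019
// 0x9F: U+009F vs U+0178
```

The 0x92 row is the typical case: under Windows-1252 it is a right single quotation mark, while under ISO-8859-1 it is the C1 control U+0092, so text decoded with a Latin1 label that relied on the Windows-1252 mapping would change.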

Refs:

ISO-8859-1 code page layout: https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout
Windows-1252 code page layout: https://en.wikipedia.org/wiki/Windows-1252#Codepage_layout

Latin1 is incorrectly mapped to the Windows-1252 encoding, which assigns printable characters to most bytes in the 0x80–0x9F range; in Latin1 (ISO-8859-1), these bytes are control characters. Fixing this discrepancy can cause unexpected behavior if TextDecoder is called with any Latin1 label and attempts to decode bytes in the 0x80–0x9F range, since decoding will then follow the ISO-8859-1 encoding.

Fixes: nodejs#56542

nodejs-github-bot · 2025-06-29T19:54:39Z

Review requested:

@nodejs/web-standards

jasnell · 2025-06-29T20:56:45Z

I do have some slight concerns over whether this may be considered a breaking change. My preference would be to handle it as a bug fix, however. I'd like some feedback from @nodejs/tsc

lytovka · 2025-06-29T21:28:37Z

> I do have some slight concerns over whether this may be considered a breaking change. My preference would be to handle it as a bug fix, however. I'd like some feedback from @nodejs/tsc

I had similar thoughts. I've updated the PR description with a note explaining why this could be considered a breaking change:

> NB: Fixing Latin1 label mappings will cause unexpected behavior if TextDecoder is called with any Latin1 label and attempts to decode bytes in the 0x80-0x9F range, since decoding for any of these labels will now follow the iso-8859-1 encoding.

codecov · 2025-06-30T09:52:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.10%. Comparing base (4c65776) to head (43fbb04).
Report is 85 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #58890      +/-   ##
==========================================
- Coverage   90.10%   90.10%   -0.01%     
==========================================
  Files         640      640              
  Lines      188431   188427       -4     
  Branches    36956    36959       +3     
==========================================
- Hits       169783   169776       -7     
+ Misses      11362    11359       -3     
- Partials     7286     7292       +6

Files with missing lines	Coverage Δ
lib/internal/encoding.js	`98.87% <100.00%> (-0.65%)`	⬇️

... and 46 files with indirect coverage changes


targos · 2025-06-30T10:04:53Z

I think we should land as a bug fix.

lytovka · 2025-06-30T13:05:54Z

Hi all! Just to confirm - aside from addressing the linting errors (43fbb04), is there anything else you'd like me to do before this can land? Thanks!

lytovka · 2025-07-13T15:57:40Z

@jasnell @targos @nodejs/tsc – bumping this up, let me know if there's anything missing. Thank you in advance!

nodejs-github-bot · 2025-07-15T23:36:32Z

CI: https://ci.nodejs.org/job/node-test-pull-request/67956/

lytovka added 4 commits June 29, 2025 14:21

test: fix label assertions in existing tests

1bf1c92

test: decode 0x7F-0x9F bytes with windows-1252 encoding

8819222

test: revert removal of windows-1254 labels

2813ea4

nodejs-github-bot added the encoding and needs-ci labels Jun 29, 2025

lytovka marked this pull request as ready for review June 29, 2025 19:56

jasnell approved these changes Jun 29, 2025


lytovka added 2 commits June 29, 2025 16:08

test: remove unnecessary console.log

eff9039

test: remove unnecessary comments

7143420

test: fix linting errors

43fbb04

jasnell approved these changes Jul 13, 2025


jasnell added the request-ci label Jul 13, 2025

lpinca approved these changes Jul 15, 2025


github-actions bot removed the request-ci label Jul 15, 2025
