What is the issue with the Encoding Standard?
Same as #358, but for the Unicode BOM.
If the proposal in #358 is to reset state on errors, then what should happen to BOM seen?
I'm not arguing that it should be reset, but there is definitely some sort of issue and inconsistency here.
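For context, here is a minimal sketch of what BOM seen governs in the simple, non-streaming, error-free case: a leading UTF-8 BOM is stripped from the output unless `ignoreBOM` is set. The variable names are illustrative only; the question in this issue is what happens to that same state once a fatal error is thrown mid-stream.

```javascript
// Bytes EF BB BF (UTF-8 BOM) followed by 0x41 ("A").
const bomThenA = Uint8Array.of(0xef, 0xbb, 0xbf, 0x41)

// Default decoder: the leading BOM is stripped, leaving only "A".
const stripped = new TextDecoder('utf-8').decode(bomThenA)

// With ignoreBOM: true the BOM is kept in the output as U+FEFF.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bomThenA)

console.log(stripped.length, kept.length) // 1 2
```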
Platform status is highly inconsistent:
const r = (d, ...a) => {
  try {
    return d.decode(...a).length
  } catch {}
  return 'e'
}

const a = new TextDecoder('utf8', { fatal: true })
console.log('A',
  r(a, Uint8Array.of(0xef, 0xbb, 0xbf, 0xff), { stream: true }),
  r(a, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }), // error does not stick in Chrome/Safari
)

const b = new TextDecoder('utf8', { fatal: true })
console.log('B',
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf, 0xef), { stream: true }),
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }), // error sticks in Chrome / Safari
  r(b, Uint8Array.of(0xbb, 0xbf), { stream: true }),
  r(b, Uint8Array.of(), { stream: true }),
)

const c = new TextDecoder('utf8', { fatal: true })
console.log('C',
  r(c, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(c, Uint8Array.of(0xff), { stream: true }),
  r(c, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
)

// Bonus: if BOM is not reset, is it processed on errors?
const d = new TextDecoder('utf8', { fatal: true })
const e = new TextDecoder('utf8', { fatal: true })
console.log('D',
  r(d, Uint8Array.of(0x20, 0xff), { stream: true }),
  r(d, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(d, Uint8Array.of(0xff), { stream: true }),
  r(d, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(e, Uint8Array.of(0xff), { stream: true }),
  r(e, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
)
Chrome: first error did not get stuck, second error got stuck
A e 0
B 0 e e e e
C 0 e 1
D e 0 e 1 e 0
WebKit: first error did not get stuck, second error got stuck
A e 1
B 0 e e e e
C 0 e 1
D e 1 e 1 e 0
Firefox, Servo, Deno, Static Hermes: errors do not stick, BOM does not get reset, BOM seen is set on errors
A e 1
B 0 e 1 e 0
C 0 e 1
D e 1 e 1 e 1
Node.js: errors do not stick
A e 0
B 0 e 1 e 0
C 0 e 1
D e 0 e 1 e 0
Bun: just broken
A e 0
B 0 0 0 1 0
C 0 e 0
D e 0 e 0 e 0
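To check another runtime against these results, the rows above can be put into a small lookup. This is only a sketch: `reported` copies the A to D rows from this issue verbatim, and `matchReported` is a hypothetical helper name, not part of any API.

```javascript
// Rows copied from the results reported in this issue.
const reported = {
  'Chrome': { A: 'e 0', B: '0 e e e e', C: '0 e 1', D: 'e 0 e 1 e 0' },
  'WebKit': { A: 'e 1', B: '0 e e e e', C: '0 e 1', D: 'e 1 e 1 e 0' },
  'Firefox, Servo, Deno, Static Hermes': { A: 'e 1', B: '0 e 1 e 0', C: '0 e 1', D: 'e 1 e 1 e 1' },
  'Node.js': { A: 'e 0', B: '0 e 1 e 0', C: '0 e 1', D: 'e 0 e 1 e 0' },
  'Bun': { A: 'e 0', B: '0 0 0 1 0', C: '0 e 0', D: 'e 0 e 0 e 0' },
}

// Hypothetical helper: returns the labels whose reported rows all equal
// the observed rows (each row is the space-joined output of one test).
function matchReported(observed) {
  return Object.entries(reported)
    .filter(([, rows]) => Object.keys(rows).every((k) => rows[k] === observed[k]))
    .map(([label]) => label)
}

// Example: rows observed on some runtime, typed in by hand.
const observed = { A: 'e 0', B: '0 e 1 e 0', C: '0 e 1', D: 'e 0 e 1 e 0' }
console.log(matchReported(observed))
```

An empty array means the runtime shows a combination not listed here, which would be worth adding to the list.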