Description
Is there an existing issue for this?
- I have searched the existing issues
Midnight Commander version and build configuration
4.8.33, git
Operating system
Linux
Is this issue reproducible using the latest version of Midnight Commander?
- I confirm the issue is still reproducible with the latest version of Midnight Commander
How to reproduce
mcedit, fully UTF-8 environment:
Try to enter a non-printable Unicode character such as U+FEFF, U+FFFE, U+FFFF, U+1FC00, U+10FFFE, U+10FFFF.
Typically you can enter these from the keyboard using Ctrl+Shift+U, then the hex code, then Enter or Space. This may depend on the OS, desktop, and terminal emulator.
Alternatively, copy-paste from somewhere (e.g. a graphical text editor, web browser).
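If no convenient source to copy from is at hand, a small script can print the code points listed above so they can be selected in a terminal and pasted into mcedit. This is only a sketch: the helper name `labelled_lines` is made up for illustration, and the square brackets are there just to make the otherwise invisible characters easier to select.

```python
import sys

# Code points named in this report.
CODEPOINTS = [0xFEFF, 0xFFFE, 0xFFFF, 0x1FC00, 0x10FFFE, 0x10FFFF]


def labelled_lines(codepoints=CODEPOINTS):
    """Return one line per code point, with the character between brackets."""
    return [f"U+{cp:04X} [{chr(cp)}]" for cp in codepoints]


if __name__ == "__main__":
    # Write raw UTF-8 bytes so the output does not depend on the
    # encoding Python picked for stdout.
    text = "\n".join(labelled_lines()) + "\n"
    sys.stdout.buffer.write(text.encode("utf-8"))
```

Whether the terminal emulator passes these characters through on selection and paste is an assumption; some may filter or replace them.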
Expected behavior
The code point should be inserted into the file, and shown in the UI as one replacement symbol.
Actual behavior
All but the last byte of the UTF-8 sequence are inserted, and each inserted byte shows up as its own replacement symbol.
For example, U+FEFF "BOM" is three bytes in UTF-8: ef bb bf. Only the first two, ef bb, are inserted. This shows up as two replacement symbols (a dot on a black background, or similar).
Similarly, the highest valid Unicode code point, U+10FFFF, is four bytes: f4 8f bf bf. Only the first three bytes, f4 8f bf, are inserted into the file, showing up as three replacement symbols.
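For comparing against a hex dump of the saved file, the following sketch prints the full UTF-8 encoding of each code point next to the same sequence with its last byte dropped. Only the U+FEFF and U+10FFFF rows were checked by hand in this report; the other "observed" values are an extrapolation from the "all but the last byte" behavior, not separate measurements. The function names are made up for illustration.

```python
# Code points named in this report.
CODEPOINTS = [0xFEFF, 0xFFFE, 0xFFFF, 0x1FC00, 0x10FFFE, 0x10FFFF]


def hex_bytes(data):
    """Format bytes as space-separated lowercase hex, e.g. 'ef bb bf'."""
    return " ".join(f"{b:02x}" for b in data)


def expected_and_observed(cp):
    """Return (full UTF-8 encoding, encoding minus its last byte) as hex."""
    full = chr(cp).encode("utf-8")
    return hex_bytes(full), hex_bytes(full[:-1])


for cp in CODEPOINTS:
    expected, observed = expected_and_observed(cp)
    print(f"U+{cp:04X}: expected {expected} | observed {observed}")
```

The saved file can then be inspected with a tool such as `xxd` or `hexdump -C` and matched against the "observed" column.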
Additional context
Both slang and ncurses builds are affected, so the problem is probably not in the screen library.