
mcedit: Cannot enter certain non-printable Unicode characters #4878

@egmontkob

Is there an existing issue for this?

  • I have searched the existing issues

Midnight Commander version and build configuration

4.8.33, git

Operating system

Linux

Is this issue reproducible using the latest version of Midnight Commander?

  • I confirm the issue is still reproducible with the latest version of Midnight Commander

How to reproduce

mcedit, fully UTF-8 environment:

Try to enter a non-printable Unicode character such as U+FEFF, U+FFFE, U+FFFF, U+1FC00, U+10FFFE, U+10FFFF.

Typically you can enter these from the keyboard by pressing Ctrl+Shift+U, typing the hex code, then pressing Enter or Space. This may depend on the OS, desktop, and terminal emulator.

Alternatively, copy-paste from somewhere (e.g. a graphical text editor, web browser).

Expected behavior

The code point should be inserted into the file and shown in the UI as a single replacement symbol.

Actual behavior

All but the last byte of the UTF-8 sequence are inserted. The incomplete sequence then shows up as multiple replacement symbols, one per byte.

For example, U+FEFF "BOM" in UTF-8 is three bytes: ef bb bf. Instead, only the first two bytes, ef bb, are inserted. This shows up as two replacement symbols (a dot on a black background, or similar).

Similarly, the highest Unicode code point, U+10FFFF, is four bytes in UTF-8: f4 8f bf bf. Instead, only the first three bytes, f4 8f bf, are inserted into the file, showing up as three replacement symbols.
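
To make the byte sequences above easy to check, here is a minimal Python sketch (not part of mc) that prints the full UTF-8 encoding of each code point from the reproduction steps, next to the same sequence with its last byte dropped. The helper names `utf8_hex`, `truncated_hex` and `is_valid_utf8` are made up for illustration. The "truncated" column is only what the description above predicts; the report confirms it for U+FEFF and U+10FFFF.

```python
# Code points listed in the reproduction steps of this report.
CODE_POINTS = [0xFEFF, 0xFFFE, 0xFFFF, 0x1FC00, 0x10FFFE, 0x10FFFF]


def to_hex(data):
    """Format bytes as space-separated lowercase hex, e.g. 'ef bb bf'."""
    return " ".join(f"{b:02x}" for b in data)


def utf8_hex(cp):
    """Full UTF-8 encoding of a code point (hypothetical helper)."""
    return to_hex(chr(cp).encode("utf-8"))


def truncated_hex(cp):
    """Same encoding with the last byte dropped, as the report describes."""
    return to_hex(chr(cp).encode("utf-8")[:-1])


def is_valid_utf8(data):
    """True if the bytes decode as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for cp in CODE_POINTS:
    print(f"U+{cp:04X}: full {utf8_hex(cp)} | truncated {truncated_hex(cp)}")
```

The `is_valid_utf8` helper can be applied to the bytes of a file saved from mcedit: a file that ends up containing only the truncated sequence fails strict decoding, while the complete sequence passes.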

Additional context

Both the slang and ncurses builds are affected, so the problem is probably not in the screen library.

Metadata

Assignees

No one assigned

    Labels

    area: mcedit (mcedit, the built-in text editor)
    prio: medium (Has the potential to affect progress)
