
Character Encoding Issue with German Umlauts in HTML Body #481

@Mattel2000

Description


MsgReader library incorrectly decodes German umlauts (ä, ö, ü) in HTML Body, converting them to UTF-8 replacement characters.
In an old .NET Framework Solution everything is fine. The problem only exists in .NET Core and .Net 6+. The Reason is the different implementation of the default encoding.

In .NET Framework, `Encoding.Default` returns the encoding that corresponds to the system's active code page. This is the same encoding returned by `GetEncoding(Int32)` when called with a codepage argument of 0.

In .NET Core and later versions, it always returns a `UTF8Encoding` object. This behavior was changed to encourage the use of Unicode encodings for better cross-platform compatibility and data integrity.

(See the Microsoft documentation for `Encoding.Default`.)
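
To make the symptom concrete, here is a minimal C# sketch (not taken from MsgReader; the class and method names are hypothetical). It decodes the Windows-1252 bytes of the word "Mädchen" with UTF-8, which is what `Encoding.Default` is on .NET Core and later. In Windows-1252, ä is the single byte 0xE4, which is not valid UTF-8 on its own, so the decoder substitutes U+FFFD. Encoding that string again cannot bring the original byte back.

```csharp
using System;
using System.Text;

// Hypothetical demo class (not part of MsgReader) showing why umlauts turn into U+FFFD
// when single-byte text is decoded with UTF-8, the value of Encoding.Default on .NET Core / .NET 5+.
public static class Utf8DecodeDemo
{
    // "Mädchen" encoded in Windows-1252: 'ä' is the single byte 0xE4.
    public static readonly byte[] Cp1252Bytes = { 0x4D, 0xE4, 0x64, 0x63, 0x68, 0x65, 0x6E };

    // Decode with UTF-8 in its default, non-throwing mode: invalid sequences become U+FFFD.
    public static string DecodeAsUtf8(byte[] bytes) => new UTF8Encoding(false).GetString(bytes);

    // Encode the decoded text back to UTF-8, as Encoding.Default.GetBytes(s) would on .NET Core.
    // The replacement character is written as EF BF BD, so the original 0xE4 byte is gone.
    public static byte[] Utf8RoundTrip(byte[] bytes) =>
        new UTF8Encoding(false).GetBytes(DecodeAsUtf8(bytes));

    // Hex dump helper for inspecting the result, e.g. "4D-EF-BF-BD-64-63-68-65-6E".
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);
}
```

On .NET 6+, `Encoding.Default.WebName` reports `utf-8`, so any code path that decodes single-byte message text with `Encoding.Default` behaves like `DecodeAsUtf8` here.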

In my opinion, the problem is the line `var bytes = Encoding.Default.GetBytes(s);` (line 1334) in the `BodyHTML` property of the `Message` class (`Message.cs`).
On .NET Framework, `Encoding.Default` returns `System.Text.SBCSCodePageEncoding`, which matches the encoding given by the `MessageCodePage` property. On .NET, however, the result is always `System.Text.UTF8Encoding`, regardless of how the text is actually encoded.
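
One possible direction for a fix is to stop relying on `Encoding.Default` and use an explicit encoding for the string-to-bytes step. The sketch below is hypothetical and does not reproduce MsgReader's actual code. It assumes the string was built from the raw property bytes one byte per character using ISO-8859-1 (Latin-1, code page 28591), which maps bytes 0x00 to 0xFF one-to-one onto U+0000 to U+00FF. Encoding with that same Latin-1 encoding therefore restores the exact bytes, which can then be decoded with the message code page (1252 is used as an example). This only helps if the string was produced with that same explicit encoding: once bytes have been decoded as UTF-8 and replaced with U+FFFD, the originals cannot be recovered. On .NET Core and later, Windows code pages such as 1252 are only available after registering `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;

// Hypothetical helpers illustrating one way to avoid Encoding.Default; not MsgReader's actual code.
public static class ExplicitEncodingRoundTrip
{
    // ISO-8859-1 (Latin-1) maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF,
    // so a string built with it converts back to exactly the original bytes.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // On .NET Core / .NET 5+, Windows code pages (e.g. 1252) require this provider.
    static ExplicitEncodingRoundTrip()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Assumption: the raw property bytes are turned into a string one byte per char.
    public static string BytesToLatin1String(byte[] raw) => Latin1.GetString(raw);

    // Recover the original bytes with the SAME explicit encoding (instead of Encoding.Default)
    // and decode them with the message's code page.
    public static string DecodeWithCodePage(string latin1Text, int messageCodePage)
    {
        byte[] bytes = Latin1.GetBytes(latin1Text);
        return Encoding.GetEncoding(messageCodePage).GetString(bytes);
    }
}
```

The design choice is that both directions of the conversion use one explicitly named, lossless single-byte encoding, so the result no longer depends on which runtime's `Encoding.Default` is in effect.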
