Add support for non-native-endian UTF-16 and UTF-32 #763

It would be great if the endianness of the input buffer could be selected for each `pcre2_match()` call.
In our use case, strings arrive in both little-endian and big-endian encodings, and we must support both.
Normalizing everything to UTF-8 before matching naturally eats a lot of runtime.
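
As a rough illustration of the per-buffer work involved, here is a minimal sketch of a cheaper interim workaround: byte-swapping UTF-16 code units into host order instead of transcoding to UTF-8. The helper names `swap16` and `utf16_to_host_order` are hypothetical and not part of PCRE2; the sketch assumes the caller already knows the buffer is in the opposite byte order and would pass the swapped copy to the 16-bit library afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a single UTF-16 code unit. */
uint16_t swap16(uint16_t u)
{
    return (uint16_t)((u << 8) | (u >> 8));
}

/* Hypothetical helper (not a PCRE2 function): copy `len` code units
 * from `src` to `dst`, swapping bytes so that a non-native-endian
 * buffer ends up in host byte order. `dst` must have room for `len`
 * code units. Surrogate pairs need no special handling, because each
 * 16-bit unit is swapped independently. */
void utf16_to_host_order(const uint16_t *src, uint16_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = swap16(src[i]);
}
```

This still costs one extra pass and one extra copy per subject string, which is exactly the overhead that built-in support would avoid.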

Having this built into PCRE2 would be a blessing.

I am aware that the docs say:

> UTF-16 and UTF-32 strings can indicate their endianness by special code known as a byte-order mark (BOM).
> The PCRE2 functions do not handle this, expecting strings to be in host byte order.
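
Since the library leaves BOM handling to the caller, the check has to happen in application code. A minimal sketch, assuming a UTF-16 buffer that has been loaded as 16-bit units without any conversion; the enum and the function name `utf16_check_bom` are hypothetical, chosen only for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, not part of PCRE2. */
enum bom_result { BOM_NONE, BOM_HOST_ORDER, BOM_SWAPPED };

/* Inspect the first code unit of a UTF-16 buffer.
 * U+FEFF read as 0xFEFF means the buffer is already in host order;
 * read as 0xFFFE (a noncharacter) it means every unit is byte-swapped. */
enum bom_result utf16_check_bom(const uint16_t *buf, size_t len)
{
    if (len == 0)
        return BOM_NONE;
    if (buf[0] == 0xFEFF)
        return BOM_HOST_ORDER;
    if (buf[0] == 0xFFFE)
        return BOM_SWAPPED;
    return BOM_NONE; /* no BOM: byte order must be known by other means */
}
```

Buffers without a BOM still need the byte order supplied out of band, which is why a per-call option would be more useful than BOM detection alone.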

But would this be a feasible extension? Or is it simply unrealistic because it would be too complicated to implement?
