pybind11 raises a UnicodeDecodeError on non-utf bytes in terms of sort Bytes #1078

Open
@gtrepta

Description

Terms of sort Bytes and String are both stored in a kore_string_pattern in the AST library and treated the same way when being accessed from the bindings:

    py::class_<kore_string_pattern, std::shared_ptr<kore_string_pattern>>(
        ast, "StringPattern", pattern_base)
        .def(py::init(&kore_string_pattern::create))
        .def_property_readonly("contents", &kore_string_pattern::get_contents);

The problem arises when the contents property is accessed: pybind11 assumes the returned C++ string is valid UTF-8 and decodes it into a Python str. That assumption does not always hold for Bytes terms, and when it fails the access raises a UnicodeDecodeError.

pybind11 does support returning a string without conversion, as a Python bytes object, so we should work out how to do that for the terms that need to be treated that way.

https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html#returning-c-strings-to-python
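
One possible shape for this, following the linked pybind11 documentation, is to add a second property that wraps the contents in py::bytes so no UTF-8 decoding is attempted. This is only a sketch: demo_string_pattern is a hypothetical stand-in for kore_string_pattern, and the module name demo_bindings and the property name bytes_contents are assumptions, not names from the existing bindings.

```cpp
#include <pybind11/pybind11.h>

#include <memory>
#include <string>
#include <utility>

namespace py = pybind11;

// Hypothetical stand-in for kore_string_pattern; the real class lives in
// the AST library and may differ in its details.
class demo_string_pattern {
public:
  explicit demo_string_pattern(std::string contents)
      : contents_(std::move(contents)) {}

  static std::shared_ptr<demo_string_pattern>
  create(std::string const &contents) {
    return std::make_shared<demo_string_pattern>(contents);
  }

  std::string const &get_contents() const { return contents_; }

private:
  std::string contents_;
};

// "demo_bindings" is an assumed module name for illustration only.
PYBIND11_MODULE(demo_bindings, m) {
  py::class_<demo_string_pattern, std::shared_ptr<demo_string_pattern>>(
      m, "StringPattern")
      .def(py::init(&demo_string_pattern::create))
      // Existing behaviour: the std::string is decoded as UTF-8 into a
      // Python str, which raises UnicodeDecodeError on invalid input.
      .def_property_readonly("contents", &demo_string_pattern::get_contents)
      // Assumed new property: py::bytes copies the raw bytes into a Python
      // bytes object without attempting any decoding.
      .def_property_readonly(
          "bytes_contents", [](demo_string_pattern const &pattern) {
            return py::bytes(pattern.get_contents());
          });
}
```

With a module built this way, a pattern created from bytes that are not valid UTF-8 would still raise on contents, while bytes_contents would return them unchanged. Whether to expose a separate property or to switch the return type depending on the term's sort is a design choice left open here.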

Labels: bindings (LLVM backend bindings to other languages)
