Invalid characters in XML files #4852

@Pizzabroodje

Describe the bug
Bridges can write characters that are not allowed in XML into the generated feed, such as U+0003 in my case.

To Reproduce
In my case, it happened with the MarktplaatsBridge.
Sometimes the feed produces an error in TT-RSS. The last two times this happened I debugged it and found an ETX character (U+0003) at the end of a string in the feed. In both cases, the source text pulled by the bridge had an apostrophe at that position.
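
For anyone who wants to see the failure without TT-RSS, here is a minimal sketch of why a single ETX character breaks a feed. The `<item>` fragment is made up for illustration and is not actual MarktplaatsBridge output; it only assumes PHP's SimpleXML extension is available.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical title ending in an ETX character (U+0003).
$title = "it's\u{0003}";
$xml = '<?xml version="1.0" encoding="UTF-8"?><item><title>' . $title . '</title></item>';

// simplexml_load_string() returns false when the document is not well-formed.
$parsed = simplexml_load_string($xml);

foreach (libxml_get_errors() as $error) {
    // Print whatever the parser reported about the invalid character.
    echo trim($error->message), PHP_EOL;
}
```

A strict feed reader is likely to reject the whole feed in the same way, which would match the error seen in TT-RSS.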

Expected behavior
Invalid characters should be removed from strings before they are written to XML files.

Additional context
A regex seems the best solution for this. In my personal version of the bridge, I have done the following to solve it:
preg_replace('/[^\PC\s]/u', '', $string);
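This removes every character in the Unicode "Other" category (control, format, unassigned, private-use) except whitespace, which should cover a lot of the characters that can break XML files.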

It might be overkill if it only breaks on this ETX character at the end of a string, but I think there could be more cases where invalid characters break feeds.
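
If stripping the whole "Other" category turns out to be too broad (it also removes format characters such as zero-width joiners), a narrower option would be to drop only what the XML 1.0 `Char` production forbids. The sketch below is one way to do that; `stripInvalidXmlChars` is a hypothetical helper name, not an existing RSS-Bridge function, and returning an empty string on failure is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper, not part of RSS-Bridge: removes code points that
// the XML 1.0 "Char" production does not allow.
function stripInvalidXmlChars(string $text): string
{
    // Allowed: tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // preg_replace() returns null on failure, for example when $text is not
    // valid UTF-8. Assumption: an empty string is preferable to broken XML.
    return $clean ?? '';
}

// ETX right after the apostrophe-containing word, as in the reported case.
$fixed = stripInvalidXmlChars("it's\u{0003} fine");
echo json_encode($fixed), PHP_EOL;
```

Unlike the `\PC` pattern, this also removes vertical tab and form feed, which `\s` keeps but XML 1.0 does not allow.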

Labels: Bug-Report (Confirmed bug report)
