Description
Describe the bug
Bridges can write characters that are not allowed in XML into the generated feeds. In my case it was U+0003.
To Reproduce
In my case, it happened with the MarktplaatsBridge.
Sometimes the feed fails to load in TT-RSS with an error. The last two times I debugged this, I found an ETX character (U+0003) at the end of a string in the broken feed. Both times, the text the bridge had scraped had an apostrophe at that position.
Expected behavior
Invalid characters are removed from strings before they are written into XML files.
Additional context
A regex seems like the best solution. In my personal copy of the bridge, I fixed it like this:
```php
// preg_replace() returns the result; it does not modify $string in place
$string = preg_replace('/[^\PC\s]/u', '', $string);
```
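This removes every non-whitespace character in the Unicode "Other" category (control, format, private-use and unassigned code points), which covers most of the characters that can break XML files.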
This may be overkill if only a trailing ETX character causes the problem, but I suspect other invalid characters could break feeds as well.
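A narrower alternative would be to keep exactly the characters that the XML 1.0 `Char` production allows (`#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`) and drop everything else. Below is a minimal sketch, assuming PHP 7.3+ (PCRE2) and UTF-8 input. `stripInvalidXmlChars` is a hypothetical helper name, not an existing RSS-Bridge function. As far as I can tell, it differs from the regex above in two ways. First, `\s` also matches form feed (U+000C) and, in PCRE2, vertical tab (U+000B). XML 1.0 forbids both, so `[^\PC\s]` keeps them. Second, `[^\PC\s]` removes format characters such as the zero-width joiner (U+200D) used in emoji sequences, and those are valid in XML.

```php
<?php
// Hypothetical helper (not part of RSS-Bridge): removes every character that is
// not allowed by the XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function stripInvalidXmlChars(string $text): string
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // preg_replace() returns null when $text is not valid UTF-8.
    // This sketch assumes dropping such a string is acceptable.
    return $clean ?? '';
}

// Usage: a scraped title with a trailing ETX (U+0003), as in the bug above
$title = "Owner's bike\x03";
echo stripInvalidXmlChars($title), PHP_EOL; // prints "Owner's bike"
```

Matching the XML character ranges directly means tabs, newlines, carriage returns and emoji pass through unchanged, while C0 control characters and the noncharacters U+FFFE/U+FFFF are removed.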