Strip invalid char data from strings on save #329

Open: hahn-kev wants to merge 8 commits into base: master

@hahn-kev hahn-kev commented Jun 17, 2025

relates to #328

I didn't see anything that strips invalid character data from strings, so I've gone ahead and added that using a compiled regex.


github-actions bot commented Jun 17, 2025

LCM Tests

    16 files  ± 0      16 suites  ±0   2m 52s ⏱️ -2s
 2 846 tests +12   2 826 ✅ +12   20 💤 ±0  0 ❌ ±0 
11 332 runs  +48  11 164 ✅ +48  168 💤 ±0  0 ❌ ±0 

Results for commit cbd23aa. ± Comparison against base commit 0eb28b5.

♻️ This comment has been updated with latest results.

@hahn-kev marked this pull request as ready for review June 18, 2025 02:38

Contributor @jasonleenaylor left a comment

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @hahn-kev)

@hahn-kev
Contributor Author

@jasonleenaylor I've added a link to a Wikipedia page documenting valid XML characters. It notes that U+10000–U+10FFFF is also valid, but we have not included that range here. Is that OK, or do we need to include it? If we get this wrong, we could strip valid data on save, so it is worth a bit more checking to make sure this is correct. Otherwise users could start losing data on save without noticing right away.
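
For reference, the ranges in question can be written out as a small predicate. This is only a sketch, assuming the XML 1.0 `Char` production (tab, line feed, carriage return, U+0020–U+D7FF, U+E000–U+FFFD, and U+10000–U+10FFFF). It is written in Python for brevity rather than in the project's own language, and `is_valid_xml_char` is a hypothetical name, not something from this PR.

```python
def is_valid_xml_char(ch: str) -> bool:
    """Return True if the single character `ch` is allowed by XML 1.0 `Char`.

    Hypothetical helper for illustration only; not code from this PR.
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD      # BMP after the surrogate block
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes (the range asked about)
    )
```

Leaving out the last clause is what would cause supplementary-plane characters to be treated as invalid and stripped on save.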

@jasonleenaylor
Contributor

Sorry that I didn't see your comment earlier. Yes, we should include that range. It is very likely that someone using FieldWorks will have a character in that range.

@hahn-kev
Contributor Author

hahn-kev commented Aug 7, 2025

@jasonleenaylor I worked with Martin H and rewrote the regex to match only the invalid characters that we want to remove. He also helped me come up with a test case that uses a character outside the normal range.
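
The approach described here, matching only the invalid characters instead of listing the valid ones, might look roughly like the sketch below. The PR's actual regex is not shown in this thread, so the pattern and the names `INVALID_XML_CHARS` and `strip_invalid_xml_chars` are assumptions, and the code is Python rather than the project's own language. One caveat: Python strings are sequences of code points, so a supplementary-plane character never appears as surrogates. In a UTF-16 string type the same character is a surrogate pair, and a pattern there would need to remove only unpaired surrogates.

```python
import re

# Hypothetical pattern (not the PR's actual regex): matches only code points
# that fall outside the XML 1.0 `Char` production.
INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F"  # C0 controls other than tab, LF, CR
    r"\uD800-\uDFFF"                # lone surrogate code points
    r"\uFFFE\uFFFF]"                # the two BMP noncharacters excluded by XML
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove every character that XML 1.0 does not allow."""
    return INVALID_XML_CHARS.sub("", text)


# A NUL and U+FFFE are removed; the tab and the supplementary-plane
# character U+10437 are kept.
sample = "a\x00b\tc\U00010437d\uFFFEe"
cleaned = strip_invalid_xml_chars(sample)
```

Because the pattern lists only what to remove, anything not named in it, including the whole U+10000–U+10FFFF range, passes through untouched.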
