@piterpunk

What does this PR do?

  • Deduplicates the binary-or-text detection in salt.utils.files.is_text and salt.utils.stringutils.is_binary
  • Updates salt.utils.files.is_text and salt.utils.files.is_binary to delegate the text-or-binary identification to salt.utils.stringutils.is_binary.

What issues does this PR fix or reference?

Fixes #66706
Fixes #62214

Previous Behavior

salt.utils.files.is_text and salt.utils.files.is_binary both give false results when a UTF-8 multibyte character is truncated. The two modules also use different heuristics, so the two methods can give inconsistent results.

New Behavior

salt.utils.files.is_text and salt.utils.files.is_binary now use salt.utils.stringutils.is_binary to determine whether the given file is text or binary. The detection was also changed to accept truncated UTF-8 multibyte characters.
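
To illustrate the truncation problem, here is a minimal sketch of one way to accept a multibyte character that is cut off at the end of a read buffer. It is not the code from this PR: `looks_like_utf8` is a hypothetical helper, and using the standard library's incremental decoder is an assumption chosen for the example.

```python
import codecs


def looks_like_utf8(chunk: bytes) -> bool:
    """Return True if chunk is valid UTF-8, ignoring an incomplete trailing character.

    Hypothetical helper for illustration only; not part of Salt.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tells the decoder that more bytes may follow, so an
        # incomplete sequence at the very end is buffered instead of raising.
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        # Invalid bytes anywhere else in the chunk still raise.
        return False
    return True


sample = "café".encode("utf-8")
truncated = sample[:-1]  # cuts the two-byte "é" in half

# A strict decode rejects the truncated buffer...
try:
    truncated.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while the incremental decoder tolerates the incomplete tail.
tolerant_ok = looks_like_utf8(truncated)
```

A fixed-size read of a file can end in the middle of a character, which is why a strict decode of the buffer can misreport valid UTF-8 text as binary.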

Merge requirements satisfied?

[NOTICE] Bug fixes or features added to Salt require tests.

Commits signed with GPG?

No

- Deduplicates binary or text detection in utils.files.is_text and
  utils.stringutils.is_binary
- Updates utils.files.is_text and utils.files.is_binary to use the
  utils.stringutils.is_binary to do the text or binary identification.
twangboy previously approved these changes Nov 13, 2025
@twangboy added the test:full (Run the full test suite) label Nov 13, 2025
@twangboy added this to the Chlorine v3007.9 milestone Nov 13, 2025
- Define `text_characters` only when it will be used
- Add comments about the `text_characters` definition
- Fix a corner case where a UTF-8 multibyte character is truncated and
  more than 30% of the byte array has values between 1 and 31.
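
The 30% threshold mentioned in the commit above can be sketched as a simple ratio check. This is only an approximation under stated assumptions, not the function from this PR: `control_ratio` and `is_probably_binary` are hypothetical names, and the set of control bytes excluded as ordinary text (`\n`, `\r`, `\t`, `\b`) is assumed.

```python
# Bytes 1..31 except common text control characters (assumed exclusion set).
CONTROL_BYTES = bytes(b for b in range(1, 32) if b not in b"\n\r\t\b")


def control_ratio(data: bytes) -> float:
    """Fraction of bytes in data that are non-text control characters."""
    if not data:
        return 0.0
    # translate(None, delete=...) removes the listed bytes; the length
    # difference is the number of control bytes found.
    remaining = data.translate(None, delete=CONTROL_BYTES)
    return (len(data) - len(remaining)) / len(data)


def is_probably_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Hypothetical heuristic: NUL byte or too many control bytes means binary."""
    if b"\x00" in data:
        return True
    return control_ratio(data) > threshold
```

Applying the ratio check regardless of whether the buffer ends in a truncated multibyte character keeps a buffer full of control bytes from being classified as text just because its tail happens to look like the start of a UTF-8 sequence.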

Development

Successfully merging this pull request may close these issues.

  • salt/utils/files.py is_text — false "is not text" results with UTF-8
  • [BUG] file.blockreplace does not work with UTF-8