sundarshankar89
commented
Oct 3, 2025
    a text-based IO wrapper that will decode the underlying binary-mode file as text.
    """
    use_encoding: str | None
    _chardet_confidence_threshold: float = 0.6
Collaborator
Author
Should this be determined by the client, or controlled by the common library?
✅ 40/40 passed, 2 skipped, 1m34s total. Running from acceptance #363
asnare
reviewed
Oct 3, 2025
Contributor
asnare
left a comment
Let's talk about this offline. I understand the motivation, but I do have some concerns, and maybe there's a different way of achieving the same result while assuaging them.
    from typing import BinaryIO, Literal, NoReturn, TextIO, TypeVar
    from urllib.parse import quote_from_bytes as urlquote_from_bytes

    import chardet
Contributor
This import means it's not an optional dependency, which is why the downstream projects are failing.
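One way to keep the dependency optional, sketched here under assumptions rather than taken from the PR: defer the import into the function and fall back when the package is absent. The helper name `detect_or_default` is hypothetical; `chardet.detect` and `locale.getpreferredencoding` are the real calls.

```python
import locale


def detect_or_default(data: bytes) -> str:
    """Guess an encoding for `data` (hypothetical helper, not the PR's code)."""
    try:
        # Imported lazily so that importing this module never requires chardet.
        import chardet
    except ImportError:
        # chardet is not installed: use the system's preferred encoding.
        return locale.getpreferredencoding(False)
    guess = chardet.detect(data)
    # chardet reports encoding=None when it cannot make a guess.
    return guess["encoding"] or locale.getpreferredencoding(False)
```

With this shape, downstream projects that do not install chardet would still be able to import the module; they would simply get the locale default.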
Comment on lines +14 to +18
    This Software contains code from the following open source projects, licensed under the GNU Lesser GPL v2:

    chardet - https://github.com/chardet/chardet
    Copyright 2005-2024 Mark Pilgrim, Maintainer: Dan Blanchard
    License - https://github.com/chardet/chardet/blob/main/LICENSE
Contributor
@gueniai: If we proceed with this, this will need review.
What does this PR do?
Adds the chardet library to better handle encoding detection when reading files. The change aims to improve the confidence and accuracy of text decoding, falling back to the system's preferred encoding if confidence in the detected encoding is low.
Question: Should we simplify the detection by using chardet instead of the current approach for non-XML files?
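The fallback rule described above could look roughly like the following. This is a minimal sketch, not the PR's implementation: the function name `choose_encoding`, the injectable `detect` parameter, and the use of `>=` for the comparison are all assumptions. The 0.6 default mirrors `_chardet_confidence_threshold` from the diff, and the result shape (a mapping with `encoding` and `confidence` keys) matches what `chardet.detect` returns.

```python
import locale
from typing import Any, Callable, Mapping


def choose_encoding(
    data: bytes,
    detect: Callable[[bytes], Mapping[str, Any]],
    threshold: float = 0.6,
) -> str:
    """Return the detected encoding, or the system default if confidence is low.

    `detect` is any callable with the same result shape as chardet.detect;
    passing it in is an assumption made to keep this sketch dependency-free.
    """
    result = detect(data)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    # Trust the detector only when it names an encoding with enough confidence.
    if isinstance(encoding, str) and confidence >= threshold:
        return encoding
    # Otherwise fall back to the system's preferred encoding.
    return locale.getpreferredencoding(False)
```

With chardet installed, `choose_encoding(raw, chardet.detect)` would plug in the real detector. Exposing `threshold` as a parameter with a library-side default is one possible answer to the question of who controls it: the common library sets the default and a client can still override it.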