-
-
Notifications
You must be signed in to change notification settings - Fork 360
grass.script: Use text=True by default for subprocesses #5881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Unlike in the days of Python 2, Popen has now text mode which makes everything just standard (unicode) strings and newlines just LFs. This PR s trying to switch the default from text=False to text=True deep in grass.script in the Popen class wrapper. The idea is that one can still fallback to bytes and encoding/decoding if needed which is approach taken in grass.script.task for XML. All other places should remove the encoding/decoding code. I adjusted crucial places in grass.script, some tested locally on Linux, some not. Changes in grass.script are hopeful in the sense that there should be no impact to the user code. However, some changes are needed in Python tools, suggesting that there is a risk that user code would have to change. This would prevent us from doing this before next major release (v9), but we can likely adjust the grass.script code to allow for both encoded (bytes) and unicode strings (str) as inputs. Our encode and decode utils functions already work as pass-thru when the input type is the desired output type. I putting this out for discussion and to see how many tests will fail, and I'm especially intersted to see how this turns out on Windows.
See also #4517 |
I forgot about #4517, thanks. Yes, this is addressing it. I need to think over how to deal with text because universal_newlines is just terrible (although the fact that it is a terrible name 100% avoids any conflict). The Tools API (#2923) is trying to avoid the conflicts in different way - by passing these special things through constructor rather than together with the tool parameters, at least preferably - plus it has a subprocess.run-like API (accepting list of strings) which would be likely used or at least applicable in a special case when the parameter needs to be used. So it is less of an issue for the Tools API. |
Just to understand better, these warnings are generated when xfail test passes or always? I thought only when it passes, but the wording suggests otherwise.
|
Always shown. If a test is unintentionally fixed, it will flag as an failure (unexpected success). So we can notice it. |
A first error I looked for, in Jupyter utils, the write on line 75 encodes the string to write to proj_input. grass/python/grass/jupyter/utils.py Lines 44 to 75 in cee2818
|
…write is text encoded in bytes, but the stream is set to text mode (str) automatically. This provides backwards compatibility, so that the old code which encodes explicitly, but does not disable text mode, still works without changes.
Both on macOS and Windows, in gunittest tests, there is a lot of |
…we completly switch and remove the compatobility code. m.proj with text=True deadlocks when used from grass.jupyter in any way (explicit stdin or communicate, with or without the stdin wrapper). Using text=False makes m.proj work in this context which makes sense because it is doing a lot of low lever tricks.
…es already encodes/decodes for the user. grass.gunittest call_module allowed both, but was passing bytes to Popen.
I've already real an example pseudo code on the subject, in the Python docs. Let me find it back... Ok: it was here, under displayhook https://docs.python.org/3/library/sys.html#sys.displayhook def displayhook(value):
if value is None:
return
# Set '_' to None to avoid recursion
builtins._ = None
text = repr(value)
try:
sys.stdout.write(text)
except UnicodeEncodeError:
bytes = text.encode(sys.stdout.encoding, 'backslashreplace')
if hasattr(sys.stdout, 'buffer'):
sys.stdout.buffer.write(bytes)
else:
text = bytes.decode(sys.stdout.encoding, 'strict')
sys.stdout.write(text)
sys.stdout.write("\n")
builtins._ = value Is TypeError the only one or main one that is reported? |
This goes a step further checking encoding errors. The TypeError is for bytes vs str confusion, so that would more be just the buffer check part. Are you suggesting to base our implementation on the displayhook implementation? |
Not really, it was using buffer, which isn't always available as they warn in their docs if you redirect to another type of class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice seeing so many new tests (possibly passing)
""" | ||
Decodes bytes into str if writing failed and text mode was automatically set. | ||
|
||
Remove for version 9 | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we already note the deprecation and planned removal (like they do for Python docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an internal not to remove the code. I don't really know what to say and where. The documentation was written for Python 2 and ASCII strings, so that's actually now fixed. When using universal_newlines=False, using the encoding parameter still makes sense, so no strong reason to drop that piece. What will get dropped is the fallback when you use bytes, but don't set universal_newlines=False. I guess I could review all these functions and actually fully document how they are handling stdnin, stdout, and stderr.
if "text" not in kwargs and "universal_newlines" not in kwargs: | ||
kwargs["text"] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a tool (module) or extension somewhere that has an option named text
that would conflict with this? (Assuming Popen and the tool kwargs are still mixed together)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do, but this does not conflict because it is not mixed together anymore. From here, it goes directly to Popen parameters, the tools and its parameters are already a list of strings passed separately. That's different than run_command and family. The only issue is if you want to pass text=False to run_command - we don't support that (text is not in _popen_args in core.py which is used to make the distinction).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we leave like this or have to do something
Unlike in the days of Python 2, Popen has now text mode which makes everything just standard (unicode) strings and newlines just LFs. This PR s trying to switch the default from text=False to text=True deep in grass.script in the Popen class wrapper. The idea is that one can still fallback to bytes and encoding/decoding if needed which is approach taken in grass.script.task for XML. All other places should remove the encoding/decoding code. I adjusted crucial places in grass.script, some tested locally on Linux, some not. Changes in grass.script are hopeful in the sense that there should be no impact to the user code. However, some changes are needed in Python tools, suggesting that there is a risk that user code would have to change. This would prevent us from doing this before next major release (v9), but we can likely adjust the grass.script code to allow for both encoded (bytes) and unicode strings (str) as inputs. Our encode and decode utils functions already work as pass-thru when the input type is the desired output type.
I putting this out for discussion and to see how many tests will fail, and I'm especially intersted to see how this turns out on Windows.