Skip to content

grass.script: Use text=True by default for subprocesses #5881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

wenzeslaus
Copy link
Member

Unlike in the days of Python 2, Popen has now text mode which makes everything just standard (unicode) strings and newlines just LFs. This PR s trying to switch the default from text=False to text=True deep in grass.script in the Popen class wrapper. The idea is that one can still fallback to bytes and encoding/decoding if needed which is approach taken in grass.script.task for XML. All other places should remove the encoding/decoding code. I adjusted crucial places in grass.script, some tested locally on Linux, some not. Changes in grass.script are hopeful in the sense that there should be no impact to the user code. However, some changes are needed in Python tools, suggesting that there is a risk that user code would have to change. This would prevent us from doing this before next major release (v9), but we can likely adjust the grass.script code to allow for both encoded (bytes) and unicode strings (str) as inputs. Our encode and decode utils functions already work as pass-thru when the input type is the desired output type.

I putting this out for discussion and to see how many tests will fail, and I'm especially intersted to see how this turns out on Windows.

Unlike in the days of Python 2, Popen has now text mode which makes everything just standard (unicode) strings and newlines just LFs. This PR s trying to switch the default from text=False to text=True deep in grass.script in the Popen class wrapper. The idea is that one can still fallback to bytes and encoding/decoding if needed which is approach taken in grass.script.task for XML. All other places should remove the encoding/decoding code. I adjusted crucial places in grass.script, some tested locally on Linux, some not. Changes in grass.script are hopeful in the sense that there should be no impact to the user code. However, some changes are needed in Python tools, suggesting that there is a risk that user code would have to change. This would prevent us from doing this before next major release (v9), but we can likely adjust the grass.script code to allow for both encoded (bytes) and unicode strings (str) as inputs. Our encode and decode utils functions already work as pass-thru when the input type is the desired output type.

I putting this out for discussion and to see how many tests will fail, and I'm especially intersted to see how this turns out on Windows.
@echoix
Copy link
Member

echoix commented Jun 12, 2025

See also #4517

@wenzeslaus
Copy link
Member Author

I forgot about #4517, thanks. Yes, this is addressing it. I need to think over how to deal with text because universal_newlines is just terrible (although the fact that it is a terrible name 100% avoids any conflict). The Tools API (#2923) is trying to avoid the conflicts in different way - by passing these special things through constructor rather than together with the tool parameters, at least preferably - plus it has a subprocess.run-like API (accepting list of strings) which would be likely used or at least applicable in a special case when the parameter needs to be used. So it is less of an issue for the Tools API.

@wenzeslaus
Copy link
Member Author

Just to understand better, these warnings are generated when xfail test passes or always? I thought only when it passes, but the wording suggests otherwise.

gui\wxpython\core\testsuite\test_gcmd.py:20
  C:\a\grass\grass\gui\wxpython\core\testsuite\test_gcmd.py:20: UserWarning: Once the test is fixed and passing, remove the @xfail_windows decorator
    @xfail_windows

@echoix
Copy link
Member

echoix commented Jun 12, 2025

Always shown. If a test is unintentionally fixed, it will flag as an failure (unexpected success). So we can notice it.

@github-actions github-actions bot added raster Related to raster data processing Python Related code is in Python libraries module general tests Related to Test Suite labels Jun 12, 2025
@echoix
Copy link
Member

echoix commented Jun 12, 2025

A first error I looked for, in Jupyter utils, the write on line 75 encodes the string to write to proj_input.

def reproject_region(region, from_proj, to_proj):
"""Reproject boundary of region from one projection to another.
:param dict region: region to reproject as a dictionary with long key names
output of get_region
:param str from_proj: PROJ.4 string of region; output of get_location_proj_string
:param str in_proj: PROJ.4 string of target location;
output of get_location_proj_string
:return dict region: reprojected region as a dictionary with long key names
"""
region = region.copy()
# reproject all corners, otherwise reproj. region may be underestimated
# even better solution would be reprojecting vector region like in r.import
proj_input = (
f"{region['east']} {region['north']}\n"
f"{region['west']} {region['north']}\n"
f"{region['east']} {region['south']}\n"
f"{region['west']} {region['south']}\n"
)
proc = gs.start_command(
"m.proj",
input="-",
separator=" , ",
proj_in=from_proj,
proj_out=to_proj,
flags="d",
stdin=gs.PIPE,
stdout=gs.PIPE,
stderr=gs.PIPE,
)
proc.stdin.write(gs.encode(proj_input))

…write is text encoded in bytes, but the stream is set to text mode (str) automatically. This provides backwards compatibility, so that the old code which encodes explicitly, but does not disable text mode, still works without changes.
@wenzeslaus
Copy link
Member Author

Both on macOS and Windows, in gunittest tests, there is a lot of TypeError: 'in <string>' requires string as left operand, not bytes which is a result of testing output as bytes. Fixing this in tests is not a problem, the question is whether we think that users will have such code. If they always convert using gs.decode, then it is not an issue because it works as pass-through for str. However, if they operate with the bytes, then this PR will break their code. It is certainly possible, but the real question is how common it is and whether it is worth breaking those cases for benefit of everything else working as expected.

…we completly switch and remove the compatobility code. m.proj with text=True deadlocks when used from grass.jupyter in any way (explicit stdin or communicate, with or without the stdin wrapper). Using text=False makes m.proj work in this context which makes sense because it is doing a lot of low lever tricks.
…es already encodes/decodes for the user. grass.gunittest call_module allowed both, but was passing bytes to Popen.
@echoix
Copy link
Member

echoix commented Jun 19, 2025

I've already real an example pseudo code on the subject, in the Python docs. Let me find it back...

Ok: it was here, under displayhook

https://docs.python.org/3/library/sys.html#sys.displayhook

def displayhook(value):
    if value is None:
        return
    # Set '_' to None to avoid recursion
    builtins._ = None
    text = repr(value)
    try:
        sys.stdout.write(text)
    except UnicodeEncodeError:
        bytes = text.encode(sys.stdout.encoding, 'backslashreplace')
        if hasattr(sys.stdout, 'buffer'):
            sys.stdout.buffer.write(bytes)
        else:
            text = bytes.decode(sys.stdout.encoding, 'strict')
            sys.stdout.write(text)
    sys.stdout.write("\n")
    builtins._ = value

Is TypeError the only one or main one that is reported?

@wenzeslaus
Copy link
Member Author

This goes a step further checking encoding errors. The TypeError is for bytes vs str confusion, so that would more be just the buffer check part. Are you suggesting to base our implementation on the displayhook implementation?

@echoix
Copy link
Member

echoix commented Jun 20, 2025

Are you suggesting to base our implementation on the displayhook implementation?

Not really, it was using buffer, which isn't always available as they warn in their docs if you redirect to another type of class

Copy link
Member

@echoix echoix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice seeing so many new tests (possibly passing)

Comment on lines +58 to +62
"""
Decodes bytes into str if writing failed and text mode was automatically set.

Remove for version 9
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we already note the deprecation and planned removal (like they do for Python docs)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal not to remove the code. I don't really know what to say and where. The documentation was written for Python 2 and ASCII strings, so that's actually now fixed. When using universal_newlines=False, using the encoding parameter still makes sense, so no strong reason to drop that piece. What will get dropped is the fallback when you use bytes, but don't set universal_newlines=False. I guess I could review all these functions and actually fully document how they are handling stdnin, stdout, and stderr.

Comment on lines +106 to +107
if "text" not in kwargs and "universal_newlines" not in kwargs:
kwargs["text"] = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a tool (module) or extension somewhere that has an option named text that would conflict with this? (Assuming Popen and the tool kwargs are still mixed together)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, but this does not conflict because it is not mixed together anymore. From here, it goes directly to Popen parameters, the tools and its parameters are already a list of strings passed separately. That's different than run_command and family. The only issue is if you want to pass text=False to run_command - we don't support that (text is not in _popen_args in core.py which is used to make the distinction).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we leave like this or have to do something

@github-actions github-actions bot added vector Related to vector data processing temporal Related to temporal data processing labels Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
general libraries misc module notebook Python Related code is in Python raster Related to raster data processing temporal Related to temporal data processing tests Related to Test Suite vector Related to vector data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants