grass.script: Use text=True by default for subprocesses #5881

wenzeslaus · 2025-06-12T13:48:31Z

Unlike in the days of Python 2, Popen now has a text mode, which makes everything standard (Unicode) strings with newlines normalized to plain LFs. This PR is trying to switch the default from text=False to text=True deep in grass.script, in the Popen class wrapper. The idea is that one can still fall back to bytes and encoding/decoding if needed, which is the approach taken in grass.script.task for XML. All other places should drop their encoding/decoding code. I adjusted crucial places in grass.script; some I tested locally on Linux, some not. The changes in grass.script are hopeful in the sense that there should be no impact on user code. However, some changes are needed in Python tools, suggesting there is a risk that user code would have to change. That would prevent us from doing this before the next major release (v9), but we can likely adjust the grass.script code to accept both encoded (bytes) and Unicode strings (str) as inputs. Our encode and decode utility functions already work as pass-throughs when the input is already of the desired output type.

I am putting this out for discussion and to see how many tests will fail, and I'm especially interested to see how this turns out on Windows.
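A minimal sketch of the difference the PR relies on, using only the standard library rather than grass.script itself: by default subprocess exchanges bytes, while text=True exchanges str and translates newlines. The decode helper is hypothetical and only mimics the pass-through behavior described for the GRASS encode/decode utilities.

```python
import subprocess
import sys

# Child process that copies its stdin to stdout unchanged.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

# Default Popen behavior (text=False): bytes in, bytes out.
raw = subprocess.run(ECHO, input=b"a\nb\n", capture_output=True, check=True).stdout

# Text mode: str in, str out; on Windows the child's "\r\n" is read back as "\n".
txt = subprocess.run(
    ECHO, input="a\nb\n", capture_output=True, text=True, check=True
).stdout


def decode(data, encoding="utf-8"):
    """Hypothetical pass-through decode: str is returned unchanged."""
    return data if isinstance(data, str) else data.decode(encoding)


print(type(raw).__name__, type(txt).__name__)  # bytes str
print(decode(txt) is txt)  # True: str passes through untouched
```

With the pass-through, code that already calls decode keeps working whether a process was started in bytes or text mode.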

echoix · 2025-06-12T13:59:20Z

See also #4517

wenzeslaus · 2025-06-12T14:41:32Z

I forgot about #4517, thanks. Yes, this addresses it. I need to think over how to deal with text, because the alternative, universal_newlines, is just terrible (although its terrible name does 100% avoid any conflict with tool parameters). The Tools API (#2923) tries to avoid the conflicts in a different way: by passing these special options through the constructor rather than together with the tool parameters, at least preferably. It also has a subprocess.run-like API (accepting a list of strings), which would likely be used, or at least be applicable, in the special case when such a parameter needs to be passed. So it is less of an issue for the Tools API.
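A hypothetical sketch of the constructor-based separation described above; ToolRunner and its methods are illustrative names, not the actual Tools API from #2923. Because subprocess options are fixed at construction time, a tool option named text ends up on the command line instead of being read as Popen's text.

```python
import subprocess


class ToolRunner:
    """Hypothetical runner: Popen options go to the constructor, tool
    parameters go to run(), so the two namespaces never mix."""

    def __init__(self, **popen_options):
        # Default to text mode unless the caller explicitly chose otherwise.
        self.popen_options = {"text": True, **popen_options}

    def command(self, tool, **params):
        # Every keyword here is a tool parameter, never a Popen option.
        return [tool] + [f"{key}={value}" for key, value in params.items()]

    def run(self, tool, **params):
        return subprocess.run(self.command(tool, **params), **self.popen_options)


runner = ToolRunner(capture_output=True)
print(runner.command("some.tool", text="Hello"))  # ['some.tool', 'text=Hello']
print(runner.popen_options)  # {'text': True, 'capture_output': True}
```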

wenzeslaus · 2025-06-12T14:42:48Z

Just to understand better: are these warnings generated when an xfail test passes, or always? I thought only when it passes, but the wording suggests otherwise.

gui\wxpython\core\testsuite\test_gcmd.py:20
  C:\a\grass\grass\gui\wxpython\core\testsuite\test_gcmd.py:20: UserWarning: Once the test is fixed and passing, remove the @xfail_windows decorator
    @xfail_windows

echoix · 2025-06-12T14:46:32Z

Always shown. If a test is unintentionally fixed, it will be flagged as a failure (unexpected success), so we can notice it.

echoix · 2025-06-12T17:06:04Z

The first error I looked at is in the Jupyter utils: the write on line 75 encodes proj_input before writing it to the process's stdin.

grass/python/grass/jupyter/utils.py

Lines 44 to 75 in cee2818

def reproject_region(region, from_proj, to_proj):
    """Reproject boundary of region from one projection to another.

    :param dict region: region to reproject as a dictionary with long key names
                    output of get_region
    :param str from_proj: PROJ.4 string of region; output of get_location_proj_string
    :param str in_proj: PROJ.4 string of target location;
                    output of get_location_proj_string
    :return dict region: reprojected region as a dictionary with long key names
    """
    region = region.copy()
    # reproject all corners, otherwise reproj. region may be underestimated
    # even better solution would be reprojecting vector region like in r.import
    proj_input = (
        f"{region['east']} {region['north']}\n"
        f"{region['west']} {region['north']}\n"
        f"{region['east']} {region['south']}\n"
        f"{region['west']} {region['south']}\n"
    )
    proc = gs.start_command(
        "m.proj",
        input="-",
        separator=" , ",
        proj_in=from_proj,
        proj_out=to_proj,
        flags="d",
        stdin=gs.PIPE,
        stdout=gs.PIPE,
        stderr=gs.PIPE,
    )
    proc.stdin.write(gs.encode(proj_input))

…write is text encoded in bytes, but the stream was set to text mode (str) automatically. This provides backward compatibility, so old code which encodes explicitly but does not disable text mode still works without changes.
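The backward-compatibility fallback described above could look roughly like the following sketch. CompatStdin is a hypothetical name, and io.StringIO stands in for a Popen stdin opened in text mode; the real wrapper in grass.script may be structured differently.

```python
import io


class CompatStdin:
    """Hypothetical wrapper for a text-mode stdin that still accepts bytes."""

    def __init__(self, stream, encoding="utf-8"):
        self._stream = stream
        self._encoding = encoding

    def write(self, data):
        try:
            return self._stream.write(data)
        except TypeError:
            # Old caller code encoded explicitly (e.g., with an encode helper)
            # but left text mode on: decode and retry instead of failing.
            if isinstance(data, bytes):
                return self._stream.write(data.decode(self._encoding))
            raise

    def __getattr__(self, name):
        # Delegate close(), flush(), etc. to the wrapped stream.
        return getattr(self._stream, name)


target = io.StringIO()  # stands in for proc.stdin opened with text=True
stdin = CompatStdin(target)
stdin.write(b"1 2\n")  # legacy bytes
stdin.write("3 4\n")  # new-style str
print(repr(target.getvalue()))  # '1 2\n3 4\n'
```

Trying the write first and decoding only on TypeError keeps the common str path untouched; the fallback is the part slated for removal once bytes input is no longer supported.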

wenzeslaus · 2025-06-18T21:50:28Z

On both macOS and Windows, the gunittest tests show many TypeError: 'in <string>' requires string as left operand, not bytes errors, which result from tests checking the output as bytes. Fixing this in the tests is not a problem; the question is whether we think users will have such code. If they always convert using gs.decode, it is not an issue, because gs.decode works as a pass-through for str. However, if they operate on the bytes directly, this PR will break their code. That is certainly possible, but the real question is how common it is and whether breaking those cases is worth the benefit of everything else working as expected.
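A small reproduction of the failure and of the pass-through fix, assuming output that is now str because of text=True; as_text is a hypothetical helper standing in for gs.decode.

```python
def as_text(data, encoding="utf-8"):
    """Hypothetical helper mimicking a pass-through decode: str stays str."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


output = "name=elevation\n"  # stdout as returned once text=True is the default

# Old-style test code that still expects bytes:
try:
    found = b"name=" in output
except TypeError as error:
    message = str(error)
    print(message)  # 'in <string>' requires string as left operand, not bytes

# Robust variant: normalize first, then compare str with str.
print("name=" in as_text(output))  # True
print("name=" in as_text(b"name=elevation\n"))  # True
```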

…we completely switch and remove the compatibility code. m.proj with text=True deadlocks when used from grass.jupyter in any way (explicit stdin or communicate, with or without the stdin wrapper). Using text=False makes m.proj work in this context, which makes sense because it is doing a lot of low-level tricks.

echoix · 2025-06-19T22:12:23Z

I've already read an example of pseudocode on this subject in the Python docs. Let me find it again...

OK, it was here, under displayhook:

https://docs.python.org/3/library/sys.html#sys.displayhook

import builtins
import sys


def displayhook(value):
    if value is None:
        return
    # Set '_' to None to avoid recursion
    builtins._ = None
    text = repr(value)
    try:
        sys.stdout.write(text)
    except UnicodeEncodeError:
        bytes = text.encode(sys.stdout.encoding, 'backslashreplace')
        if hasattr(sys.stdout, 'buffer'):
            sys.stdout.buffer.write(bytes)
        else:
            text = bytes.decode(sys.stdout.encoding, 'strict')
            sys.stdout.write(text)
    sys.stdout.write("\n")
    builtins._ = value

Is TypeError the only error reported, or just the main one?

wenzeslaus · 2025-06-19T22:52:34Z

This goes a step further by handling encoding errors. The TypeError is about bytes vs. str confusion, so it would correspond more to just the buffer check part. Are you suggesting we base our implementation on the displayhook implementation?

echoix · 2025-06-20T16:13:29Z

Are you suggesting to base our implementation on the displayhook implementation?

Not really. It uses buffer, which, as the docs warn, isn't always available if you redirect the stream to another type of object.
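To illustrate the caveat about buffer: not every text stream has one (io.StringIO does not), so code that accepts both str and bytes needs a decode fallback. write_any is a hypothetical helper sketched for illustration only.

```python
import io


def write_any(stream, data, encoding="utf-8"):
    """Write str or bytes to a text stream (hypothetical helper).

    Uses the underlying binary buffer when the stream has one and falls
    back to decoding when it does not.
    """
    if isinstance(data, str):
        stream.write(data)
    elif hasattr(stream, "buffer"):
        stream.flush()  # push pending text first so output stays in order
        stream.buffer.write(data)
    else:
        stream.write(data.decode(encoding))


wrapped = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")  # has .buffer
write_any(wrapped, "x")
write_any(wrapped, b"y")
wrapped.flush()
print(wrapped.buffer.getvalue())  # b'xy'

text_only = io.StringIO()  # no .buffer attribute
write_any(text_only, b"z")
print(text_only.getvalue())  # z
```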

echoix

It's nice seeing so many new tests (possibly passing)

echoix · 2025-06-20T16:25:12Z

python/grass/script/core.py

+        """
+        Decodes bytes into str if writing failed and text mode was automatically set.
+
+        Remove for version 9
+        """


Can we already note the deprecation and planned removal (like the Python docs do)?

This is an internal note to remove the code. I don't really know what to say and where. The documentation was written for Python 2 and ASCII strings, so that is actually fixed now. When using universal_newlines=False, the encoding parameter still makes sense, so there is no strong reason to drop that piece. What will get dropped is the fallback for when you use bytes but don't set universal_newlines=False. I guess I could review all these functions and fully document how they handle stdin, stdout, and stderr.

echoix · 2025-06-20T16:27:35Z

python/grass/script/core.py

+        if "text" not in kwargs and "universal_newlines" not in kwargs:
+            kwargs["text"] = True


Do we have a tool (module) or extension somewhere that has an option named text that would conflict with this? (Assuming Popen and the tool kwargs are still mixed together)

We do, but this does not conflict because it is not mixed together anymore. From here, it goes directly to the Popen parameters; the tool and its parameters are already a list of strings passed separately. That's different from run_command and family. The only issue is if you want to pass text=False to run_command: we don't support that (text is not in _popen_args in core.py, which is used to make the distinction).

So do we leave it like this, or do we have to do something?
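The name-based split described in the reply above can be sketched as follows. POPEN_ARGS is a hypothetical subset standing in for _popen_args in core.py, and the default_text_mode check mirrors the if "text" not in kwargs snippet from the review.

```python
import subprocess

# Hypothetical subset standing in for _popen_args; the real list may differ.
# Note that "text" is deliberately absent, as described in the reply.
POPEN_ARGS = {"stdin", "stdout", "stderr", "env", "cwd", "universal_newlines"}


def split_kwargs(kwargs):
    """Separate Popen options from tool parameters by name."""
    popen = {k: v for k, v in kwargs.items() if k in POPEN_ARGS}
    params = {k: v for k, v in kwargs.items() if k not in POPEN_ARGS}
    return popen, params


def default_text_mode(popen):
    """Default to text mode only if the caller did not choose a mode."""
    popen = dict(popen)
    if "text" not in popen and "universal_newlines" not in popen:
        popen["text"] = True
    return popen


popen, params = split_kwargs({"text": "Hello", "stdout": subprocess.PIPE})
print(params)  # {'text': 'Hello'}: a tool option, not Popen's text
print(default_text_mode(popen)["text"])  # True
print(default_text_mode({"universal_newlines": False}))  # {'universal_newlines': False}
```

With this split, text=False passed to a run_command-style function ends up as a tool parameter, which is why that case is not supported; universal_newlines=False remains the way to opt out.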

github-actions bot added the raster, Python, libraries, module, general, and tests labels Jun 12, 2025

github-actions bot added the notebook and misc labels Jun 18, 2025

wenzeslaus added 2 commits June 19, 2025 15:34

Do not encode before passing to subprocess.Popen. grass.pygrass.modules already encodes/decodes for the user. grass.gunittest call_module allowed both, but was passing bytes to Popen.

80bfab2

Specific fixes in start_command tests which tested with bytes

7fd8218

wenzeslaus added 2 commits June 20, 2025 09:43

Remove bytes from the tests, but test specifically for bytes as input.

bbe3899

Remove xfail_windows from tests which are now running on Windows

2868c9b

echoix reviewed Jun 20, 2025

github-actions bot added the vector and temporal labels Jun 20, 2025

Enable 5 more tests. These should be the last unexpected successes.

7e219aa

wenzeslaus mentioned this pull request Jun 20, 2025

[Feat] Let python handle newlines in subprocess calls with universal_newlines #4517

Open
