
grass.experimental: Add API and CLI to access tools without a session #5843


Draft · wants to merge 65 commits into main

Conversation

wenzeslaus
Member

Building on top of #2923 (not merged), this adds functionality that allows "packed" native GRASS rasters to be used as tool parameters on the command line:

grass run r.slope.aspect elevation=~/data/elevation.pack slope=~/data/slope.pack
grass run r.univar map=~/data/slope.pack

The above syntax is not actually implemented, but the code below works:

export PYTHONPATH=$(grass --config python-path)
python -m grass.app run r.slope.aspect elevation=.../elevation.pack slope=.../slope.pack

The same functionality is also available from Python where it copies the syntax of plain Tools from #2923:

from grass.tools import StandaloneTools

tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.pack", slope="slope.pack", aspect="aspect.pack")
print(f"Mean slope: {tools.r_univar(map='slope.pack')['mean']}")

The above syntax does not fully work, but the following one does:

tools.run("r.slope.aspect", elevation="elevation.pack", slope="slope.pack")

This PR is not meant to be merged as is; it currently represents a combination of all the different features proposed. See discussion #5830 for details.

wenzeslaus and others added 30 commits June 3, 2023 23:57
This adds a Tools class which allows GRASS tools (modules) to be accessed as methods. Once an instance is created, calling a tool is calling a function (method), similarly to grass.jupyter.Map. Unlike grass.script, this does not require a generic function name, and unlike the grass.pygrass module shortcuts, it does not require special objects to mimic the module families.

Outputs are handled through a returned object which is the result of automatically capturing the tool's output and which can convert from known formats using properties.

Usage example is in the _test() function in the file.
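
A minimal sketch of such a result object (property names beyond text are illustrative assumptions; the actual class in this PR is ExecutedTool, shown later in this thread):

import json

class ToolResult:
    """Minimal sketch: holds captured stdout and converts it on demand."""

    def __init__(self, stdout):
        self._stdout = stdout

    @property
    def text(self):
        # Output as plain text with surrounding whitespace stripped.
        return self._stdout.strip()

    @property
    def keyval(self):
        # Shell-style key=value lines (e.g., output of tools run with -g).
        return dict(
            line.split("=", 1) for line in self.text.splitlines() if "=" in line
        )

    @property
    def json(self):
        # Output of tools that support format="json".
        return json.loads(self._stdout)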

The code is included under the new grass.experimental package, which allows merging the code even while further breaking changes are anticipated.
…ting clearly ends the loop, the elapsed time may be significantly higher for some timeouts given that the lock process execution takes a second to execute.
@wenzeslaus wenzeslaus added this to the Future milestone Jun 5, 2025
@wenzeslaus wenzeslaus added the enhancement (New feature or request) label Jun 5, 2025
@github-actions github-actions bot added the Python (Related code is in Python), libraries, and tests (Related to Test Suite) labels Jun 5, 2025
@pesekon2
Contributor

pesekon2 commented Jun 5, 2025

I know that this is just a draft so far, but quick tests with the prefix parameter look fine. Thanks for all the work! The whisperer in the Python console does not whisper what I would expect (see below), but otherwise everything I tried went smoothly.

>>> from grass.experimental import tools
>>> v = tools.Tools(prefix='v')
>>> v.random(output='test', npoints=5)
<grass.experimental.tools.ExecutedTool object at 0x7f5e34d8b8c0>
>>> # let's check if it is there
>>> g = tools.Tools(prefix='g')
>>> g.list(type='vector').text
'test'
>>> # whispering test
>>> v.
v.env                         v.feed_input_to(              v.ignore_errors_of()          v.levenshtein_distance(       v.no_nonsense_run_from_list(  v.parse_command(              v.run(                        v.run_command(                v.run_from_list(              v.suggest_tools(

@wenzeslaus
Member Author

The whisper is not implemented yet, but the bulk of the underlying code for it is already done for the errors.
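
For context, a sketch of how dynamically resolved tool methods can be advertised to a console's completer through __dir__ (purely illustrative; list_tool_names and the run stub are assumptions, not this PR's code):

import subprocess

def list_tool_names():
    # Stand-in for tool discovery; a real implementation would scan the
    # installed tools. Hypothetical helper, not part of the PR.
    return ["r.slope.aspect", "r.univar", "v.random", "g.region"]

class Tools:
    """Sketch: dynamic tool methods plus __dir__ so that consoles can whisper."""

    def run(self, tool, **kwargs):
        # Minimal stand-in: build a command line and execute the tool.
        args = [tool] + [f"{key}={value}" for key, value in kwargs.items()]
        return subprocess.run(args, capture_output=True, text=True, check=True)

    def __getattr__(self, name):
        # tools.r_slope_aspect(...) resolves to run("r.slope.aspect", ...).
        tool = name.replace("_", ".")
        if tool not in list_tool_names():
            raise AttributeError(name)
        return lambda **kwargs: self.run(tool, **kwargs)

    def __dir__(self):
        # Interactive completion calls dir(); advertise the tool methods too.
        return list(super().__dir__()) + [
            tool.replace(".", "_") for tool in list_tool_names()
        ]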

@wenzeslaus
Member Author

File handling behavior

My question is whether we value complete consistency between the CLI and the Python API more, or whether we prefer having the best possible behavior in the given context.

The API naturally supports the following workflow where some imported data is reused between calls and some data is never exported:

tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect")
tools.r_flow(elevation="elevation", aspect="aspect", flowaccumulation="accumulation.grr", flags="3")

The elevation.grr raster is imported for the r.slope.aspect call and then it sits in the project, so the r.flow call can just use it. Similarly, aspect is created only within the project, so it is available for r.flow, but not exported. Here is the overview:

Data          | I/O       | Handling
------------- | --------- | ----------------------------
elevation.grr | input     | imported, reused
slope         | output    | created, exported
aspect        | temporary | created, used, not exported
accumulation  | output    | created, exported
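
A sketch of the rule implied by the table, assuming the file suffix and the parameter direction decide the handling (all names here are illustrative):

def classify_parameter(value, is_output, already_imported):
    # Values with a file suffix refer to external files; bare names live
    # only inside the internal project. already_imported is a set of
    # files imported by earlier calls in the same session.
    if not value.endswith((".grr", ".pack")):
        return "temporary: created and used in the project, never exported"
    if is_output:
        return "output: created in the project, then exported to the file"
    if value in already_imported:
        return "input: reuse the raster imported by an earlier call"
    already_imported.add(value)
    return "input: import the file into the project"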

This is great, but slightly inconsistent with the command-line behavior. There is no relation between command-line calls, so each call is separate and always gets a fresh new project:

grass run r.slope.aspect elevation="elevation.grr" slope="slope.grr" aspect="aspect.grr"
grass run r.flow elevation="elevation.grr" aspect="aspect.grr" flowaccumulation="accumulation.grr" -3

The elevation.grr raster is now imported once for each call, and aspect needs to be exported and imported again to be used in the next call.

We could make the Python API consistent by reducing the "state" aspect of StandaloneTools and having each function call use a separate session with a fresh project. Then you would always write this:

tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect.grr")
tools.r_flow(elevation="elevation.grr", aspect="aspect.grr", flowaccumulation="accumulation.grr", flags="3")

In Python, we can make it configurable with parameters of the class (or different classes), for example:

tools1 = StandaloneTools(use_one_project=False)  # behaves exactly like CLI
tools2 = StandaloneTools(use_one_project=True)  # allows for rasters to be reused

This approach can allow control over other behaviors; for example, reduce_reimports=True may allow elevation to be imported only once and aspect not at all in the following example:

tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect.grr")
# When we see elevation.grr and aspect.grr as inputs in the following function call,
# we will just use the ones we already have.
tools.r_flow(elevation="elevation.grr", aspect="aspect.grr", flowaccumulation="accumulation.grr", flags="3")

Even with the behavior being potentially configurable, I prefer going here with the best possible behavior for the context as opposed to complete consistency between the CLI and Python API. So, my choice at this point is to have one session (and one project) for all function calls with one StandaloneTools object.

Additionally, the different behavior in the Python API does not mean that a user cannot achieve the same with the CLI. So that the CLI does not lack feature parity despite the different defaults, we could implement something like:

grass -c "elevation.grr" "project1"
grass --project "project1" run r.slope.aspect elevation="elevation.grr" slope="slope.grr" aspect="aspect"
grass --project "project1" run r.flow elevation="elevation" aspect="aspect" flowaccumulation="accumulation.grr" -3
rm -r "project1"

Do you agree or disagree with having the different default behavior in CLI and in Python API?

@pesekon2
Contributor

Even with the behavior being potentially configurable, I prefer going here with the best possible behavior for the context as opposed to complete consistency between the CLI and Python API

I fully agree with this approach. Exports and imports with each function call would add undesired overhead.

When it comes to the configuration options, the only option I would find useful is having something like tools1 = StandaloneTools(project_id=project1) with some default tmp_project. Then you could have two different projects in the same script and potentially compare their results (e.g., how do the results differ if I run it in two different projections?) or do some other tricks.

@echoix
Member

echoix commented Jun 10, 2025

So basically it is creating a fluent interface in Python? Like what is often seen in JavaScript, but in other languages too (like C#).

@wenzeslaus
Member Author

Computational region behavior

There is more than one way for the computational region to behave when calling multiple tools with the same StandaloneTools object in Python:

  1. First input in the first function call determines the computational region for the call and all subsequent calls.
  2. First input of each function call determines the computational region for the given function call (only).
  3. Computational region is never set automatically and user always needs to explicitly set it manually.

1. First input of the first call

The first input of the first call of a tool (function) determines the computational region. Subsequent calls use that region.

tools = StandaloneTools()
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now computational region is set to whatever raster_file_3x3.grr is.
# The following applies the standard GRASS resampling and extent rules.
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# file2.grr now has size 3x3 and, if the extents do not overlap, contains only nulls.

This is the behavior currently implemented. The nice thing is that it allows for not using g.region at all (above) or using it at any point:

tools = StandaloneTools()
tools.g_region(raster="raster_file_4x5.grr")
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# The output is now 4x5.
tools.g_region(raster="raster_file_3x3.grr")
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# The output is now 3x3.

The raster parameter of g.region simply works as expected. I also added checking for any computational region modification based on the modification time of the underlying WIND file. This way, the current code also supports any g.region parameter or, theoretically, any other tool that would change the region:

tools = StandaloneTools()
tools.g_region(n=..., ..., res=...)
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
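
A minimal sketch of this modification-time check, with illustrative names (not necessarily how this PR implements it):

import os

class RegionWatcher:
    """Detects computational region changes via the WIND file's mtime."""

    def __init__(self, mapset_path):
        # The WIND file in a mapset stores the current computational region.
        self._wind_file = os.path.join(mapset_path, "WIND")
        self._last_mtime = os.path.getmtime(self._wind_file)

    def region_was_modified(self):
        # True if the WIND file changed since the last check.
        mtime = os.path.getmtime(self._wind_file)
        changed = mtime != self._last_mtime
        self._last_mtime = mtime
        return changed

    def mark_current(self):
        # Record the present state as the baseline, e.g., right after our
        # own g.region call, so it is not mistaken for a user edit.
        self._last_mtime = os.path.getmtime(self._wind_file)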

I like this option because g.region works at any place as expected, but you can also leave it out completely. So, you can focus on the tools and your data, with some API-specific risks related to not handling extent and resolution explicitly; but if you know about the computational region and want to tap into its power, you can. I'm a little less comfortable with inheriting the region from the first call in all subsequent calls, but I expect this not to be an issue for most workflows.

2. First input of each call

The first input of each call of a tool (function) determines the computational region for that call. Subsequent calls are not influenced by previous calls.

tools = StandaloneTools()
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now computational region is set to whatever raster_file_3x3.grr is.
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# Now computational region is set to whatever raster_file_4x5.grr is.
# file2.grr now has size 4x5 and its extent overlaps with raster_file_4x5.grr.

This makes the calls completely independent in terms of region. This also means that any g.region calls are ignored. A variation of this could change the behavior based on computational region changes: if a change is detected (based on the file modification time, as in option 1), the computational region would be respected; otherwise, each call would get its own computational region.

What I like about this option is that it is clear how each call behaves regardless of order, and independent calls (with different StandaloneTools objects) give the same result as a series of calls on the same object. It also aligns well with the CLI (see below). However, it does not work with g.region, or it would have to switch behavior on the fly to accommodate it.
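
A sketch of that variation, reusing the illustrative RegionWatcher from option 1 above (again, hypothetical names):

def set_region_for_call(tools, watcher, first_raster_input):
    # Respect a region changed by the user (or any tool) since the last
    # call; otherwise derive a fresh region from this call's first raster.
    if watcher.region_was_modified():
        return
    tools.g_region(raster=first_raster_input)
    watcher.mark_current()  # don't mistake our own change for a user edit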

3. Manual-only explicitly set region

The computational region is not set automatically and defaults to whatever the default is (1x1 at 0,0 at this point). The user needs to explicitly call g.region. Subsequent calls use that region.

tools = StandaloneTools()
# Set the computation region explicitly.
tools.g_region(raster="raster_file_3x3.grr")
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now, set the computation region explicitly again if a different one is needed.
tools.g_region(raster="raster_file_4x5.grr")
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)

This works just the way it works now, so any experienced GRASS user will be right at home, but every user still needs to know about the computational region, and each workflow will have at least two steps: computational region setup and the actual tool call.

Configuration

We need to decide what the default behavior is, but we can also provide configuration for all behaviors, for example:

tools = StandaloneTools(region_from_first_call=True)  # option 1
tools = StandaloneTools(region_for_each_call=True)  # option 2
tools = StandaloneTools(explicit_region_only=True)  # option 3

or:

tools = StandaloneTools(use_region=False, refresh_region=False)  # option 1
tools = StandaloneTools(use_region=False, refresh_region=True)  # option 2
tools = StandaloneTools(use_region=True, refresh_region=None)  # option 3

Notably, the use of use_region: bool is similar to grass.jupyter.Map, where, by default, the first added raster, or the first added vector (possibly combined with a subsequently added raster), determines the computational region used internally for display, but use_region=True turns off the automatic region setting and simply follows the current computational region.
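
A sketch of how the two flags from the second variant could be interpreted (hypothetical class, not part of this PR):

class RegionPolicy:
    """Sketch: decide when to set the region automatically."""

    def __init__(self, use_region=False, refresh_region=False):
        self.use_region = use_region
        self.refresh_region = refresh_region
        self._initialized = False

    def should_set_region(self, call_has_raster_input):
        if self.use_region:
            # Option 3: follow the current region, never set it automatically.
            return False
        if not call_has_raster_input:
            return False
        if self.refresh_region or not self._initialized:
            # Option 2 sets it for every call; option 1 only for the first.
            self._initialized = True
            return True
        return False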

CLI

Similarly to the issue with data files, the CLI needs to behave a certain way because the individual calls do not share one object the way the Python API does. The CLI follows option 2: each call has its own computational region.

# The following will take the first input and use it for the computational region.
grass run r.slope.aspect elevation="raster_file_3x3.grr" aspect="file.grr" -a
# The following will again take the first input and use it for the computational region.
grass run r.slope.aspect elevation="raster_file_4x5.grr" aspect="file.grr" -a

Using an existing project (similarly to the feature parity in data file handling), we could provide the CLI with a project parameter and a set of parameters related to the computational region:

grass -c "elevation.grr" "project1"
grass --project "project1" --use-region run r.slope.aspect elevation="raster_file_3x3.grr" aspect="file.grr" -a
grass --project "project1" --use-region run r.slope.aspect elevation="raster_file_4x5.grr" aspect="file2.grr" -a
rm -r "project1"

Bonus: Tracking state of computational region

While I used the last modified time to track user edits to the computational region, we could support this tracking in the computational region itself. The current trouble is that the computational region is tracked in a file called WIND, and this file is created with each mapset; in fact, the presence of the WIND file is the check used to recognize valid mapsets. While this is nice for tools because they can simply rely on the computational region being set (this happens in the library code, not the tool code itself), the computational region needs to be set before any tool runs, so this possibly happens without any input data from which to derive a reasonable computational region. Later, there is no way of telling whether the values in the computational region come from a user or are simply the default. The default is 1x1 at 0,0, but should we simply assume a legitimate user case for that extent and resolution, and behave differently based on that? We don't do that now.

To help the system know what the status is, we could save the status, or rather the provenance, of the computational region in the computational region itself. The WIND file would then have a new key, source, with the values default and user. While the creation of the default WIND file would store default, g.region would store user. Possibly, a system or auto state could show that an automated system set the computational region through some means, but the user has not touched it yet.
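
For illustration, a WIND file with the proposed key might look like this (the source key is the proposal; the other keys are the existing format, with values roughly matching the North Carolina sample dataset):

proj:       99
zone:       0
north:      228500
south:      215000
west:       630000
east:       645000
rows:       1350
cols:       1500
n-s resol:  10
e-w resol:  10
source:     user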

We could take a different approach and have states based on what the region was determined from, namely raster and vector (and then default and user for all the other states). If the system (which can be the StandaloneTools or the GUI) sees a computational region which is default but has a raster as a tool parameter, it would call g.region with that raster, which would then store raster as the state. Subsequent calls would see raster and would not touch the region. If the first tool call has only a vector as a parameter, the system can call g.region with that vector, which would then store vector as the state. A later call with a raster as a parameter would supply the resolution and alignment, and store raster. Again, subsequent calls would see raster and would not touch the region.
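
A sketch of those rules (hypothetical helper; source is the proposed WIND key from above):

def ensure_region(tools, source, raster=None, vector=None):
    # Returns the new value of the proposed "source" state.
    if source in ("user", "raster"):
        return source  # region is already authoritative; leave it alone
    if raster:
        # A raster supplies extent, resolution, and alignment.
        tools.g_region(raster=raster)  # would also store source=raster
        return "raster"
    if source == "default" and vector:
        # A vector supplies only the extent.
        tools.g_region(vector=vector)  # would also store source=vector
        return "vector"
    return source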

The StandaloneTools don't need this. Even in general, this can be done by checking the time stamp or the content. However, this would be a way to implement the same behavior in different places, possibly without using the same API.

@wenzeslaus
Member Author

When it comes to the configuration options, the only option I would find useful is having something like tools1 = StandaloneTools(project_id=project1) with some default tmp_project.

If I understand you correctly, the feature makes sense to me, and is already there. StandaloneTools can use an existing session:

import grass.script as gs
import grass.experimental
from grass.experimental.standalone_tools import StandaloneTools

# Starting with no project and no session: create both.
gs.create_project("project1")
with gs.setup.init("project1") as session:
    tools = StandaloneTools(session=session)
# With an existing session, but in a separate mapset.
with grass.experimental.TemporaryMapsetSession() as session:
    tools = StandaloneTools(session=session)

@wenzeslaus
Member Author

So basically it is creating a fluent interface in Python?

It seems to me that method chaining is heavily present in a fluent interface. I don't use method chaining here because the point is to return data when appropriate, which method chaining prevents. Also, the methods here don't modify the object much, which is what the methods in a fluent interface do (they modify the project, so that is more of a side effect). Here, the tools object is an interface to functionality. In OOP terms, this could be a facade, providing a front face to complex underlying code consisting of multiple components. The NumPy piece especially is trying to achieve functional programming ideas more than OOP ones, with the object being a necessary vehicle for providing a good interface (tools as function names), cutting some overhead (at minimum the session setup), and possibly allowing for configuration.

@pesekon2
Contributor

When it comes to the configuration options, the only option I would find useful is having something like tools1 = StandaloneTools(project_id=project1) with some default tmp_project.

If I understand you correctly, the feature makes sense to me, and is already there. StandaloneTools can use an existing session:

import grass.script as gs
import grass.experimental
from grass.experimental.standalone_tools import StandaloneTools

# Starting with no project and no session: create both.
gs.create_project("project1")
with gs.setup.init("project1") as session:
    tools = StandaloneTools(session=session)
# With an existing session, but in a separate mapset.
with grass.experimental.TemporaryMapsetSession() as session:
    tools = StandaloneTools(session=session)

Thanks, it is the latter one. The first one looks terrible to me as it uses both gscript and tools.

@wenzeslaus
Member Author

Use of NumPy array IO with the standalone tools API

The combination of NumPy array IO (from #5878) with the standalone tools API (from #5843, this PR) allows using tools with NumPy arrays without a project:

import numpy as np

from grass.experimental.standalone_tools import StandaloneTools

tools = StandaloneTools()
slope = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)

Complications with computational region

With how the StandaloneTools are implemented now, the following will fail because the initially set region will be incompatible with the array size in the second call (see option 1 in the region comment above):

import numpy as np

from grass.experimental.standalone_tools import StandaloneTools

tools = StandaloneTools()
slope1 = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
slope2 = tools.r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)

One way to avoid this is to provide a parameter to StandaloneTools, like StandaloneTools(refresh_region=True). Another way is to use multiple instances:

import numpy as np

from grass.experimental.standalone_tools import StandaloneTools

slope1 = StandaloneTools().r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
slope2 = StandaloneTools().r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)

Making multiple calls where each instance is immediately forgotten does not look that great.

Evaluating length of user code

One could also argue that, in the case of NumPy arrays, plain functions really are preferable over calling a tool as a method of an object, because even a single call still requires creating an object beforehand or in the same statement, as in these two examples:

import numpy as np

from grass.experimental.standalone_tools import StandaloneTools

tools = StandaloneTools()
slope = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)

import numpy as np

from grass.experimental.standalone_tools import StandaloneTools

slope = StandaloneTools().r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)

Shortcut object in the library

We could create a StandaloneTools object at the Python module level so that users can import it. This would be similar to grass.pygrass.modules.shortcuts (hence calling it a shortcut here). In the library, we would have:

# grass/experimental/standalone_tools.py

tools = StandaloneTools(refresh_region=True, keep_data=False, use_one_project=False)

And then the user code would be:

# myscript.py

import numpy as np

from grass.experimental.standalone_tools import tools

slope = tools.r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)

This would exist alongside the option to create one or more StandaloneTools objects, and it would likely have a different configuration (independent region, no data preserved, for truly standalone runs). The result would be possible confusion due to yet another option and some inconsistency, but it might be the best way to provide such an API because it produces the simplest user code.
