grass.experimental: Add API and CLI to access tools without a session #5843
Conversation
This adds a Tools class which allows GRASS tools (modules) to be accessed as methods. Once an instance is created, calling a tool is calling a function (method), similarly to grass.jupyter.Map. Unlike grass.script, this does not require a generic function name, and unlike the grass.pygrass module shortcuts, it does not require special objects to mimic the module families. Outputs are handled through a returned object which is the result of automatic capture of outputs and which can do conversions from known formats using properties. A usage example is in the _test() function in the file. The code is included under a new grass.experimental package which allows merging the code even when further breaking changes are anticipated.
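The method-based dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the real class runs the tool in a GRASS session and wraps the captured output in a result object, while this sketch only assembles and returns the command-line argument list.

```python
class Tools:
    """Sketch: expose GRASS tools as methods via attribute lookup."""

    def __init__(self, prefix=None):
        # Optional family prefix, e.g. "v" so that v.random becomes "v.random".
        self._prefix = prefix

    def __getattr__(self, name):
        # Turn attribute access like tools.r_slope_aspect into a callable
        # for the "r.slope.aspect" tool.
        tool = name.replace("_", ".")
        if self._prefix:
            tool = f"{self._prefix}.{tool}"

        def run(**kwargs):
            # The real implementation executes the tool and captures outputs;
            # here we just return the assembled argument list.
            return [tool] + [f"{key}={value}" for key, value in kwargs.items()]

        return run


tools = Tools()
args = tools.r_slope_aspect(elevation="dem", slope="slope")
```

A prefixed instance works the same way, e.g. `Tools(prefix="v").random(output="test")` assembles a `v.random` call.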
…ute with that stdin
…are different now)
… the max tries, focusing on timeout
…ting clearly ends the loop, the elapsed time may be significantly higher for some timeouts given that the lock process execution takes a second to execute.
…in standard way, pass stdin as StringIO object
…ree types of methods)
I know that this is just a draft so far, but quick tests:

```python
>>> from grass.experimental import tools
>>> v = tools.Tools(prefix='v')
>>> v.random(output='test', npoints=5)
<grass.experimental.tools.ExecutedTool object at 0x7f5e34d8b8c0>
>>> # let's check if it is there
>>> g = tools.Tools(prefix='g')
>>> g.list(type='vector').text
'test'
>>> # whispering test
>>> v.
v.env  v.feed_input_to(  v.ignore_errors_of()  v.levenshtein_distance(  v.no_nonsense_run_from_list(  v.parse_command(  v.run(  v.run_command(  v.run_from_list(  v.suggest_tools(
```

The whisper is not implemented yet, but the bulk of the underlying coding for that is already done for the errors.
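The kind of suggestion hinted at above, such as offering close tool names when a name is mistyped, can be sketched with the standard library. The function name `suggest_tools` matches the completion listing above, but this body and the similarity cutoff are assumptions, not the PR's code (the PR mentions a Levenshtein distance; `difflib` uses a different ratio-based measure):

```python
import difflib


def suggest_tools(name, known_tools):
    # Return up to three tool names similar to the mistyped one, best first.
    return difflib.get_close_matches(name, known_tools, n=3, cutoff=0.6)


known = ["r.slope.aspect", "r.flow", "v.random", "g.region", "g.list"]
matches = suggest_tools("r.slope.aspekt", known)  # a typo in the tool name
```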
…ect, use composition instead of inheritance for Tools in StandaloneTools
… a similar function which does not try to do any of the parameter magic
**File handling behavior**

My question is whether we value complete consistency between the CLI and the Python API more, or whether we prefer having the best behavior possible in the given context. The API naturally supports the following workflow where some imported data is reused between calls and some data is never exported:

```python
tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect")
tools.r_flow(elevation="elevation", aspect="aspect", flowaccumulation="accumulation.grr", flags="3")
```

This is great, but slightly inconsistent with the command line behavior. There is no relation between the command line calls, so each call is separate and always has a fresh project:

```sh
grass run r.slope.aspect elevation="elevation.grr" slope="slope.grr" aspect="aspect.grr"
grass run r.flow elevation="elevation.grr" aspect="aspect.grr" flowaccumulation="accumulation.grr" -3
```

We could make the Python API consistent by reducing the "state" aspect of StandaloneTools and having each function call use a separate session with a fresh project. Then you would always write this:

```python
tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect.grr")
tools.r_flow(elevation="elevation.grr", aspect="aspect.grr", flowaccumulation="accumulation.grr", flags="3")
```

In Python, we can make it configurable with parameters of the class (or different classes), for example:

```python
tools1 = StandaloneTools(use_one_project=False)  # behaves exactly like CLI
tools2 = StandaloneTools(use_one_project=True)   # allows for rasters to be reused
```

This approach can allow control over other behaviors, for example:

```python
tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect.grr")
# When we see elevation.grr and aspect.grr as inputs in the following function call,
# we will just use the ones we already have.
tools.r_flow(elevation="elevation.grr", aspect="aspect.grr", flowaccumulation="accumulation.grr", flags="3")
```

Even with the behavior being potentially configurable, I prefer going here with the best possible behavior for the context as opposed to complete consistency between the CLI and the Python API. So, my choice at this point is to have one session (and one project) for all function calls with one StandaloneTools object. Additionally, the different behavior in the Python API does not mean that a user cannot achieve the same with the CLI. For the CLI not to lack feature parity, despite the different defaults, we could implement something like:

```sh
grass -c "elevation.grr" "project1"
grass --project "project1" run r.slope.aspect elevation="elevation.grr" slope="slope.grr" aspect="aspect"
grass --project "project1" run r.flow elevation="elevation" aspect="aspect" flowaccumulation="accumulation.grr" -3
rm -r "project1"
```

Do you agree or disagree with having the different default behavior in the CLI and in the Python API?
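The input-reuse idea from the comments above (recognizing already-imported files and skipping the re-import) could be tracked with a simple cache keyed by file path. This is an illustrative sketch; `resolve_inputs`, the `.grr` suffix check, and the in-project naming convention are assumptions, not the proposed implementation:

```python
def resolve_inputs(kwargs, already_imported):
    """Sketch: decide which file-based inputs still need importing.

    Inputs ending in .grr that were imported before are mapped to their
    existing in-project name instead of being imported again.
    """
    to_import = []
    resolved = {}
    for key, value in kwargs.items():
        if isinstance(value, str) and value.endswith(".grr"):
            name = value.rsplit(".", 1)[0]  # hypothetical in-project raster name
            if value not in already_imported:
                to_import.append(value)
                already_imported[value] = name
            resolved[key] = already_imported[value]
        else:
            resolved[key] = value
    return resolved, to_import


cache = {}
first, imports1 = resolve_inputs({"elevation": "elevation.grr", "flags": "3"}, cache)
second, imports2 = resolve_inputs({"elevation": "elevation.grr"}, cache)
```

With a shared cache, the first call schedules the import while the second call reuses the existing raster, which mirrors the one-session, one-project behavior preferred above.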
I fully agree with this approach. Exports and imports with each function call give it an undesired overhead. When it comes to the configuration options, the only one I would see as useful is having something like …
So basically it is creating a fluent interface in Python? Like what is often seen in JavaScript, but in other languages too (like C#).
**Computational region behavior**

There is more than one way for the computational region to behave when calling multiple tools with the same StandaloneTools object in Python:

**1. First input of the first call**

The first input of the first call of a tool (function) determines the computational region. Subsequent calls use that region.

```python
tools = StandaloneTools()
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now the computational region is set to whatever raster_file_3x3.grr is.
# The following applies the standard GRASS resampling and extent rules.
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# file2.grr now has size 3x3 and, if the extents do not overlap, it contains only nulls.
```

This is the behavior currently implemented. The nice thing is that it allows for not using g.region at all (above) or using it at any point:

```python
tools = StandaloneTools()
tools.g_region(raster="raster_file_4x5.grr")
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# The output is now 4x5.
tools.g_region(raster="raster_file_3x3.grr")
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# The output is now 3x3.
```

The raster parameter of g.region simply works as expected. I also added checking for any computational region modification based on the modification time of the underlying WIND file. This way, the current code also supports any g.region parameter or, theoretically, any other tool which would change the region.
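The modification-time check mentioned above could be sketched as follows. The file layout and class name are illustrative stand-ins, not the PR's actual code; the point is only that a stored mtime lets later calls detect that something rewrote the region file:

```python
import os
import tempfile


class RegionWatcher:
    """Sketch: detect edits to a WIND-style region file via its mtime."""

    def __init__(self, wind_path):
        self.wind_path = wind_path
        self.last_seen = os.path.getmtime(wind_path)

    def modified(self):
        # True if the region file changed since we last looked.
        current = os.path.getmtime(self.wind_path)
        changed = current > self.last_seen
        self.last_seen = current
        return changed


# Illustrative usage with a temporary stand-in for the WIND file.
with tempfile.TemporaryDirectory() as tmp:
    wind = os.path.join(tmp, "WIND")
    with open(wind, "w") as f:
        f.write("rows: 1\ncols: 1\n")
    watcher = RegionWatcher(wind)
    unchanged = watcher.modified()               # nothing happened yet
    os.utime(wind, (0, watcher.last_seen + 10))  # simulate a later edit
    changed = watcher.modified()
```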
I like this option because g.region works at any place as expected, but you can also completely leave it out. So, you can focus on the tools and your data, with some API-specific risks related to not handling extent and resolution explicitly, but if you know about the computational region and want to tap into its power, you can. I'm a little less comfortable with inheriting the region from the first call in all the subsequent calls, but I expect this not to be an issue for most workflows.

**2. First input of each call**

The first input of each call of a tool (function) determines the computational region for the given call. Subsequent calls are not influenced by the previous calls.

```python
tools = StandaloneTools()
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now the computational region is set to whatever raster_file_3x3.grr is.
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# Now the computational region is set to whatever raster_file_4x5.grr is.
# file2.grr now has size 4x5 and its extent overlaps with raster_file_4x5.grr.
```

This makes the calls completely independent in terms of region. This also means that any g.region calls are ignored. A variation of this could change the behavior based on computational region changes: if a change were detected (based on the file modification time as in option 1), the computational region would be respected; otherwise, each call would get its own computational region. What I like about this option is that it is clear how each call behaves regardless of their order, and independent calls (with different StandaloneTools objects) give the same result as a series of calls on the same object. It also aligns well with the CLI (see below). However, it does not work with g.region, or it would have to switch behavior on the fly to accommodate it.

**3. Manual-only explicitly set region**

The computational region is not set automatically and defaults to whatever the default is (1x1 at 0,0 at this point). The user needs to explicitly call g.region. Subsequent calls use that region.

```python
tools = StandaloneTools()
# Set the computational region explicitly.
tools.g_region(raster="raster_file_3x3.grr")
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now, set the computational region explicitly again if a different one is needed.
tools.g_region(raster="raster_file_4x5.grr")
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
```

This works just the way it works now, so any experienced GRASS user will be right at home, but any user still needs to know about the computational region, and each workflow will have at least two steps: computational region setup and the actual tool call.

**Configuration**

We need to decide what the default behavior is, but we can also provide configuration for all behaviors, for example:

```python
tools = StandaloneTools(region_from_first_call=True)  # option 1
tools = StandaloneTools(region_for_each_call=True)    # option 2
tools = StandaloneTools(explicit_region_only=True)    # option 3
```

or:

```python
tools = StandaloneTools(use_region=False, refresh_region=False)  # option 1
tools = StandaloneTools(use_region=False, refresh_region=True)   # option 2
tools = StandaloneTools(use_region=True, refresh_region=None)    # option 3
```

Notably, the use of …

**CLI**

Similarly to the issue with data files, the CLI needs to behave a certain way because the individual calls do not share one object like the Python API does. The CLI follows option 2: each call has its own computational region.

```sh
# The following will take the first input and use it for the computational region.
grass run r.slope.aspect elevation="raster_file_3x3.grr" aspect="file.grr" -a
# The following will again take the first input and use it for the computational region.
grass run r.slope.aspect elevation="raster_file_4x5.grr" aspect="file.grr" -a
```

With using an existing project (similarly to the feature parity in data file handling), we could provide the CLI with a project parameter and a set of parameters related to the computational region:

```sh
grass -c "elevation.grr" "project1"
grass --project ~/data/nc --use-region run r.slope.aspect elevation="raster_file_3x3.grr" aspect="file.grr" -a
grass --project ~/data/nc --use-region run r.slope.aspect elevation="raster_file_4x5.grr" aspect="file.grr" -a
rm -r "project1"
```

**Bonus: Tracking state of computational region**

While I used the last modified time to track user edits to the computational region, we could support this tracking in the computational region itself. The current trouble is that the computational region is tracked in a file called WIND, and this file is created with each mapset; in fact, the presence of the WIND file is the check used to recognize valid mapsets. While this is nice for tools because they can simply rely on the computational region being set (this happens in the library code, not the tool code itself), the computational region needs to be set before any tool runs, so it is possibly set without any input data to provide a reasonable computational region. Later, there is no way of telling whether the values in the computational region are from a user or are simply the default. The default is 1x1 at 0,0, but should we simply assume a legitimate user case for that extent and resolution and behave differently based on that? We don't do that now. To help the system know what the status is, we could save the status, or rather the provenance, of the computational region in the computational region itself. So the WIND file would have a new key …

We could take a different approach and have states based on what the region was determined from, namely determined from …

The StandaloneTools don't need this. Even generally, this can be done by checking the time stamp or the content. However, this would be a way to implement the same behavior in different places, possibly without using the same API.
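Storing the provenance in the WIND file itself, as proposed above, could look like the following sketch. The WIND format is a simple "key: value" text file; the `region status` key name and its values here are made-up examples for the proposal, not an agreed-on format:

```python
def read_keys(text):
    # Parse simple "key: value" lines as used by the WIND file.
    result = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            result[key.strip()] = value.strip()
    return result


def with_status(text, status):
    # Add or replace a hypothetical provenance key recording where the
    # region came from, e.g. "default", "user", or "first input".
    keys = read_keys(text)
    keys["region status"] = status
    return "\n".join(f"{key}: {value}" for key, value in keys.items())


wind = "north: 1\nsouth: 0\nrows: 1\ncols: 1"
updated = with_status(wind, "default")
```

A library could then branch on the recorded status instead of comparing timestamps or file contents.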
If I understand you correctly, the feature makes sense to me and is already there. StandaloneTools can use an existing session:

```python
# No project and no session
gs.create_project("project1")
with gs.setup.init("project1") as session:
    tools = StandaloneTools(session=session)
```

```python
# With an existing session, but in a separate mapset.
with grass.experimental.TemporaryMapsetSession() as session:
    tools = StandaloneTools(session=session)
```
It seems to me that method chaining is heavily present in a fluent interface. I don't use method chaining here because the point is to return data when appropriate, which method chaining prevents. Also, the methods here don't modify the object that much, which is what the methods in a fluent interface do (they modify the project, so it is more of a side effect). Here, the tools object is an interface for functionality. In OOP, this could be a facade, providing a front face to complex underlying code consisting of multiple components. Especially the NumPy piece is trying to achieve functional programming ideas more than OOP, with the object being a necessary vehicle for providing a good interface (tools as function names), cutting some overhead (at minimum the session setup), and possibly allowing for configuration.
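The contrast drawn above can be illustrated with two toy classes; both are illustrative only and not part of the proposed API. The fluent style mutates state and returns `self` so calls can chain, while the facade style does the work and returns data, leaving nothing to chain on:

```python
class FluentRegion:
    # Fluent style: methods mutate state and return self for chaining.
    def __init__(self):
        self.settings = {}

    def rows(self, n):
        self.settings["rows"] = n
        return self

    def cols(self, n):
        self.settings["cols"] = n
        return self


class ToolsFacade:
    # Facade style: each call does the work and returns data, so there is
    # nothing useful to chain on.
    def cell_count(self, rows, cols):
        return rows * cols


chained = FluentRegion().rows(3).cols(4).settings
count = ToolsFacade().cell_count(3, 4)
```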
Thanks, it is the latter one. The first one looks terrible to me as it uses both …
**Use of NumPy array IO with the standalone tools API**

The combination of NumPy array IO (from #5878) with the standalone tools API (from #5843, this PR) allows using tools with NumPy arrays without a project:

```python
from grass.experimental.standalone_tools import StandaloneTools

tools = StandaloneTools()
slope = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
```

**Complications with computational region**

With how StandaloneTools is implemented now, the following will fail because the initially set region will be incompatible with the array size in the second call (see option 1 in the region comment above):

```python
from grass.experimental.standalone_tools import StandaloneTools

tools = StandaloneTools()
slope1 = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
slope2 = tools.r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
```

One way to avoid it is providing some parameter to StandaloneTools, like …

```python
from grass.experimental.standalone_tools import StandaloneTools

slope1 = StandaloneTools().r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
slope2 = StandaloneTools().r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
```

Having the multiple calls and having that instance immediately forgotten does not look that great.

**Evaluating length of user code**

One could also argue that, in the case of NumPy arrays, plain functions really are preferable over calling a tool as a method of an object, because even a single call still requires the creation of an object beforehand or in the same statement, as in these two examples:

```python
from grass.experimental.standalone_tools import StandaloneTools

tools = StandaloneTools()
slope = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
```

```python
from grass.experimental.standalone_tools import StandaloneTools

slope = StandaloneTools().r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
```

**Shortcut object in the library**

We could create a StandaloneTools object at the Python module level so that users can import it. This would be similar to grass.pygrass.modules.shortcuts (hence calling it a shortcut here). In the library, we would have:

```python
# grass/experimental/standalone_tools.py
tools = StandaloneTools(refresh_region=True, keep_data=False, use_one_project=False)
```

And then the user code would be:

```python
# myscript.py
from grass.experimental.standalone_tools import tools

slope = tools.r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
```

This would exist alongside the option to create one or more StandaloneTools objects, and it would likely have a different configuration (independent region, no data preserved, for truly standalone runs). The result would be possible confusion due to another option and some inconsistency, but it might be the best way to provide such an API because it creates the simplest user code.
Building on top of #2923 (not merged), this adds functionality which allows "packed" native GRASS rasters to be used as tool parameters on the command line.

The above syntax is not actually implemented, but the code below works:

```sh
PYTHONPATH=$(grass --config python-path) python -m grass.app run r.slope.aspect elevation=.../elevation.pack slope=.../slope.pack
```

The same functionality is also available from Python, where it copies the syntax of plain Tools from #2923. The above syntax does not fully work, but the following one does: …

This PR is not meant for merging as is, but currently represents a final combination of all the different features proposed. See discussion #5830 for details.