Skip to content

Update table compute - Change rangemaker logic to allow pandas like indexing #7105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

paulzierep
Copy link
Contributor

@paulzierep paulzierep commented Jun 30, 2025

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

Fix #5783
This was annoying me for a long time, maybe it can be fixed unit I am back in September ;)

@wm75
Copy link
Member

wm75 commented Jun 30, 2025

I like this! Before I start reviewing it more carefully: is there any functionality that you'll loose with this change? Doesn't look like from a first glance, but just to make sure :-)

Other question: when we're moving that close to actual pandas syntax, is it worthwhile keeping the 1-based positive indexing, or would it be better to switch to pandas-style completely?

@paulzierep
Copy link
Contributor Author

paulzierep commented Jun 30, 2025

I like this! Before I start reviewing it more carefully: is there any functionality that you'll loose with this change? Doesn't look like from a first glance, but just to make sure :-)

Other question: when we're moving that close to actual pandas syntax, is it worthwhile keeping the 1-based positive indexing, or would it be better to switch to pandas-style completely?

I thought the same, usually we use 1-based indexing for most column-related tools in Galaxy, that was the reason, otherwise we can keep pandas / python based indexing. We could make it optionally, but that might confuse more than help. I really just want to get rid of the current indexing nonsense.

Copy link
Member

@wm75 wm75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<requirement type="package" version="1.2.4">pandas</requirement>

Looks like this should reuse @VERSION@

Examples:
"1:3" → [0, 1, 2]
"3:1" → [2, 1, 0]
"2:-2" → [1, 0, -1, -2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that the old behavior instead of the one you're aiming for?

"-3:-1" → [-3, -2, -1]
"2" → [1]
"1,1,3:1" → [0, 0, 2, 1, 0]
"1:-3" → [0, -1, -2, -3]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here?

@wm75
Copy link
Member

wm75 commented Jun 30, 2025

I think the change also deserves highlighting somewhere. Otherwise this can be a very unpleasant surprise for people updating, say, a workflow to use the new tool version and suddenly performing calculations on the wrong columns.
Probably a warning box at the very top of the tool help?

@bgruening do you think that would be enough? or a banner? but then that's only visible on .eu, or is it a bad idea after all to have such behavior change in the tool?

@bgruening
Copy link
Member

I had hoped to stay out of this :)

What about changing the param name as well ... this way people are forced to look more closely at the param. We can then also add a warning to the param and the help section as well. Which we remove in a few versions.

@wm75
Copy link
Member

wm75 commented Jun 30, 2025

That's a good idea! It will cause many false alerts for people who aren't using slices with negative indices, but better this way than introducing silent bugs I think.

@paulzierep
Copy link
Contributor Author

I had hoped to stay out of this :)

What about changing the param name as well ... this way people are forced to look more closely at the param. We can then also add a warning to the param and the help section as well. Which we remove in a few versions.

You mean Name or Label ? So changing what the user sees, or actually changing the name ?

@paulzierep
Copy link
Contributor Author

Will make an update with the correct logic today, still up to discussion is if we should use 0-based indexing, then people can fully rely on pandas based indexing ? Maybe this can be part of the hackathon ? I do not mind either way as long as we inform the user.

@wm75
Copy link
Member

wm75 commented Jul 1, 2025

changing the actual parameter name so that tool updates inside a workflow will break connections.

@paulzierep
Copy link
Contributor Author

changing the actual parameter name so that tool updates inside a workflow will break connections.

ok now I got the logic

@wm75
Copy link
Member

wm75 commented Jul 1, 2025

wait, another idea, maybe silly:

what if you'd disallow slices with negative stops in the current version, but introduce alternative rust-like syntax for slices using .. as the start..stop separator, and in that new alternative syntax, and only there, allow negative stops?

That way users that don't use expressions affected by the change could update to the new version without hassle, but if you are using them you have to conciously change to the new syntax.

@wm75
Copy link
Member

wm75 commented Jul 1, 2025

and one more note: your changes introduce a new inconsistency.
While 1:-2 now behaves like pandas df.iloc, -2:2 still wraps around in forward direction creating a new asymmetry.
If you want to be consistent with iloc, you need to disallow cases where left < 0, but right >=0.

@wm75
Copy link
Member

wm75 commented Jul 1, 2025

wait, another idea, maybe silly:

what if you'd disallow slices with negative stops in the current version, but introduce alternative rust-like syntax for slices using .. as the start..stop separator, and in that new alternative syntax, and only there, allow negative stops?

That way users that don't use expressions affected by the change could update to the new version without hassle, but if you are using them you have to conciously change to the new syntax.

Thinking about this more, a good way forward might be:

  1. leave the current : slice syntax as it is entirely and mention it as legacy in the parameter help
  2. introduce new, alternative .. syntax and implement the new behavior just for this one

This way, no exisiting usage breaks, and there are no surprises either. The new implementation can then also use 0-based indexing.

@paulzierep
Copy link
Contributor Author

wait, another idea, maybe silly:
what if you'd disallow slices with negative stops in the current version, but introduce alternative rust-like syntax for slices using .. as the start..stop separator, and in that new alternative syntax, and only there, allow negative stops?
That way users that don't use expressions affected by the change could update to the new version without hassle, but if you are using them you have to conciously change to the new syntax.

Thinking about this more, a good way forward might be:

  1. leave the current : slice syntax as it is entirely and mention it as legacy in the parameter help
  2. introduce new, alternative .. syntax and implement the new behavior just for this one

This way, no exisiting usage breaks, and there are no surprises either. The new implementation can then also use 0-based indexing.

I agree, the best solution without breaking stuff but also allowing new logic would be the introduction of a new parameter with pandas based slicing. Since I am out of time before parental leave, I would propose this as a nice project for the tools' hackathon. In my opinion, I would completely revert my changes to rangemaker and just use the 0-based pandas iloc logic directly as the other parameter. This way users can apply pandas slicing, for which there are plenty of docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve/Fix slices in Table Compute
3 participants