-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Refactor/cleanup admin_cleanup_datasets.py script #20819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…more readable but mainly compatible with sqlalchemy 2.x
…compatible with sqlalchemy 2.x
… and the actual deletion email
|
Can you run "make format" - this looks solid to me but the linter is unhappy about code formatting. |
| select(HDA.id) | ||
| .join(Dataset, Dataset.id == HDA.dataset_id, isouter=True) | ||
| .where(and_( | ||
| Dataset.deleted.is_(False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted.is_(False) is identical to deleted == false(). In the rest of the codebase we use the former. We use the is_ construct when comparing to null. Let's keep false() for consistency.
But replacing Foo.__table__.c.bar with Foo.bar is correct.
| session = app.sa_session | ||
|
|
||
| # Aliases for ORM‑mapped classes | ||
| HDA = aliased(app.model.HistoryDatasetAssociation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need aliased classes here. If you want to improve the readability of the code, you can import the classes from the model and then use just the class names:
from galaxy.model import HistoryDatasetAssociation
my_statement = select(HistoryDatasetAssociation).where(whatever...)An extra benefit of this is that reading (and grepping) the code is easier: the string HistoryDatasetAssociation will always represent the same thing across the codebase.
|
|
||
|
|
||
| def _get_tool_id_for_hda(app, hda_id): | ||
| # TODO Some datasets don't seem to have an entry in jtod or a copied_from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not delete this: this is a potentially useful comment which, maybe, hasn't been addressed yet. Yes, it's 12 years old, but it still could be helpful. (jtod and copied_from refer to database tables). Would be OK to delete if this particular item were addressed and deemed no longer relevant.
|
|
||
| hda = session.get(app.model.HistoryDatasetAssociation, hda_id) | ||
| if hda is None: | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the function's behavior: the previous version would raise an error here, which is correct.
Also, we don't need to check that an hda exists here.
|
|
||
| job_query = ( | ||
| select(Job.tool_id) | ||
| .join(JTODA, JTODA.job_id == Job.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to specify the join criteria here: SQLAlchemy takes care of it for us.
| ) | ||
| .select_from(sa.outerjoin(model.Dataset.__table__, model.HistoryDatasetAssociation.__table__)) | ||
| select(HDA.id) | ||
| .join(Dataset, Dataset.id == HDA.dataset_id, isouter=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the join criteria.
Very nice refactoring here - thanks!
| ) | ||
| ) | ||
| # Bind hda_id for current iteration | ||
| rows = session.execute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above - this is very nice refactoring, thanks!
|
@jmchilton Yes, will do. ... once I get back from vacation in a week! |
Update the
admin_cleanup_datasets.pyscript to work with SQLAlchemy 2.x. Plus a few little updates.Changes:
administrative_delete_datasetsfunction to be compatible with SA 2.x, and easier to the eye [860e2cd]_get_tool_idfunction to be compatible with SA 2.x and easier to the eye [651426a]How to test the changes?
(Select all options that apply)
$GALAXY_CONFIG_FILEto the galaxy config, activate the virtual environment, move to the<root>/server/scripts/cleanup_datasetsdirectorypython admin_cleanup_datasets.py --days 0 --info_only --config $GALAXY_CONFIG_FILE --template admin_cleanup_warning_template.txt.samplepython admin_cleanup_datasets.py --days 0 --info_only --config $GALAXY_CONFIG_FILE --template admin_cleanup_warning_template.txt.sample --tool_id <id>python admin_cleanup_datasets.py --days 0 --email_only --config $GALAXY_CONFIG_FILE --template admin_cleanup_warning_template.txt.sample --tool_id <id>and check for the email containing the datasets from the tool that generated thempython admin_cleanup_datasets.py --days 0 --config $GALAXY_CONFIG_FILE --template admin_cleanup_warning_template.txt.sample --tool_id <id>and check for deleted datasets