Conversation

@ccoulombe
Contributor

@ccoulombe ccoulombe commented Aug 26, 2025

Update the admin_cleanup_datasets.py script to work with SQLAlchemy 2.x, plus a few small improvements.

Changes:

  • Nicer time format [b6f5553]
  • Update to work with SA 2.x [3904721]
  • Refactored and updated the administrative_delete_datasets function to be compatible with SA 2.x and easier on the eye [860e2cd]
  • Refactored and updated the _get_tool_id function to be compatible with SA 2.x and easier on the eye [651426a]
  • Add state of deletion to email subject [f0e63ff]
  • Add option to not send the email upon deletion [85d8e6a]

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. On the Galaxy server, export GALAXY_CONFIG_FILE so that it points to the Galaxy config file, activate the virtual environment, and change to the <root>/server/scripts/cleanup_datasets directory
    2. Create dummy datasets, and create one with a specific tool
    3. Run python admin_cleanup_datasets.py --days 0 --info_only --config $GALAXY_CONFIG_FILE --template admin_cleanup_warning_template.txt.sample
    4. Run python admin_cleanup_datasets.py --days 0 --info_only --config $GALAXY_CONFIG_FILE --template admin_cleanup_warning_template.txt.sample --tool_id <id>
    5. Run python admin_cleanup_datasets.py --days 0 --email_only --config $GALAXY_CONFIG_FILE --template admin_cleanup_warning_template.txt.sample --tool_id <id> and check for the email containing the datasets from the tool that generated them
    6. Run python admin_cleanup_datasets.py --days 0 --config $GALAXY_CONFIG_FILE --template admin_cleanup_warning_template.txt.sample --tool_id <id> and check for deleted datasets
    7. Run both the old and the new version of the script and compare their output

@github-actions github-actions bot added this to the 25.1 milestone Aug 26, 2025
@ccoulombe ccoulombe changed the title from "Refactor/cleanup datasets script" to "Refactor/cleanup admin_cleanup_datasets.py script" Aug 26, 2025
@jmchilton
Member

Can you run "make format" - this looks solid to me but the linter is unhappy about code formatting.

@jdavcs jdavcs self-requested a review August 28, 2025 13:52
select(HDA.id)
    .join(Dataset, Dataset.id == HDA.dataset_id, isouter=True)
    .where(and_(
        Dataset.deleted.is_(False),
Member

deleted.is_(False) is equivalent to deleted == false(). In the rest of the codebase we use the latter; we use the is_ construct when comparing to null. Let's keep false() for consistency.

But replacing Foo.__table__.c.bar with Foo.bar is correct.

session = app.sa_session

# Aliases for ORM‑mapped classes
HDA = aliased(app.model.HistoryDatasetAssociation)
Member

We don't need aliased classes here. If you want to improve the readability of the code, you can import the classes from the model and then use just the class names:

from galaxy.model import HistoryDatasetAssociation
my_statement = select(HistoryDatasetAssociation).where(whatever...)

An extra benefit of this is that reading (and grepping) the code is easier: the string HistoryDatasetAssociation will always represent the same thing across the codebase.



def _get_tool_id_for_hda(app, hda_id):
    # TODO Some datasets don't seem to have an entry in jtod or a copied_from
Member

Let's not delete this: this is a potentially useful comment which, maybe, hasn't been addressed yet. Yes, it's 12 years old, but it still could be helpful. (jtod and copied_from refer to database tables). Would be OK to delete if this particular item were addressed and deemed no longer relevant.


hda = session.get(app.model.HistoryDatasetAssociation, hda_id)
if hda is None:
    return None
Member

This changes the function's behavior: the previous version would raise an error here, which is correct.
Also, we don't need to check that an hda exists here.


job_query = (
    select(Job.tool_id)
    .join(JTODA, JTODA.job_id == Job.id)
Member

We don't need to specify the join criteria here: SQLAlchemy takes care of it for us.

)
.select_from(sa.outerjoin(model.Dataset.__table__, model.HistoryDatasetAssociation.__table__))
select(HDA.id)
.join(Dataset, Dataset.id == HDA.dataset_id, isouter=True)
Member

We don't need the join criteria.

Very nice refactoring here - thanks!

)
)
# Bind hda_id for current iteration
rows = session.execute(
Member

Same as above - this is very nice refactoring, thanks!

@ccoulombe
Contributor Author

@jmchilton Yes, will do.
@jdavcs Thanks for the comments, will tackle them.

... once I get back from vacation in a week!

@ahmedhamidawan ahmedhamidawan added the kind/refactoring label (cleanup or refactoring of existing code, no functional changes) Sep 23, 2025
@ahmedhamidawan ahmedhamidawan modified the milestones: 25.1, 26.0 Sep 23, 2025

Labels

area/scripts, kind/refactoring (cleanup or refactoring of existing code, no functional changes)

4 participants