Workflow State for Human Consumption and Automatic Validation #18536

@jmchilton

Overview

Workflow Format 2 can carry tool state in one of two forms. It can include the low-level, verbose tool state format of .ga (doubly encoded JSON, etc.) in the tool_state field on a step, or it can use a newer, simpler JSON representation of the parameters in the state field on a step. When Galaxy exports a Format 2 workflow, it just uses the .ga tool state, which limits how easily these workflows can be read and written. I would feel much more confident about the future of the format, and feel we are delivering more, if we could export workflows with the cleaner state format.
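To make the difference between the two fields concrete, here is a minimal sketch of unwrapping a doubly encoded tool_state into something closer to state. The parameter names (num_lines, mode), the helper name decode_tool_state, and the rule of dropping keys that start with "__" are assumptions for illustration, not Galaxy's actual conversion logic.

```python
import json


def decode_tool_state(tool_state: str) -> dict:
    """Unwrap a doubly encoded tool_state string into a plain dict (sketch only)."""
    outer = json.loads(tool_state)  # first layer: the whole state is a JSON string
    state = {}
    for key, value in outer.items():
        if key.startswith("__"):
            # Assumption: bookkeeping keys are not part of the clean state.
            continue
        if isinstance(value, str):
            try:
                value = json.loads(value)  # second layer: each value is JSON again
            except json.JSONDecodeError:
                pass  # leave strings that are not JSON untouched
        state[key] = value
    return state


# Hypothetical .ga-style tool_state: JSON inside JSON.
ga_tool_state = json.dumps(
    {
        "num_lines": json.dumps(5),
        "mode": json.dumps({"select": "head", "strict": True}),
        "__page__": None,
    }
)
clean_state = decode_tool_state(ga_tool_state)
print(clean_state)
```

A real converter would need the tool's parameter definitions to decide how each value should be decoded; this sketch guesses from the value alone, so a literal string such as "5" would be turned into an integer.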

Given the mess of things like workflow modules and parameters/basic.py, I think getting this state import/export right is the messiest piece of the whole picture of building a human readable/writable workflow ecosystem that is fun and efficient to develop against.

I'd like to spend several weeks tackling this task really solidly, building on the work in #18524. I think the steps look something like...

  1. write a tool state/pydantic validator for workflow tool state (using ideas and meta model implemented in Add Tool-Centric APIs to the Tool Shed 2.0 #18524)
  2. implement a workflow export mode that exports gxformat2 with a clean state instead of tool_state if the result validates against our tool state model for workflows
  3. ingest all of the IWC workflows and convert the tool_state to state and verify the workflow tests still pass
  4. once we're confident this works well for a variety of complex workflows - make it the default when exporting format 2 workflows
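Step 2 above could be sketched roughly as follows: try to convert tool_state to a clean state, and only emit state if it validates, otherwise fall back to the original tool_state. The function export_step, its convert and validate callables, and the example step are all hypothetical stand-ins, not existing Galaxy or gxformat2 APIs.

```python
import json
from typing import Callable, List


def export_step(
    step: dict,
    convert: Callable[[str], dict],
    validate: Callable[[dict], List[str]],
) -> dict:
    """Export a step with clean `state` when it validates, else keep `tool_state`."""
    exported = {k: v for k, v in step.items() if k != "tool_state"}
    candidate = None
    try:
        candidate = convert(step["tool_state"])
        problems = validate(candidate)
    except ValueError as exc:  # conversion itself failed
        problems = [str(exc)]
    if problems:
        exported["tool_state"] = step["tool_state"]  # safe fallback
    else:
        exported["state"] = candidate
    return exported


def validate_head_state(state: dict) -> List[str]:
    """Toy validator (assumption): num_lines must be an integer."""
    if not isinstance(state.get("num_lines"), int):
        return ["num_lines must be an integer"]
    return []


good_step = {"tool_id": "head", "tool_state": '{"num_lines": 5}'}
bad_step = {"tool_id": "head", "tool_state": '{"num_lines": "five"}'}

print(export_step(good_step, json.loads, validate_head_state))
print(export_step(bad_step, json.loads, validate_head_state))
```

Falling back per step means a workflow can be exported even when only some steps validate, which seems useful while the validators are still maturing.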

After that is done, we will have the tooling for clean workflow exports and confidence that it works well with a variety of complex workflows. Additionally, the models in #18524 can be produced from a toolbox in a Galaxy context or from the tool shed outside the context of Galaxy (e.g. gxformat2, planemo, IDEs, etc.), so we will be able to readily translate the workflow validation we do for our own confidence into CLI tooling (Planemo) and the galaxy-language-server, which will let users catch typing issues, connection issues, etc. instantly.

Workflow validation

This is a bit more of a deep dive on how I vaguely think I will implement (1) above. Our meta model has a variety of exports to create tool state validation for different contexts. I've been using a UML diagram to describe them and I have updated it to include two new modes - "workflow_step" and "workflow_step_linked".

[Figure: tool_state_state_classes PlantUML diagram]

The workflow_step validator would validate the "step" state directly in YAML as is. This isn't super helpful as most things can be empty (except conditional discriminators) but it would catch some problems without any additional work. The second model, workflow_step_linked, is one that I would apply after I take the state and insert "descriptions" of some form for default and source values defined as step inputs. Maybe the model would expect a {"class": "link", "source": "XXXX", "source_type": "data"} or something like that. So I would validate the literal representation with the first model, preprocess, and validate the "linked" version with the second model.
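The validate, preprocess, validate flow described above might look something like this sketch. It uses plain dicts instead of the pydantic models from #18524; the spec format, the connections mapping, the function names, and the rule that every data parameter must be linked are all assumptions for illustration. Only the link shape is taken from the idea above.

```python
from typing import Dict, List

# Hypothetical parameter spec: name -> type ("integer" or "data").
SPEC = {"num_lines": "integer", "input": "data"}


def is_link(value) -> bool:
    return isinstance(value, dict) and value.get("class") == "link"


def validate_step(state: dict, spec: Dict[str, str]) -> List[str]:
    """First pass ("workflow_step"): anything may be absent, present values are typed."""
    errors = []
    for name, value in state.items():
        if name not in spec:
            errors.append(f"unknown parameter: {name}")
        elif spec[name] == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
            errors.append(f"{name}: expected an integer")
    return errors


def insert_links(state: dict, connections: Dict[str, str]) -> dict:
    """Preprocess: add a link description for each connected step input."""
    linked = dict(state)
    for name, source in connections.items():
        linked[name] = {"class": "link", "source": source, "source_type": "data"}
    return linked


def validate_linked(linked: dict, spec: Dict[str, str]) -> List[str]:
    """Second pass ("workflow_step_linked"): data parameters must be links (assumption)."""
    errors = []
    for name, kind in spec.items():
        value = linked.get(name)
        if kind == "data" and not is_link(value):
            errors.append(f"{name}: data input is not connected")
        elif kind == "integer" and value is not None and not is_link(value):
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{name}: expected an integer or a link")
    return errors


state = {"num_lines": 5}                     # literal step state
connections = {"input": "upstream/out_file"}  # simplified stand-in for step inputs

first_pass = validate_step(state, SPEC)
linked_state = insert_links(state, connections)
second_pass = validate_linked(linked_state, SPEC)
print(first_pass, linked_state, second_pass)
```

The first pass catches typos and type errors cheaply, while the second pass is where missing connections would show up.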
