-
Notifications
You must be signed in to change notification settings - Fork 95
Pipeline state dump and load #352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Pipeline state dump and load #352
Conversation
.. code:: python | ||
|
||
# Run pipeline until a specific component | ||
state = await pipeline.run_until(data, stop_after="component_name", state_file="state.json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are returning the state, I think it would be cleaner let to the user save it to file or not, I'm not sure adding the state_file
option is helpful.
result = await pipeline.resume_from(state, data, start_from="component_name") | ||
|
||
# Alternatively, load state from file | ||
result = await pipeline.resume_from(None, data, start_from="component_name", state_file="state.json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also because of that, None
as first argument is not super nice IMHO :)
keys_to_remove = [ | ||
key for key in self._data.keys() if key.startswith(run_id_prefix) | ||
] | ||
for key in keys_to_remove: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here we are removing all results from a previous run with this run_id, right?
@@ -140,6 +140,7 @@ def __init__( | |||
} | |||
""" | |||
self.missing_inputs: dict[str, list[str]] = defaultdict() | |||
self._current_run_id: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can not be saved in the Pipeline instance, since concurrent runs will override it.
@@ -600,20 +650,23 @@ async def run( | |||
result=await self.get_final_results(orchestrator.run_id), | |||
) | |||
|
|||
def dump_state(self, run_id: str) -> Dict[str, Any]: | |||
def dump_state(self) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we will need the run_id as parameter here as a consequence of my first comment
Description
This PR introduces state management capabilities to the
Pipeline
class, enabling:run_until
resume_from
The state includes pipeline configuration, execution results, and final results from previous runs. This feature is particularly useful for:
Type of Change
Complexity
Complexity: low
How Has This Been Tested?
Checklist
The following requirements should have been met (depending on the changes in the branch):