When using S3PickleIOManager, how do I set the name of the pickled object at output?
Subsequent materializations of an asset will overwrite previous materializations of that asset. With a base directory of "/my/base/path", an asset with key AssetKey(["one", "two", "three"]) would be stored in a file called "three" in a directory with path "/my/base/path/one/two/".
I want to re-materialize downstream assets using previous versions of upstream assets, but since an upstream asset will have been overwritten, this isn't possible. For example, I might want to examine the effect on a downstream asset of using a different sample size or subset of the upstream asset.
I currently hash an upstream DataFrame asset and add its hash as DataVersion metadata. Along the lines of:
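Roughly like this (a hedged sketch: `hash_bytes` and the commented-out asset body are illustrative names I've chosen, not from my actual code; `Output` and `DataVersion` are real Dagster imports):

```python
import hashlib
import pickle


def hash_bytes(payload: bytes) -> str:
    """Return a stable hex digest for a serialized asset value."""
    return hashlib.sha256(payload).hexdigest()


# Inside a Dagster asset, the digest would be attached as the data version,
# e.g. (illustrative, not my exact code):
#
#   from dagster import asset, Output, DataVersion
#
#   @asset
#   def upstream():
#       df = build_dataframe()  # hypothetical helper
#       digest = hash_bytes(pickle.dumps(df))
#       return Output(df, data_version=DataVersion(digest))

digest = hash_bytes(pickle.dumps({"a": [1, 2, 3]}))
```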
I am thinking of passing the exact hash string into run config via dg.Config, with some other function that loads a past upstream asset by name. But I can't find anything in the docs about overriding the name of the pickled artefact. I have a hunch I'd have to implement my own IO manager based on S3PickleIOManager, but it's not clear how to add the output data_version to the output key.
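To make the question concrete, here is the kind of key scheme I have in mind. This is purely a hypothetical illustration (the function name and suffix format are mine, not Dagster API); a custom IO manager would presumably compute something like this when overriding the path logic:

```python
def versioned_s3_key(base: str, asset_key_parts: list[str], data_version: str) -> str:
    """Build an object key like base/one/two/three-<hash>, so each
    DataVersion of an asset gets its own S3 object rather than
    overwriting the previous materialization."""
    *prefix, leaf = asset_key_parts
    return "/".join([base, *prefix, f"{leaf}-{data_version}"])


# Using the base directory and asset key from the example above:
key = versioned_s3_key("/my/base/path", ["one", "two", "three"], "ab12cd")
# → "/my/base/path/one/two/three-ab12cd"
```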