How to deal with optional values and channels #129
AlexVCaron
started this conversation in
Ideas
Replies: 1 comment
@ThoumyreStanislas @gagnonanthony @Manonedde I want to start a discussion. The first post acts more as a brain dump on my part. I'd like us to find a way to
-
As we start developing workflows and using channels, we now hit the problem of optionality: we have to deal correctly with channels, or objects inside them, that are optional. I tend to always look for the way that makes things easiest for the end-user, as we are the experts and should be the ones dealing with the complications and code repetition, while still limiting them as much as possible.

So I want to start this discussion so we can decide on the best ways to handle optionality in the code base. I'd like to explore a few ways to deal with it, and to hear your ideas and experiences on the matter. To orient the discussion, I'd like to divide it in two parts:

1. Optionality in dataflows. This happens inside channels, when they contain tuples (practically all the time in our case). At a given index, the data is missing. We detect it and patch it with a valid empty value, so a module that consumes it still works.
2. Optionality in workflows. This happens inside pipelines and subworkflows, in Groovy. A channel is missing, identified by a `null` value (though any value that evaluates to `false` in Groovy does the trick). We detect it and account for its absence when doing operations on channels, e.g. either by adding empty values to a channel instead of joining, or by using `if/else` conditions.

There is good in doing the logic in either of those two parts. For now, we've mostly been using the second way, using conditions to either join a channel or, if it is absent, map empty values. It's a good way, but it becomes very heavy as the number of optional channels grows.

I am a big proponent of the first way. With it, we can handle all optionality logic inside channels, and replace all `if/else` conditions with careful uses of the `join` operator and its behavior with outliers (ids that have no match in the other channel). With this logic, an optional channel is either empty (`Channel.empty()`), contains only a subset of the datapoints (or ids), or contains tuples with missing indexes. We no longer check it with conditions, since that does not work here: the channel object always exists, even when it carries nothing. Instead, we use `join` with `remainder: true`, which fills the outliers with `null` entries. Then, we process this joined channel to replace the `null` values with valid empty values.

I've done it quite a lot in versaFlow. The preprocess workflow only has conditions on step options; the channels are prepared for optionality using a reference id channel and two functions, `filter_datapoints` and `fill_missing_datapoints`. `filter_datapoints` filters a channel based on an input closure (the Groovy equivalent of a Python lambda) and returns the set of ids (or meta) that satisfy it. `fill_missing_datapoints` takes a channel, a list of ids that should be present, an index where the values may be missing and a `fill_value` to use there. It then checks all outlier ids and patches them with the `fill_value` at the given index if needed. Typical use-case:

It is not implemented in the best way possible. We can brainstorm on that and develop a better solution for nf-scil.
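To make the first way easier to discuss, here is a minimal sketch of the idea in plain Python. It is a model only, not the versaFlow or nf-scil code and not Nextflow itself. Channels are modeled as lists of `(id, value)` pairs, `join_with_remainder` is a hypothetical stand-in for `join` with `remainder: true` (it does not reproduce the order in which Nextflow emits unmatched items), and the signatures of `filter_datapoints` and `fill_missing_datapoints` are assumed from the description above. The file names and the choice of `[]` as the empty value are also assumptions made for illustration.

```python
def join_with_remainder(left, right):
    """Join two lists of (id, value) pairs on id.

    Ids present on one side only are kept, with None on the missing side.
    """
    left_by_id = dict(left)
    right_by_id = dict(right)
    # Keep ids from the left first, then ids found only on the right.
    ids = list(left_by_id) + [i for i in right_by_id if i not in left_by_id]
    return [[i, left_by_id.get(i), right_by_id.get(i)] for i in ids]


def filter_datapoints(channel, predicate):
    """Return the set of ids whose item satisfies the predicate."""
    return {item[0] for item in channel if predicate(item)}


def fill_missing_datapoints(channel, ids, index, fill_value):
    """Patch item[index] with fill_value for the given ids when it is None."""
    patched = []
    for item in channel:
        item = list(item)  # copy, so the input channel is left untouched
        if item[0] in ids and item[index] is None:
            item[index] = fill_value
        patched.append(item)
    return patched


# Hypothetical data: every subject has a DWI, only sub-01 has a mask.
dwi = [("sub-01", "sub-01_dwi.nii.gz"), ("sub-02", "sub-02_dwi.nii.gz")]
mask = [("sub-01", "sub-01_mask.nii.gz")]

joined = join_with_remainder(dwi, mask)
missing_mask = filter_datapoints(joined, lambda item: item[2] is None)
ready = fill_missing_datapoints(joined, missing_mask, index=2, fill_value=[])
print(ready)
```

The same three calls also cover the fully absent case: passing an empty list for `mask` (the model's counterpart of `Channel.empty()`) gives every id a `None` at index 2, which is then patched the same way, with no `if/else` on the channel itself.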