StreamingDataFrame: retain a custom stream_id across operations #925
Problem
In #836, a custom stream_id parameter was introduced to the StreamingDataFrame class and its __dataframe_clone__ method; however, calling __dataframe_clone__ again reset the stream_id back to the default value obtained from the underlying topics.

The stream_id is used as part of the State stores' names, and it wasn't propagated correctly, leading to incorrect store names in some rare cases.
This PR corrects that, but the state stores created after .filter() or .apply() operations on the grouped DataFrame won't be accessible anymore.
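To illustrate the failure mode, here is a minimal toy model, not the library's code. ToyFrame, clone_buggy, clone_fixed, the topic name, and the store naming scheme are all hypothetical stand-ins; the only thing taken from the description above is that a clone fell back to the topic-derived stream_id, which changed the store name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for a StreamingDataFrame; not the real class."""

    topics: Tuple[str, ...]
    custom_stream_id: Optional[str] = None

    @property
    def stream_id(self) -> str:
        # Fall back to an id derived from the topics (assumed scheme).
        return self.custom_stream_id or "--".join(sorted(self.topics))

    def store_name(self, store: str = "default") -> str:
        # Assumed naming scheme: the stream_id is part of the store name.
        return f"{self.stream_id}/{store}"

    def clone_buggy(self) -> "ToyFrame":
        # Drops the custom stream_id, so the clone reverts to the default.
        return ToyFrame(topics=self.topics)

    def clone_fixed(self) -> "ToyFrame":
        # Carries the custom stream_id over to the clone.
        return ToyFrame(topics=self.topics, custom_stream_id=self.custom_stream_id)


grouped = ToyFrame(topics=("repartition__orders",), custom_stream_id="orders-by-user")
buggy = grouped.clone_buggy()
fixed = grouped.clone_fixed()
```

In this sketch, buggy resolves its store under the topic-derived id while fixed keeps the id of grouped, which is also why stores written under the old, incorrect names are no longer found after the fix.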
Solution
- Propagate the custom stream_id to the cloned dataframes.
- Update StreamingDataFrame.concat() to generate a new stream_id when concatenating branches with different stream_ids (possible when concatenating the group_by-ed dataframe with a one-partition topic).
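One way the concat rule could work is sketched below, under stated assumptions: merged_stream_id is a hypothetical helper, and joining the sorted ids with "--" is an illustrative scheme, not necessarily what this PR implements. The idea shown is only that identical ids are kept as-is, while differing ids yield a new, deterministic one.

```python
from typing import Iterable


def merged_stream_id(stream_ids: Iterable[str]) -> str:
    """Return a stream_id for concatenated branches (hypothetical scheme)."""
    unique = sorted(set(stream_ids))
    if not unique:
        raise ValueError("at least one stream_id is required")
    if len(unique) == 1:
        # All branches share the same stream_id, so keep it unchanged.
        return unique[0]
    # Branches differ: derive a new id that does not depend on argument order.
    return "--".join(unique)
```

Sorting before joining keeps the result stable regardless of the order in which the branches are concatenated, so the derived store names stay the same across restarts.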