StreamingDataFrame: retain a custom stream_id across operations #925

daniil-quix · 2025-06-09T14:53:12Z

Problem

In #836, a custom stream_id parameter was added to the StreamingDataFrame class and its __dataframe_clone__ method. However, a subsequent call to __dataframe_clone__ reset the stream_id to the default value derived from the underlying topics.

The stream_id is part of the state store names. Because it wasn't propagated correctly, stores were registered under incorrect names in some rare cases.
This PR corrects that, but existing state stores created after .filter() or .apply() operations on a grouped DataFrame won't be accessible anymore, since they were registered under the old names.

Example:

topic = app.topic('<some-topic-with-one-partition>')

sdf = StreamingDataFrame(topic)  # stream_id = "<some-topic-with-one-partition>"

sdf_grouped = sdf.group_by("column")  # stream_id = "column--groupby--<some-topic-with-one-partition>"

# The store gets registered for the stream_id "column--groupby--<some-topic-with-one-partition>"
sdf_windowed = sdf_grouped.tumbling_window(...).sum().final()

# Before this PR, the state here was registered under the stream_id "<some-topic-with-one-partition>".
# The correct stream_id is "column--groupby--<some-topic-with-one-partition>"
sdf_windowed.apply(..., stateful=True)
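
The failure mode can be reproduced with a toy model that does not depend on quixstreams. This is only a sketch: ToyFrame, clone_buggy, clone_fixed, and the "orders" topic name are hypothetical stand-ins, not the library's API, and the default id derivation is an assumption.

```python
from typing import Optional, Tuple


class ToyFrame:
    """Hypothetical stand-in for StreamingDataFrame; not the quixstreams API."""

    def __init__(self, topics: Tuple[str, ...], stream_id: Optional[str] = None):
        self.topics = topics
        # Assumption: with no custom id, derive a default one from the topics.
        self.stream_id = stream_id or "--".join(sorted(topics))

    def clone_buggy(self) -> "ToyFrame":
        # Drops the custom stream_id, so the clone falls back to the default.
        return ToyFrame(self.topics)

    def clone_fixed(self) -> "ToyFrame":
        # Carries the current stream_id over to the clone.
        return ToyFrame(self.topics, stream_id=self.stream_id)


grouped = ToyFrame(("orders",), stream_id="column--groupby--orders")
buggy_id = grouped.clone_buggy().stream_id
fixed_id = grouped.clone_fixed().stream_id
```

In this sketch, buggy_id falls back to the topic-derived default while fixed_id keeps the group_by id, which mirrors the mismatch between the store names in the example above.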

Solution

Pass stream_id to the cloned dataframes.
Update StreamingDataFrame.concat() to generate a new stream_id when concatenating branches with different stream_ids (possible when concatenating the group_by-ed dataframe with a one-partition topic).
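
One way to derive a single id for concatenated branches is sketched below. The helper name combined_stream_id, the sorting, and the "--" separator are assumptions made for illustration; the actual logic lives in quixstreams/utils/stream_id.py and may differ.

```python
def combined_stream_id(*stream_ids: str) -> str:
    """Hypothetical helper: build one id from the ids of several branches."""
    # De-duplicate so branches sharing an id keep it unchanged,
    # and sort so the result does not depend on concatenation order.
    unique = sorted(set(stream_ids))
    return "--".join(unique)


same = combined_stream_id("orders", "orders")
mixed = combined_stream_id("lookup", "column--groupby--orders")
```

Sorting makes the id deterministic, so a.concat(b) and b.concat(a) would map to the same store names under this scheme.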

quixstreams/utils/stream_id.py

tests/test_quixstreams/test_dataframe/test_dataframe.py

Co-authored-by: Remy Gwaramadze <[email protected]>

…io#925) Co-authored-by: Remy Gwaramadze <[email protected]>

StreamingDataFrame: retain a custom stream_id across operations

53a6ef4

gwaramadze reviewed Jun 10, 2025


daniil-quix and others added 3 commits June 10, 2025 12:36

Update tests/test_quixstreams/test_dataframe/test_dataframe.py

c0f1ab5

Co-authored-by: Remy Gwaramadze <[email protected]>

Update tests/test_quixstreams/test_dataframe/test_dataframe.py

a6036a2

Co-authored-by: Remy Gwaramadze <[email protected]>

Get the todo back

5c04b65

gwaramadze approved these changes Jun 10, 2025


daniil-quix merged commit b50675e into main Jun 10, 2025
4 checks passed

daniil-quix deleted the fix/groupby-stream_id branch June 10, 2025 11:19

jbrass pushed a commit to jbrass/quix-streams that referenced this pull request Jun 10, 2025

StreamingDataFrame: retain a custom stream_id across operations (quix…

d79cb2e

…io#925) Co-authored-by: Remy Gwaramadze <[email protected]>
