[SPARK-56189][PYTHON] Refactor SQL_WINDOW_AGG_ARROW_UDF#55153

Closed
Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:refactor/window-agg-arrow-udf

@Yicong-Huang Yicong-Huang commented Apr 2, 2026

What changes were proposed in this pull request?

Refactor SQL_WINDOW_AGG_ARROW_UDF to be self-contained in read_udfs(), moving bounded/unbounded window logic from wrapper functions and the old mapper into a single execution block that uses ArrowStreamGroupSerializer as pure I/O.
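
To illustrate the bounded/unbounded distinction mentioned above, here is a minimal pure-Python sketch. It is not the code from this PR: the function names `unbounded_window_agg` and `bounded_window_agg` are hypothetical, and plain lists stand in for Arrow arrays. It assumes the usual split, where an unbounded window evaluates the UDF once per partition and repeats the result for every row, while a bounded window evaluates the UDF once per row over a half-open slice given by per-row begin/end indices.

```python
from typing import Callable, List, Sequence


def unbounded_window_agg(
    func: Callable[[Sequence[float]], float], values: Sequence[float]
) -> List[float]:
    # Hypothetical helper: one UDF call for the whole partition,
    # with the scalar result repeated once per input row.
    result = func(values)
    return [result] * len(values)


def bounded_window_agg(
    func: Callable[[Sequence[float]], float],
    values: Sequence[float],
    begins: Sequence[int],
    ends: Sequence[int],
) -> List[float]:
    # Hypothetical helper: one UDF call per row, over the half-open
    # slice [begin, end) of the partition that forms that row's frame.
    return [func(values[b:e]) for b, e in zip(begins, ends)]


values = [1.0, 2.0, 3.0, 4.0]

# Unbounded frame: every row sees the whole partition.
unbounded = unbounded_window_agg(sum, values)

# Frame "1 preceding to current row", expressed as slice bounds per row.
begins = [0, 0, 1, 2]
ends = [1, 2, 3, 4]
bounded = bounded_window_agg(sum, values, begins, ends)
```

Under these assumptions the bounded path costs one UDF call per row, which is why the two cases are typically handled as separate branches.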

This is a re-submission of #55123, which was reverted after a CI failure caused by passing a non-existent num_dfs parameter to ArrowStreamSerializer. This version uses ArrowStreamGroupSerializer instead.

Why are the changes needed?

Part of SPARK-55388.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

ASV micro-benchmarks with repeat=(3, 5) show no regression:

SQL_WINDOW_AGG_ARROW_UDF (time)

| Scenario | UDF | Before | After | Change |
|---|---|---|---|---|
| few_groups_sm | sum | 8.73±0.06ms | 8.70±0.09ms | ~neutral |
| few_groups_sm | mean_multi | 7.70±0.2ms | 7.82±0.1ms | ~neutral |
| few_groups_lg | sum | 31.6±0.2ms | 31.6±0.5ms | ~neutral |
| few_groups_lg | mean_multi | 29.6±0.2ms | 29.6±0.4ms | ~neutral |
| many_groups_sm | sum | 244±4ms | 241±4ms | ~neutral |
| many_groups_sm | mean_multi | 208±4ms | 206±2ms | ~neutral |
| many_groups_lg | sum | 135±0.4ms | 134±0.3ms | ~neutral |
| many_groups_lg | mean_multi | 124±2ms | 120±0.6ms | -3% |
| wide_cols | sum | 72.7±0.1ms | 69.4±2ms | -5% |
| wide_cols | mean_multi | 70.0±0.9ms | 69.9±0.3ms | ~neutral |
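
As a sanity check on the Change column, the sketch below recomputes the two non-neutral entries. It assumes Change is the relative difference of the central timings, (after - before) / before, rounded to a whole percent. The helper `pct_change` is a hypothetical name, not part of ASV.

```python
def pct_change(before_ms: float, after_ms: float) -> int:
    # Relative change of the central value, rounded to a whole percent.
    # Assumption: this mirrors how the Change column was derived.
    return round((after_ms - before_ms) / before_ms * 100)


# many_groups_lg / mean_multi: 124ms before, 120ms after
many_groups_lg_mean_multi = pct_change(124.0, 120.0)

# wide_cols / sum: 72.7ms before, 69.4ms after
wide_cols_sum = pct_change(72.7, 69.4)
```

Both values match the table under that assumption.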

Peak memory: No change (468M-506M for all scenarios).

Was this patch authored or co-authored using generative AI tooling?

No.

@Yicong-Huang Yicong-Huang force-pushed the refactor/window-agg-arrow-udf branch from d2a49bd to dcb723c Compare April 2, 2026 04:14
@Yicong-Huang (Contributor, Author)

retest this please

@zhengruifeng (Contributor)

merged to master
