[SPARK-51187][SQL][SS][4.0] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 #49984
Conversation
…anPruneStreamingSubplan" This reverts commit decb677.
Here is my alternative proposal.
I sent an email for further discussion.
According to the community vote, this has become stale.
Does that mean your issue with the PR is stale and withdrawn, or that the PR itself is now stale? If the former, great; if not, what if a regular expression were simply used instead, something like:

metadata.conf.foreach { case (key, value) =>
  // Remap any vendor-prefixed variant of the key onto the proper config.
  if ("^spark\\.[^.]+\\.sql\\.optimizer\\.pruneFiltersCanPruneStreamingSubplan$".r.matches(key)) {
    sessionConf.setConfString(PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN.key, value)
  }
}

Then everyone can move on with their lives.
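For reference, here is a standalone sketch of the suggested pattern check; it is not Spark code, and the vendor-prefixed key below is only a placeholder used to show that the regex matches the incorrect form but not the correct one:

object RegexKeyMatchCheck {
  def main(args: Array[String]): Unit = {
    // Anchored pattern from the comment above; [^.]+ matches a single (vendor) segment.
    val pattern = "^spark\\.[^.]+\\.sql\\.optimizer\\.pruneFiltersCanPruneStreamingSubplan$".r
    val candidates = Seq(
      "spark.someVendor.sql.optimizer.pruneFiltersCanPruneStreamingSubplan", // placeholder for the incorrect, vendor-prefixed key
      "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"             // the correct key; should not match
    )
    candidates.foreach(k => println(s"$k -> ${pattern.matches(k)}"))
  }
}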
This is coupled with an active discussion: I posted the alternative you proposed in dev@, but the original discussion is still in place, and everyone's focus is there rather than on my alternative.
@HeartSaVioR can you rebase this PR to trigger the tests? I think it's time to merge it now and unblock 4.0.
@cloud-fan It is definitely your call as the release manager of Spark 4.0.0 to decide which PR you want to merge to unblock releasing Spark 4.0.0. I sincerely respect your decision.
TBH I don't think it's an issue to mention vendor names in internal migration code paths, and this is not the only place. It's better to avoid mistakes in the first place so that we don't need the migration path, and we've already added a new style check rule to prevent improper config names. I'm fine if people think it's an issue; we can scan the whole codebase and fix all these vendor names in the master branch. For 4.0 I'd like to merge this PR as it is.
Thanks, merging to 4.0!
Closes #49984 from HeartSaVioR/SPARK-51187-4.0.
Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR proposes to implement a graceful deprecation of the incorrect config introduced in SPARK-49699.
SPARK-49699 was included in Spark 3.5.4, hence we can't simply rename the config to fix the issue.
Also, since the incorrect config is recorded in the offset log of a streaming query, the fix isn't as simple as adding withAlternative and being done. We need to explicitly handle the case where the offset log contains the incorrect config, and apply the value of the incorrect config from the offset log to the new config. Once a single microbatch has been planned after the restart (and hence the above logic has been applied), the offset log will contain the "new" config and will no longer refer to the incorrect config.
That said, we can remove the incorrect config in a future Spark version once we are confident that no users will upgrade from Spark 3.5.4 directly to that version.
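For illustration only, here is a minimal sketch of the remapping described above, assuming hypothetical names: a plain conf map restored from the offset log, a setConf callback, and placeholder constants for the incorrect and correct keys. It is not the actual implementation, which lives in Spark's streaming offset-log handling.

object OffsetLogConfMigrationSketch {
  // Placeholder for the incorrect, vendor-prefixed key released in Spark 3.5.4.
  val IncorrectKey = "spark.someVendor.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
  // The intended public key (as suggested by the config constant name used in the review discussion).
  val CorrectKey = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

  // Apply confs restored from the offset log, redirecting the incorrect key onto the correct one
  // so a restarted query keeps the same planner behavior.
  def applyRestoredConfs(offsetLogConfs: Map[String, String], setConf: (String, String) => Unit): Unit = {
    offsetLogConfs.foreach {
      case (IncorrectKey, value) => setConf(CorrectKey, value)
      case (key, value)          => setConf(key, value)
    }
  }

  def main(args: Array[String]): Unit = {
    // Example: an offset log written by Spark 3.5.4 carrying the incorrect key.
    val restored = Map(IncorrectKey -> "false", "spark.sql.shuffle.partitions" -> "5")
    applyRestoredConfs(restored, (k, v) => println(s"set $k = $v"))
  }
}

Because the next microbatch is planned with the correct key in place, subsequent offset log entries carry only the new config, which is what allows the incorrect key to be dropped entirely in a later release.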
Why are the changes needed?
We released an incorrect config and we want to rename it properly. While renaming it, we also don't want to cause any breakage for existing streaming queries.
Does this PR introduce any user-facing change?
No. That is what this PR is aiming for.
How was this patch tested?
New UT.
Was this patch authored or co-authored using generative AI tooling?
No.