[SPARK-51187][SQL][SS][4.0] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 #49984

Closed
wants to merge 10 commits into from

Conversation

HeartSaVioR
Contributor

What changes were proposed in this pull request?

This PR proposes to implement the graceful deprecation of incorrect config introduced in SPARK-49699.

SPARK-49699 was included in Spark 3.5.4, so we can't simply rename the config to fix the issue.

Also, because the incorrect config is recorded in the offset log of a streaming query, the fix is not as easy as adding withAlternative. We need to manually handle the case where the offset log contains the incorrect config, and copy its value from the offset log into the new config. Once a single microbatch has been planned after the restart (so the above logic has been applied), the offset log will contain the "new" config and will no longer refer to the incorrect config.
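
To illustrate the restart handling described above, here is a minimal, Spark-free sketch. It is not the code in this PR: the object name, the method name, and both key strings are assumptions for illustration only (the legacy key uses a placeholder vendor segment, and the new key is assumed to be the same name without that segment).

```scala
// Hypothetical sketch of migrating a legacy config found in an offset log entry.
object OffsetLogConfMigration {
  // Placeholder for the incorrectly named key (assumed, not the literal released key).
  val LegacyKey = "spark.vendorname.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
  // Assumed name of the corrected key.
  val NewKey = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

  /** Returns the confs to restore into the session when restarting from an offset log entry. */
  def confsToRestore(offsetLogConf: Map[String, String]): Map[String, String] =
    offsetLogConf.get(LegacyKey) match {
      // Entry written by an older version: carry the value over to the new key
      // and drop the legacy key so it is not written back to the log.
      case Some(value) => (offsetLogConf - LegacyKey) + (NewKey -> value)
      // Entry already uses the new key (or neither): nothing to migrate.
      case None => offsetLogConf
    }
}
```

In this sketch, applying `confsToRestore` to an already migrated map returns it unchanged, which mirrors the point that the offset log only needs the special handling until the first microbatch after the restart has been planned.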

That said, we can remove the incorrect config in a Spark version for which we are confident that no users will upgrade to it directly from Spark 3.5.4.

Why are the changes needed?

We released an incorrect config and we want to rename it properly. While renaming it, we also don't want to break any existing streaming queries.

Does this PR introduce any user-facing change?

No. That is what this PR is aiming for.

How was this patch tested?

New UT.

Was this patch authored or co-authored using generative AI tooling?

No.

Member

@dongjoon-hyun dongjoon-hyun left a comment

@dongjoon-hyun
Member

I sent an email for further discussion

@github-actions github-actions bot added the DOCS label Feb 23, 2025
@dongjoon-hyun dongjoon-hyun dismissed their stale review March 13, 2025 01:09

According to the community vote, this becomes stale.

@Kimahriman
Contributor

Kimahriman commented Mar 17, 2025

Does that mean your issue with the PR is stale and withdrawn, or the PR itself is now stale? If the former, great, if not, what if a regular expression was just used instead, something like

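    // Match the config under any single vendor segment, e.g. spark.<vendor>.sql.optimizer...
    val legacyKey = "^spark\\.[^.]+\\.sql\\.optimizer\\.pruneFiltersCanPruneStreamingSubplan$".r
    metadata.conf.foreach { case (key, value) =>
      if (legacyKey.matches(key)) {
        sessionConf.setConfString(PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN.key, value)
      }
    }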

then everyone can move on with their lives
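
For illustration, the matching behaviour of that regular expression can be checked outside Spark. This is only a sketch: `VendorKeyMatcher` and `isLegacyKey` are hypothetical names, and the sample keys in the usage note use a placeholder vendor segment.

```scala
// Hypothetical helper that wraps the pattern from the snippet above.
object VendorKeyMatcher {
  // Exactly one dot-free segment is allowed between "spark." and ".sql.optimizer...".
  private val LegacyKeyPattern =
    "^spark\\.[^.]+\\.sql\\.optimizer\\.pruneFiltersCanPruneStreamingSubplan$".r

  // Uses java.util.regex directly so the check does not depend on Regex.matches (Scala 2.13+).
  def isLegacyKey(key: String): Boolean =
    LegacyKeyPattern.pattern.matcher(key).matches()
}
```

With this pattern, a key such as `spark.vendorname.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` matches, while `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` does not, because `[^.]+` must consume one full segment before `.sql.`.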

@HeartSaVioR
Contributor Author

HeartSaVioR commented Mar 17, 2025

This is coupled with an active discussion - I posted the alternative like you proposed in dev@, but the original discussion is still in place, and everyone's focus is there rather than my alternative.

@cloud-fan
Contributor

@HeartSaVioR can you rebase this PR to trigger the tests? I think it's time to merge it now and unblock 4.0

@HeartSaVioR
Contributor Author

HeartSaVioR commented Mar 19, 2025

@cloud-fan
I will rebase this PR, but PR #50314 would probably be more appropriate to address the community's concern. The VOTE has passed, but Mark still suggests that we consider the workaround, since it would resolve the main concern.

It is definitely your call, as the release manager of Spark 4.0.0, to decide which PR to merge to unblock the Spark 4.0.0 release. I sincerely respect your decision.

@cloud-fan
Contributor

TBH I don't think it's an issue to mention vendor names in internal migration code paths, and this is not the only place. It's better to avoid mistakes in the first place so that we don't need the migration path, and we've already added a new style check rule to prevent improper config names.

I'm fine if people think it's an issue. We can scan the whole codebase and fix all these vendor names in the master branch. For 4.0 I'd like to merge this PR as it is.

@cloud-fan
Contributor

thanks, merging to 4.0!

cloud-fan pushed a commit that referenced this pull request Mar 19, 2025
…cation of incorrect config introduced in

Closes #49984 from HeartSaVioR/SPARK-51187-4.0.

Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this Mar 19, 2025