[SPARK-51187][SQL][SS] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 #49983


Closed
wants to merge 8 commits

Conversation

@HeartSaVioR (Contributor) commented Feb 17, 2025

What changes were proposed in this pull request?

This PR proposes to implement the graceful deprecation of incorrect config introduced in SPARK-49699.

SPARK-49699 was included in Spark 3.5.4, hence we can't simply rename the config to fix the issue.

Also, since the incorrect config is recorded in the offset log of a streaming query, the fix isn't as simple as adding `withAlternative` and being done. We need to handle the case where the offset log contains the incorrect config, and carry the value of the incorrect config in the offset log over to the new config. Once a single microbatch has been planned after the restart (hence the above logic has been applied), the offset log will contain the "new" config and will no longer refer to the incorrect one.

That said, we can remove the incorrect config in a future Spark version once we are confident that no users will upgrade directly from Spark 3.5.4 to that version.
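The offset-log migration described above can be sketched as a plain key-rewrite step. This is only an illustration, not the actual Spark internals: the `ConfigMigration` object and the fixed config key name are assumptions for the sketch; the incorrect key is the one named in this PR.

```scala
// Illustrative sketch of the migration step, not the actual Spark code.
// The fixed config key below is an assumption; the incorrect key is the
// one named in this PR.
object ConfigMigration {
  val IncorrectKey =
    "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
  val CorrectKey =
    "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

  // Rewrite the conf map read from the offset log: if the incorrect key is
  // present, carry its value over to the correct key and drop the old entry.
  // The next microbatch then writes only the correct key back to the log.
  def migrate(conf: Map[String, String]): Map[String, String] =
    conf.get(IncorrectKey) match {
      case Some(value) => (conf - IncorrectKey) + (CorrectKey -> value)
      case None        => conf
    }
}
```

After one such rewrite, later batches never see the incorrect key again, which is why the migration is self-healing once a single microbatch is planned.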

Why are the changes needed?

We released an incorrect config and want to rename it properly, without breaking any existing streaming query.

Does this PR introduce any user-facing change?

No. That is what this PR is aiming for.

How was this patch tested?

New UT.

Was this patch authored or co-authored using generative AI tooling?

No.

@@ -4115,6 +4115,7 @@ object SQLConf {
.doc("Allow PruneFilters to remove streaming subplans when we encounter a false filter. " +
"This flag is to restore prior buggy behavior for broken pipelines.")
.version("4.0.0")
.withAlternative("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
HeartSaVioR (Contributor, Author) commented on the diff:
@dongjoon-hyun
Should we add this to DeprecatedConfig as well? Also, should we file a JIRA ticket for the removal?

Reply (Contributor):
There's no harm in adding it. When we remove the config in the future (throwing an error if it's set), we'll need a ticket.

// Metadata in the offset log may contain this config if the batch ran on Spark 3.5.4.
// We need to pick the value from the metadata and set it in the new config.
// This also leads further batches to have the correct config in the offset log.
metadata.conf.get("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan") match {
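A plausible continuation of the match above, sketched with a mutable map standing in for the session's SQLConf. Everything except the incorrect key is an illustrative assumption, not the actual code in this PR:

```scala
import scala.collection.mutable

// Stand-ins for the real objects: `metadataConf` plays the conf map read
// from the offset log, `sessionConf` plays the session's SQLConf. All key
// names other than the incorrect one are assumptions.
val incorrectKey =
  "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
val correctKey =
  "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

val metadataConf = Map(incorrectKey -> "false")       // written by Spark 3.5.4
val sessionConf  = mutable.Map.empty[String, String]  // stand-in for SQLConf

metadataConf.get(incorrectKey) match {
  case Some(value) =>
    // Carry the value recorded by 3.5.4 over to the fixed config, so the
    // next offset-log entry is written with the correct key.
    sessionConf(correctKey) = value
  case None => // nothing to migrate
}
```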
HeartSaVioR (Contributor, Author) commented:
I'll add a TODO mentioning the removal, given we want to file the JIRA ticket for removal immediately.

"value of the fixed config")

val offsetLog = new OffsetSeqLog(spark, new File(dir, "offsets").getCanonicalPath)
def checkConfigFromMetadata(batchId: Long, expectCorrectConfig: Boolean): Unit = {
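The check above might verify, per batch, which key the offset-log entry carries. A sketch under assumptions: plain maps stand in for the metadata the real test deserializes from each offset file via `OffsetSeqLog`, and the fixed key name is assumed.

```scala
// Sketch of the offset-log check: plain maps stand in for the metadata
// deserialized from each batch's offset file. The fixed key is an assumption.
val incorrectKey =
  "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
val correctKey =
  "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

// Batch 0 was written by Spark 3.5.4; batch 1 after restarting on a fixed version.
val offsetLogConfs: Map[Long, Map[String, String]] = Map(
  0L -> Map(incorrectKey -> "false"),
  1L -> Map(correctKey -> "false")
)

def checkConfigFromMetadata(batchId: Long, expectCorrectConfig: Boolean): Unit = {
  val conf = offsetLogConfs(batchId)
  if (expectCorrectConfig) {
    assert(conf.contains(correctKey) && !conf.contains(incorrectKey))
  } else {
    assert(conf.contains(incorrectKey))
  }
}

checkConfigFromMetadata(0L, expectCorrectConfig = false)
checkConfigFromMetadata(1L, expectCorrectConfig = true)
```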
HeartSaVioR (Contributor, Author) commented on the diff:
I know this is a bit hard to understand for folks who aren't streaming experts. Please let me know if you need further explanation to justify the logic.

@HeartSaVioR (Contributor, Author):

I'm going to submit PRs for 4.0/3.5 as well.

@HeartSaVioR (Contributor, Author):

cc. @dongjoon-hyun @HyukjinKwon Please take a look. Thanks!
cc. @cloud-fan for visibility of the fix for blocker issue

@dongjoon-hyun (Member) left a comment:
Sorry for the late response.

I was busy with internal preparation for Spark 4.0.0.

For this one, shall we do this in Apache Spark 3.5.5, while keeping Apache Spark 4.0.0 free from spark.databricks.*?

If you are busy, I can volunteer as the release manager for Apache Spark 3.5.5.

When we release Apache Spark 3.5.5 this month, Apache Spark 4.0.0 can be free of spark.databricks.*.

WDYT, @HeartSaVioR and @cloud-fan ?

@dongjoon-hyun (Member):

I sent an email for further discussion.

@HeartSaVioR (Contributor, Author) commented Feb 17, 2025

@dongjoon-hyun

Let me clarify a bit.

  1. I have claimed that the config is not something a user (even an admin) would understand or try to flip. That said, removing this config does not matter to me at all, and I'm +1 on removing it in Spark 4.0.0 (with the argument that we could remove it in 3.5.5 as well).
  2. The only issue is for users who ever ran their query on Spark 3.5.4, because the incorrect config was written to the offset log and we shouldn't ignore that. I've added logic to migrate the incorrect config to the new config when reading from the offset log (and the offset log for further microbatches will carry the new config). This logic should remain for multiple minor releases, so while we can stop supporting the incorrect config, we can't remove the incorrect config "key" from the codebase for multiple minor releases.

Please let me know if you are not on the same page with the above, or have any questions. Thanks!

If we are on the same page, I'll remove the config in master/4.0 PR and deprecate the config in 3.5.

@HeartSaVioR (Contributor, Author):

W.r.t. the release manager for Spark 3.5.5, either way is fine with me. I'm happy to take the step if you'd prefer to let me take it.

@cloud-fan (Contributor):

If we have this graceful handling in 3.5.5, we should have it in 4.0.0 as well; otherwise it's a breaking change in 4.0.

@HeartSaVioR (Contributor, Author) commented Feb 18, 2025

The migration logic can't be removed anytime soon. I'd say we could only remove it in 4.2 or so (conservatively speaking, 5.0). All three of my PRs contain the migration logic.

I guess the main point here is when we can stop allowing users to specify the incorrect config themselves. I have been arguing that users would never set this manually, but I'm open to being conservative on this as well.

@HeartSaVioR (Contributor, Author):

So, according to the recent discussion, I'm going to remove the withAlternative in this (4.1) PR and the 4.0 PR, but we still need to keep the migration logic for a while. I hope we are on the same page about the necessity of the migration logic for a while.
@dongjoon-hyun @cloud-fan

@dongjoon-hyun (Member):

Technically, we haven't reached an agreement on the Apache Spark 4.0.0 behavior, because there are still two alternatives.

  1. One way is this approach, which I also agree is technically correct, @HeartSaVioR .
  2. The second is to simply add a migration document asking users to upgrade from the Spark 3.5.x line to 3.5.5 or above first, before moving to Apache Spark 4.0.0.

If you don't mind, I'd like to propose waiting until we finish the Apache Spark 3.5.5 release.

@HeartSaVioR (Contributor, Author) commented Feb 23, 2025

I'm really not sure option 2 is a realistic alternative. We shouldn't assume users follow Spark community news closely. My experience so far tells me that users often don't even read the docs.

Another point for me is that users do not really care whether the config contains the vendor name or not. This is really a concern with an ASF hat on (and for sure, I agree it's problematic, with my hat on as a PMC member of an ASF project), but I'm sure it's not a concern big enough for users to justify upgrading. So if we tell users to upgrade Spark just because of this problematic config, I'm quite sure they won't. (We also shouldn't lie to users that there is a security concern or the like.) We shouldn't pretend that forcing users to upgrade to a specific version would succeed.

@cloud-fan (Contributor):

Have we merged this graceful deprecation into branch-3.5?

@HeartSaVioR (Contributor, Author) commented Feb 25, 2025

@cloud-fan

have we merged this graceful deprecation in branch 3.5?

Yes, that is merged. It's still a blocker for Spark 4.0.0, though.

@dongjoon-hyun

If you don't mind, I'd like to propose to wait until we finish Apache Spark 3.5.5 release.

Would you mind explaining why this has to be coupled with Spark 3.5.5, given that the discussion for 4.1/4.0 can happen in parallel?

I sincerely disagree that option 2 is viable for us. It puts a big limitation on SS users of Spark 3.5.4 when upgrading (they would have to go through a later Spark 3.5.x release before upgrading to 4.0.0), just to achieve our goal of removing the "string" of the incorrect config immediately. Again, it is just a string, because it exists only for the migration logic and users won't be able to set the config manually. Why not be more open and give Spark 3.5.4 users a wider upgrade path? I'll ensure I remove the migration logic, and finally the string of the config, in a certain Spark version, which I'd prefer to be 4.1 or even longer-term.

If you'd like to hear more voices, I'm happy to post the discussion to dev@. I just don't want this change to be coupled with an unrelated release.

@dongjoon-hyun (Member):

Yes, please. Thank you, @HeartSaVioR .

If you'd like to hear more voices, I'm happy to post the discussion to dev@

@HeartSaVioR (Contributor, Author):

Thanks, posted to dev@. - https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr

@dongjoon-hyun dongjoon-hyun dismissed their stale review March 13, 2025 01:09

According to the community vote, this became stale.


We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jun 22, 2025
@github-actions github-actions bot closed this Jun 23, 2025