[SPARK-51187][SQL][SS] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 #49983
Closed
Changes from all commits (8 commits, all by HeartSaVioR):

0e1cfdd  Revert "[SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersC…
22cb5bb  Fix to migrate the incorrect config we introduced in SPARK-49699 to t…
37c52c0  misc fix
6661426  fix offsetlog as they won't be a result of Spark 4.0.0 after this fix
8e5dde4  remove FIXME
077fc41  revert unnecessary fix
bcce8ac  remove alternative config
0e03656  also adding this to the migration guide for removal of config
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
sql/core/src/test/resources/structured-streaming/checkpoint-version-3.5.4/commits/0 (2 additions, 0 deletions)

@@ -0,0 +1,2 @@
v1
{"nextBatchWatermarkMs":0}
sql/core/src/test/resources/structured-streaming/checkpoint-version-3.5.4/commits/1 (2 additions, 0 deletions)

@@ -0,0 +1,2 @@
v1
{"nextBatchWatermarkMs":0}
sql/core/src/test/resources/structured-streaming/checkpoint-version-3.5.4/metadata (1 addition, 0 deletions)

@@ -0,0 +1 @@
{"id":"3f409b2c-b22b-49f6-b6e4-86c2bdcddaba"}
sql/core/src/test/resources/structured-streaming/checkpoint-version-3.5.4/offsets/0 (3 additions, 0 deletions)

@@ -0,0 +1,3 @@
v1
{"batchWatermarkMs":0,"batchTimestampMs":1739419905155,"conf":{"spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider","spark.sql.streaming.stateStore.rocksdb.formatVersion":"5","spark.sql.streaming.statefulOperator.useStrictDistribution":"true","spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion":"2","spark.sql.streaming.aggregation.stateFormatVersion":"2","spark.sql.shuffle.partitions":"5","spark.sql.streaming.join.stateFormatVersion":"2","spark.sql.streaming.stateStore.compression.codec":"lz4","spark.sql.streaming.multipleWatermarkPolicy":"min","spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan":"false"}}
0
sql/core/src/test/resources/structured-streaming/checkpoint-version-3.5.4/offsets/1 (3 additions, 0 deletions)

@@ -0,0 +1,3 @@
v1
{"batchWatermarkMs":0,"batchTimestampMs":1739419906627,"conf":{"spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider","spark.sql.streaming.stateStore.rocksdb.formatVersion":"5","spark.sql.streaming.statefulOperator.useStrictDistribution":"true","spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion":"2","spark.sql.streaming.aggregation.stateFormatVersion":"2","spark.sql.shuffle.partitions":"5","spark.sql.streaming.join.stateFormatVersion":"2","spark.sql.streaming.stateStore.compression.codec":"lz4","spark.sql.streaming.multipleWatermarkPolicy":"min","spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan":"false"}}
1
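
Each offsets file carries the version on its first line, the OffsetSeqMetadata JSON (including the conf map) on its second line, and per-source offsets afterwards; the incorrect spark.databricks.* key sits in that second line. The following self-contained sketch checks which key a given offsets file carries; the object name is hypothetical and the path is assumed to be relative to the repository root:

import scala.io.Source

object InspectOffsetMetadata {
  def main(args: Array[String]): Unit = {
    // One of the test resources added by this PR (path assumed relative to the repo root).
    val path = "sql/core/src/test/resources/structured-streaming/" +
      "checkpoint-version-3.5.4/offsets/0"
    val source = Source.fromFile(path)
    try {
      // Offset log layout: line 1 is the version ("v1"), line 2 is the metadata JSON,
      // remaining lines are per-source offsets.
      val metadataJson = source.getLines().toList(1)
      val incorrectKey = "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
      println(s"contains incorrect key: ${metadataJson.contains(incorrectKey)}")
    } finally {
      source.close()
    }
  }
}

For the checked-in 3.5.4 files above this prints true; after one more micro-batch on the current branch, the newest offsets file should carry the fixed key instead, which is exactly what the new test asserts.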
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
@@ -1471,6 +1471,75 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
    )
  }

  test("SPARK-51187 validate that the incorrect config introduced in SPARK-49699 still takes " +
    "effect when restarting from Spark 3.5.4") {
    // Spark 3.5.4 is the only release in which we accidentally introduced the incorrect config.
    // We just need to confirm that the current Spark version applies the fix of SPARK-49699 when
    // the streaming query was started from Spark 3.5.4. We should apply the fix consistently,
    // instead of "on and off", because toggling may expose more possibilities of breakage.

    val problematicConfName = "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

    withTempDir { dir =>
      val input = getClass.getResource("/structured-streaming/checkpoint-version-3.5.4")
      assert(input != null, "cannot find test resource")
      val inputDir = new File(input.toURI)

      // Copy test files to tempDir so that we won't modify the original data.
      FileUtils.copyDirectory(inputDir, dir)

      // Below is the code we used to extract the checkpoint from Spark 3.5.4. We need to make
      // sure the offset advancement continues from the last run.
      val inputData = MemoryStream[Int]
      val df = inputData.toDF()

      inputData.addData(1, 2, 3, 4)
      inputData.addData(5, 6, 7, 8)

      testStream(df)(
        StartStream(checkpointLocation = dir.getCanonicalPath),
        AddData(inputData, 9, 10, 11, 12),
        ProcessAllAvailable(),
        AssertOnQuery { q =>
          val confValue = q.lastExecution.sparkSession.conf.get(
            SQLConf.PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN)
          assert(confValue === false,
            "The value for the incorrect config in offset metadata should be respected as the " +
            "value of the fixed config")

          val offsetLog = new OffsetSeqLog(spark, new File(dir, "offsets").getCanonicalPath)
          def checkConfigFromMetadata(batchId: Long, expectCorrectConfig: Boolean): Unit = {
            val offsetLogForBatch = offsetLog.get(batchId).get
            val confInMetadata = offsetLogForBatch.metadata.get.conf
            if (expectCorrectConfig) {
              assert(confInMetadata.get(SQLConf.PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN.key) ===
                Some("false"),
                "The new offset log should have the fixed config instead of the incorrect one."
              )
              assert(!confInMetadata.contains(problematicConfName),
                "The new offset log should not have the incorrect config.")
            } else {
              assert(
                confInMetadata.get(problematicConfName) === Some("false"),
                "The offset log in test resource should have the incorrect config to test properly."
              )
              assert(
                !confInMetadata.contains(SQLConf.PRUNE_FILTERS_CAN_PRUNE_STREAMING_SUBPLAN.key),
                "The offset log in test resource should not have the fixed config."
              )
            }
          }

          assert(offsetLog.getLatestBatchId() === Some(2))
          checkConfigFromMetadata(0, expectCorrectConfig = false)
          checkConfigFromMetadata(1, expectCorrectConfig = false)
          checkConfigFromMetadata(2, expectCorrectConfig = true)
          true
        }
      )
    }
  }

  private def checkAppendOutputModeException(df: DataFrame): Unit = {
    withTempDir { outputDir =>
      withTempDir { checkpointDir =>

Inline review comment (on the checkConfigFromMetadata helper above): I know this is a bit hard to understand for folks who aren't streaming experts. Please let me know if you need some further explanation to justify the logic.
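
The test comment refers to "the code we used to extract the checkpoint from Spark 3.5.4", but that generator itself is not part of the diff. The sketch below is only a guess at its shape: a standalone program, built and run against Spark 3.5.4, that runs the two micro-batches reflected in offsets/0 and offsets/1. The object name, output path, console sink, and local master are all assumptions.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.streaming.MemoryStream

object GenerateCheckpoint354 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("generate-checkpoint-3.5.4")
      .config("spark.sql.shuffle.partitions", "5")  // matches the conf recorded in offsets/0
      .getOrCreate()
    import spark.implicits._
    implicit val sqlContext: org.apache.spark.sql.SQLContext = spark.sqlContext

    val inputData = MemoryStream[Int]
    val query = inputData.toDF().writeStream
      .format("console")  // any sink works; only the checkpoint directory matters here
      .option("checkpointLocation", "/tmp/checkpoint-version-3.5.4")  // assumed output path
      .start()

    // Two micro-batches, matching the addData(1..4) / addData(5..8) calls in the test above.
    inputData.addData(1, 2, 3, 4)
    query.processAllAvailable()
    inputData.addData(5, 6, 7, 8)
    query.processAllAvailable()

    query.stop()
    spark.stop()
  }
}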
Review comment: I'll add a TODO mentioning the removal when we want to file a JIRA ticket for the removal immediately.