-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54403][SQL][Metric View] Add YAML serde infrastructure for metric views #53146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
… for metric views This commit adds the complete serialization/deserialization infrastructure for parsing metric view YAML definitions: - Add Jackson YAML dependencies to pom.xml - Implement canonical model for metric views: - Column, Expression (Dimension/Measure), MetricView, Source - YAMLVersion validation and exception types - Implement version-specific serde (v0.1): - YAML deserializer/serializer - Base classes for extensibility - Add JSON utilities for metadata serialization
pom.xml
Outdated
| <dependency> | ||
| <groupId>com.fasterxml.jackson.dataformat</groupId> | ||
| <artifactId>jackson-dataformat-yaml</artifactId> | ||
| <version>${fasterxml.jackson.version}</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all jackson deps are managed by jackson-bom now, you don't need to declare it again here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| object JsonUtils { | ||
| // Singleton ObjectMapper that can be used across the project | ||
| private lazy val mapper: ObjectMapper = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comments, should we try to use this singleton as much as possible in the subsequent development?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm surprised that there is no such util in the Spark repo. For now, I plan to use it for all the metric view development, but not sure how much other code needs this.
| // Only parse the "version" field and ignore all others | ||
| @JsonIgnoreProperties(ignoreUnknown = true) | ||
| case class YAMLVersion(version: String) extends Validatable { | ||
| def validYAMLVersions: Set[String] = Set("0.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def validYAMLVersions: Set[String] = Set("0.1") | |
| private def validYAMLVersions: Set[String] = Set("0.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| val mapper = new ObjectMapper(yamlFactory) | ||
| // Exclude null values from serialized output | ||
| .setSerializationInclusion(JsonInclude.Include.NON_NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .setSerializationInclusion(JsonInclude.Include.NON_NULL) | |
| . setDefaultPropertyInclusion(JsonInclude.Include.NON_NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| // Exclude null values from serialized output | ||
| .setSerializationInclusion(JsonInclude.Include.NON_NULL) | ||
| // Exclude empty collections/strings from serialized output | ||
| .setSerializationInclusion(JsonInclude.Include.NON_EMPTY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .setSerializationInclusion(JsonInclude.Include.NON_EMPTY) | |
| . setDefaultPropertyInclusion(JsonInclude.Include.NON_EMPTY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| ) extends ColumnBase | ||
|
|
||
| object ColumnV01 { | ||
| def fromCanonical(canonical: Column[_ <: Expression], ordinal: Int): ColumnV01 = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ordinal mandatory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, removed
| Try(CatalystSqlParser.parseQuery(sourceText)) match { | ||
| case Success(_) => SQLSource(sourceText) | ||
| case Failure(queryEx) => | ||
| throw queryEx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should queryEx be wrapped into MetricViewValidationException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
cc @cloud-fan to review |
What changes were proposed in this pull request?
This PR adds the complete serialization/deserialization infrastructure for parsing metric view YAML definitions:
Why are the changes needed?
SPIP: Metrics & semantic modeling in Spark
Does this PR introduce any user-facing change?
No
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code
Co-Authored-By: Claude [email protected]