@the-other-tim-brown the-other-tim-brown commented Nov 21, 2025

Describe the issue this Pull Request addresses

Addresses #14269
The AvroInternalSchemaConverter currently relies directly on an Avro schema. This PR switches the class to operate on HoodieSchema as part of the migration to Hudi's own type system.

Summary and Changelog

  • AvroInternalSchemaConverter is renamed to InternalSchemaConverter
  • All methods in InternalSchemaConverter now take a HoodieSchema instead of an Avro schema
  • Callers of these methods convert their Avro schema to a HoodieSchema with the fromAvroSchema method
  • Related test cases are updated to construct HoodieSchema directly when possible

Impact

This is laying the groundwork for moving the codebase over to Hudi's own type system.

Risk Level

Low. This maintains parity with the existing code.

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Nov 21, 2025
* @param type the non-null type
* @return new HoodieSchema representing a nullable union
*/
public static HoodieSchema createNullableSchema(HoodieSchema type) {
Contributor Author

There is another method, createNullable, that covers similar functionality, so I am deduplicating.
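
For readers unfamiliar with the helper being consolidated, a nullable schema is conventionally modeled as a union of NULL and the wrapped type. The following is a minimal sketch under that assumption. `NullableSketch`, its nested `Schema` class, and the `Kind` enum are hypothetical stand-ins and not the real HoodieSchema API; returning an already-nullable input unchanged is also an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in types for illustration only; NOT the real HoodieSchema API.
final class NullableSketch {
  enum Kind { NULL, STRING, INT, UNION }

  static final class Schema {
    final Kind kind;
    final List<Schema> branches; // only populated when kind == UNION

    Schema(Kind kind, List<Schema> branches) {
      this.kind = kind;
      this.branches = branches;
    }

    static Schema of(Kind kind) {
      return new Schema(kind, Collections.emptyList());
    }

    boolean isNullable() {
      return kind == Kind.NULL
          || (kind == Kind.UNION && branches.stream().anyMatch(b -> b.kind == Kind.NULL));
    }
  }

  // Single entry point for wrapping a type in a [NULL, type] union.
  // Assumption: a schema that already admits null is returned unchanged.
  static Schema createNullable(Schema type) {
    if (type.isNullable()) {
      return type;
    }
    return new Schema(Kind.UNION, Arrays.asList(Schema.of(Kind.NULL), type));
  }
}
```

Keeping one such method avoids two helpers drifting apart in how they order or deduplicate the union branches.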

   */
-  public Option<String> getFullName() {
-    return Option.ofNullable(avroSchema.getFullName());
+  public String getFullName() {
Contributor Author

In Avro, the name is always meant to be non-empty, and I think we should follow the same pattern here. Otherwise we end up with a lot of Option-handling code in the callers.

Contributor

Can you add a precondition in the constructor of HoodieSchema to verify this assumption?
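
One way such a precondition could look is sketched here. `NamedSchemaSketch` is a hypothetical class used only for illustration and is not the real HoodieSchema constructor; the exact checks and exception types are assumptions.

```java
import java.util.Objects;

// Hypothetical wrapper for illustration only; NOT the real HoodieSchema class.
final class NamedSchemaSketch {
  private final String fullName;

  NamedSchemaSketch(String fullName) {
    // Fail fast at construction time so getFullName() can return a plain String
    // and callers do not need Option handling.
    Objects.requireNonNull(fullName, "schema full name must not be null");
    if (fullName.trim().isEmpty()) {
      throw new IllegalArgumentException("schema full name must not be empty");
    }
    this.fullName = fullName;
  }

  String getFullName() {
    return fullName;
  }
}
```

With the check in the constructor, a violation surfaces where the schema is built rather than at some later call site that reads the name.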

Comment on lines +21 to +24
import org.apache.hudi.common.schema.HoodieJsonProperties;
import org.apache.hudi.common.schema.HoodieSchema;
import org.apache.hudi.common.schema.HoodieSchemaField;
import org.apache.hudi.common.schema.HoodieSchemaType;
Contributor

I'm starting to think it's better to remove the Hoodie prefix from these class names in the org.apache.hudi.common.schema package so they are easier to read, since eventually we will maintain only one schema system. WDYT?

Contributor Author

Yes, it is implied by the package, but I think we should do that after we have removed the Avro usage. Otherwise the difference between the two is more subtle and likely to be missed.

Contributor

Sounds good.

Comment on lines 382 to 385
return getTypes().stream()
.filter(schema -> schema.getType() != HoodieSchemaType.NULL)
.findFirst()
.orElseThrow(() -> new IllegalArgumentException("No non-null type found in Union"));
Contributor

Use HoodieSchema#getNonNullType instead? Or is this only intended to get the type, so missing an element of a union is fine?

Contributor Author

I'll switch to getNonNullType. I forgot to update this branch after the other one was merged.

Contributor

Got it
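
The concern raised in this thread is that `findFirst()` silently ignores any additional non-null branches of a union. A stricter lookup might look like the sketch below. `NonNullTypeSketch` and its `Kind` enum are hypothetical stand-ins, and rejecting unions with more than one non-null branch is an assumption; the thread does not show how the real HoodieSchema#getNonNullType behaves.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in types for illustration only; NOT the real HoodieSchema API.
final class NonNullTypeSketch {
  enum Kind { NULL, STRING, LONG }

  // Returns the single non-null branch of a nullable union.
  // Assumption: zero or multiple non-null branches are treated as an error,
  // instead of returning whichever branch happens to come first.
  static Kind getNonNullType(List<Kind> unionBranches) {
    List<Kind> nonNull = unionBranches.stream()
        .filter(kind -> kind != Kind.NULL)
        .collect(Collectors.toList());
    if (nonNull.size() != 1) {
      throw new IllegalArgumentException(
          "Expected exactly one non-null type in union but found " + nonNull.size());
    }
    return nonNull.get(0);
  }
}
```

Centralizing the lookup in one method means every caller gets the same handling of malformed unions.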

@yihua yihua added this to the release-1.2.0 milestone Nov 22, 2025
@the-other-tim-brown the-other-tim-brown force-pushed the 14269-internal-schema-integration branch from 85f648a to 5b63f28 Compare November 23, 2025 23:20
@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Nov 24, 2025
@the-other-tim-brown the-other-tim-brown force-pushed the 14269-internal-schema-integration branch from 12cfee8 to 970cc42 Compare November 24, 2025 21:33
@the-other-tim-brown the-other-tim-brown force-pushed the 14269-internal-schema-integration branch from 2159f34 to df96269 Compare November 25, 2025 13:02
@the-other-tim-brown
Contributor Author

@yihua and @bvaradar, is there any other feedback for this PR?

Contributor

@yihua yihua left a comment

LGTM

@yihua yihua merged commit a06d409 into apache:master Nov 25, 2025
135 of 137 checks passed