rayokota
Contributor

@rayokota rayokota commented Aug 31, 2025

What is the purpose of the change

Currently, the C++ Avro library has no way to parse a new schema that refers to named schemas that were parsed previously.

The equivalent in the Java Avro library is to construct an instance of Schema.Parser and to use the same parser instance to parse the referenced schemas before the referring schema.

This PR adds a function, compileJsonSchemaWithNamedReferences, which parses a schema while resolving references against a map of previously parsed named schemas.

Without this change, tools that allow Avro schemas to reference one another, such as the Confluent Schema Registry, cannot interoperate well with C++ applications.

An equivalent issue was fixed for C#: https://issues.apache.org/jira/browse/AVRO-4091
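The idea of seeding the parser with already-known names can be sketched with a toy symbol table. This is only an illustration under stated assumptions: `ToySchema`, `ToySymbolTable`, `resolveReference`, and `resolves` are hypothetical names, not part of the Avro C++ API, and the real compiler does considerably more than a map lookup.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed named schema; NOT an Avro type.
struct ToySchema {
    std::string fullName;
};

// Hypothetical symbol table: full name -> previously parsed schema.
using ToySymbolTable = std::map<std::string, std::shared_ptr<ToySchema>>;

// Looks up a referenced type name. A name that was never registered
// cannot be resolved, which mirrors the problem described above.
std::shared_ptr<ToySchema> resolveReference(const ToySymbolTable &st,
                                            const std::string &name) {
    auto it = st.find(name);
    if (it == st.end()) {
        throw std::runtime_error("Unknown type: " + name);
    }
    return it->second;
}

// Convenience wrapper that reports success instead of throwing.
bool resolves(const ToySymbolTable &st, const std::string &name) {
    try {
        resolveReference(st, name);
        return true;
    } catch (const std::runtime_error &) {
        return false;
    }
}
```

With an empty table, `resolves(table, "example.Address")` returns false. After the caller inserts the previously parsed `example.Address` entry, the same lookup succeeds, which is the behavior the shared Schema.Parser instance gives Java users.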

Verifying this change

Added a test that uses the new API, compileJsonSchemaWithNamedReferences.

Documentation

  • Does this pull request introduce a new feature? no

@github-actions github-actions bot added the "C++" label (Pull Requests for C++ binding) on Aug 31, 2025
@@ -94,7 +94,15 @@ static NodePtr makeNode(const string &t, SymbolTable &st, const string &ns) {

auto it = st.find(n);
if (it != st.end()) {
return NodePtr(new NodeSymbolic(asSingleAttribute(n), it->second));
Contributor

I wonder if this could lead to a memory leak in the case of circular schemas. In the previous code, if you attempt to make a node that is already present, it returns a symbolic reference to the old one. If we return the pointer to the old one instead, this could create cycles of smart pointers that never get deleted.
Can we add a test that proves that this situation doesn't arise? One way to test it is to hold a weak_ptr to the returned object and ensure that the pointer has expired after the outermost schema is gone.
An example of a circular schema is a binary tree, where each node has two children, each of which is a union of the node's schema and null.

Contributor Author

Thanks for reviewing the PR, @thiru-mg! I added the test that you suggested. Here are some observations:

  • Returning the existing node directly in makeNode does not create shared_ptr cycles for recursive schemas. During validation, duplicates are replaced with NodeSymbolic via setLeafToSymbolic, breaking potential strong cycles.

  • The new test constructs a record Node with left/right fields as ["null","Node"], captures a weak_ptr to the root, lets ValidSchema go out of scope, and asserts the weak_ptr expired. It passes, demonstrating no leak from cycles.
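The weak_ptr technique discussed in this thread can be illustrated without Avro at all. The following is a minimal sketch, assuming a hypothetical `ToyNode` type in place of the library's node classes: a strong self-reference keeps the node alive after the last outside handle is dropped, while a non-owning (symbolic-style) reference does not.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a schema node; NOT avro::Node.
struct ToyNode {
    std::vector<std::shared_ptr<ToyNode>> strongLeaves;  // owning references
    std::vector<std::weak_ptr<ToyNode>> symbolicLeaves;  // non-owning references
};

// Creates a node that owns itself, drops the outer handle, and reports
// whether the node is still alive, i.e. whether the cycle leaked it.
bool strongCycleLeaks() {
    std::weak_ptr<ToyNode> probe;
    {
        auto root = std::make_shared<ToyNode>();
        root->strongLeaves.push_back(root);  // strong cycle: root -> root
        probe = root;
    }  // outer handle is gone here
    const bool leaked = !probe.expired();
    // Break the cycle manually so this demonstration does not itself leak.
    if (auto alive = probe.lock()) {
        alive->strongLeaves.clear();
    }
    return leaked;
}

// Same shape, but the self-reference is non-owning, so the node is
// destroyed as soon as the outer handle goes out of scope.
bool symbolicReferenceLeaks() {
    std::weak_ptr<ToyNode> probe;
    {
        auto root = std::make_shared<ToyNode>();
        root->symbolicLeaves.push_back(root);  // weak back reference
        probe = root;
    }
    return !probe.expired();
}
```

Here `strongCycleLeaks()` reports a leak and `symbolicReferenceLeaks()` does not. The test described above applies the same probe to the real schema objects: capture a weak_ptr to the root, let the ValidSchema go out of scope, and assert that the weak_ptr has expired.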

@martin-g
Member

martin-g commented Sep 4, 2025

@wgtmac Do you want to review this?

@@ -58,6 +60,9 @@ AVRO_DECL ValidSchema compileJsonSchemaFromString(const std::string &input);

AVRO_DECL ValidSchema compileJsonSchemaFromFile(const char *filename);

AVRO_DECL ValidSchema compileJsonSchemaWithNamedReferences(std::istream &is,
Member

Just curious about the preference below for namedReferences:

  • std::map vs std::unordered_map
  • avro::Name vs std::string
  • ValidSchema vs avro::NodePtr

Contributor Author

Typically the reference will be obtained by a method such as compileJsonSchemaFromString, which returns an instance of ValidSchema (as shown in CompilerTests.cc).

avro::Name seems preferable since it prevents invalid namespaces and names.

std::map is used for consistency with the underlying SymbolTable, which is also std::map. However, I don't have a strong preference in this case.
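The trade-offs in this reply can be sketched with a toy key type. This is an illustration only: `ToyName` and `isAcceptedName` are hypothetical and do not reproduce avro::Name. The validation rule follows the Avro specification's naming rule, where each dot-separated part starts with a letter or underscore and continues with letters, digits, or underscores. A validating key type rejects bad names at construction, and std::map needs only `operator<` from it, whereas std::unordered_map would additionally need a hash function.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical validated name type; NOT avro::Name.
class ToyName {
public:
    explicit ToyName(std::string fullName) : full_(std::move(fullName)) {
        if (!isValid(full_)) {
            throw std::invalid_argument("Invalid name: " + full_);
        }
    }

    const std::string &str() const { return full_; }

    // Ordering is all std::map requires of its key type.
    bool operator<(const ToyName &other) const { return full_ < other.full_; }

private:
    // Each dot-separated part must match [A-Za-z_][A-Za-z0-9_]*.
    static bool isValid(const std::string &s) {
        bool startOfPart = true;
        for (unsigned char c : s) {
            if (c == '.') {
                if (startOfPart) return false;  // empty part
                startOfPart = true;
                continue;
            }
            const bool ok = startOfPart ? (std::isalpha(c) || c == '_')
                                        : (std::isalnum(c) || c == '_');
            if (!ok) return false;
            startOfPart = false;
        }
        return !startOfPart;  // rejects empty strings and trailing dots
    }

    std::string full_;
};

// Reports whether a string would be accepted as a ToyName.
bool isAcceptedName(const std::string &s) {
    try {
        ToyName name(s);
        return true;
    } catch (const std::invalid_argument &) {
        return false;
    }
}
```

A `std::map<ToyName, int>` then works with no further boilerplate, while a plain std::string key would accept inputs such as `"1bad"` or `"a..b"` without complaint.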

@rayokota
Contributor Author

rayokota commented Sep 8, 2025

Thanks for the reviews, @thiru-mg and @wgtmac! I've incorporated all your feedback. Please let me know if you have any further questions.
