-
Notifications
You must be signed in to change notification settings - Fork 1.7k
AVRO 4175: [C++] Allow previously parsed schemas to be referenced when parsing a schema #3475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c5069ba
to
8c2fd81
Compare
@@ -94,7 +94,15 @@ static NodePtr makeNode(const string &t, SymbolTable &st, const string &ns) { | |||
|
|||
auto it = st.find(n); | |||
if (it != st.end()) { | |||
return NodePtr(new NodeSymbolic(asSingleAttribute(n), it->second)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this could lead to memory leak in case of circular schemas. In the previous code, if you attempt to make a node that is already present, it returns a symbolic reference to the old one. If we return the pointer to the old one, this could lead to cycles of smart pointers which will never get deleted.
Can we add a test that proves that the above situation doesn't arise? One way to test is to hold a weak_ptr
to the returned object and ensure that the pointer holds nothing after the outermost schema is gone.
An example of circular schema is a binary tree, where each node has two children which are union of the node's schema and null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing the PR @thiru-mg ! I added the test that you suggested. Here are some observations:
-
Returning the existing node directly in
makeNode
does not createshared_ptr
cycles for recursive schemas. During validation, duplicates are replaced withNodeSymbolic
viasetLeafToSymbolic
, breaking potential strong cycles. -
The new test constructs a record
Node
with left/right fields as["null","Node"]
, captures aweak_ptr
to the root, letsValidSchema
go out of scope, and asserts theweak_ptr
expired. It passes, demonstrating no leak from cycles.
@wgtmac Do you want to review this ? |
@@ -58,6 +60,9 @@ AVRO_DECL ValidSchema compileJsonSchemaFromString(const std::string &input); | |||
|
|||
AVRO_DECL ValidSchema compileJsonSchemaFromFile(const char *filename); | |||
|
|||
AVRO_DECL ValidSchema compileJsonSchemaWithNamedReferences(std::istream &is, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious about the preference below for namedReferences:
std::map
vsstd::unordered_map
avro::Name
vsstd::string
ValidSchema
vsavro::NodePtr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically the reference will be obtained by a method such as compileJsonSchemaFromString
, which returns an instance of ValidSchema
(as shown in CompilerTests.cc
).
avro::Name
seems preferable since it prevents invalid namespaces and names.
std::map
is used for consistency with the underlying SymbolTable
, which is also std::map
. However, I don't have a strong preference in this case.
What is the purpose of the change
Currently the C++ Avro library does not have a way to parse a new schema that refers to named schemas that have been previously parsed.
The equivalent in the Java Avro library is to construct an instance of Schema.Parser and to use the same parser instance to parse the referenced schemas before the referring schema.
This PR adds a method called
compileJsonSchemaWithNamedReferences
.Without this change, tools that allow Avro schemas to reference one another, such as the Confluent Schema Registry, cannot interoperate well with C++ applications.
An equivalent issue was fixed for C#: https://issues.apache.org/jira/browse/AVRO-4091
Verifying this change
Added test that uses the new API
compileJsonSchemaWithNamedReferences
.Documentation