-
Notifications
You must be signed in to change notification settings - Fork 1
New work item: crate r2c2_statement
#6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! It's definitely the most important goal of our CG but sadly likely one of the trickiest to get right. We need to find a compromise between ease of use and versatility and I fear it won't be easy.
term/src/lib.rs
Outdated
//! 1. define or import simple wrapper types for building blocks | ||
//! (IRIs, language tags...) | ||
//! 2. define traits for different kinds of terms | ||
//! (Subject, Predicate, Object, GraphName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is imho going a bit too much into the "how" direction. It does not sound obvious that these should be traits and not enums.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll argue in favor of traits here:
What I aim is to avoid as much data transformation as possible when communicating between two implementations. That's why I try to favor lighweight wrapper types, and traits.
Imagine I want to consume some triples produced by oxttl
to canonicalize them with sohpia_c14n
. (I'll focus on subjects but of course the same would apply to predicates and objects). If Subject
was an enum, I would have to transform the subjects produced by oxttl
into that enum. And then sophia_c14n
would have to transform this enum again into its own internal representation.
If OTOH Subject
is a trait, which the types of oxttl
implement, and which sophia_c14n
accepts as input, then the data produced by oxttl
can be passed directly to sophia_c14n
, which then will transform it directly into its own internal representation. That's one transformation less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other side having an enum makes manipulation easier. I tend to think this is a compromise to be done when we know more about how we represent IRIs/blank nodes/... and should not be set in stone at the beginning of this work item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to defer this discussion, the goal was not to set anything in stone. I've just pushed a commit to clarify that the proposed design was just an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Perfect!
Agreed. I tried to not be too specific in the PR, but on the other hand, keeping things too abstract make them without substance. I don't think it would make sense to agree an a very abstract work-item if we don't have some agreement on what it will contain. But of course, we don't need to figure out all the details up-front. |
Yes! What about something in the line of "It would provide types to encode and manipulate RDF concepts like IRI, blank node, literal, term and triple", making the scope clear while leaving the
I would tend to prefer |
Re. terminology:
|
Add comment to clarify that the proposed design can be challenged.
thinking a little more about this... |
I've been giving this more thoughts, and I believe that there is a way to have the best of both words (traits and enums). More specifically, a That method enum providing everything there is to know about the subject (resp. predicate, object), any other method that the traits may provide could have a default impl based on the result of |
r2c2_term
r2c2_statement
in addition to defining the core traits and types, it proposes 2 proof-of-concept implementations, for oxrdf and rdf_types, (behind the feature gate 'poc_impl') and demonstrate interoperability between the two by testing roundtripping of both implementation via the other
Thank you so much for pushing this. Some major pain points I have with it as a starting point:
|
Absolutely agreed. The I'm happy to comment out the
I hear your point, but I still have mixed feelings about this...
Here we go :) If IRIs in R2C2 (the same reasoning applies to language tags) did not provide any guarantee of validity, it would mean that
This is not ideal, in particular because many producers will actually produce valid IRIs (hopefully!), but consumers will still need to check them every time. With the proposed design:
As you can see, the sweet spot is for implementations following Postel's law: conservative producers and lenient consumers. The burden of additional checks is taken only by the implementations who depart from Postel's law. For this to work, we need to ensure that the guarantees provided by R2C2 correspond to the MUSTs in the spec, nothing more, nothing less. That's why, for example, blank node labels are not constrained (while several implementations, including mine, constrain them to be valid SPARQL bnode labels). |
I just had a long discussion about this with @labra and an idea came up to hide all validation code behind a feature gate:
In my story above, strict producer would probably use the crate without the feature (they only need the @Tpt would that be an acceptable middle-ground for you? |
@pchampin Sorry for the answer delay. This sounds much better. However, I am still a bit scared that mixing the two features in the same crate will make versioning more painful: it is likely we will want the traits crates to be as stable as possible whereas it is more fine for implementations to see breaking changes |
that's a very valid point, thanks.
This way, we could keep the versioning of the API independant from the versioning of the validation code. |
* statement contains only traits and simple wrapper types * statement_validation contains code for validating the contract of the wrapper types
@Tpt for clarity, I updated my proposed code with the split between |
@pchampin you asked for some feedback: I think you could replace traits like This would also free up the name This would result in an idiomatic and intuitive API: let literal = object.try_into()?; fn foo(in: impl Into) {}; |
@GordianDziwis thanks a lot for this feedback. That's an interesting perspective, I have to think about it more. As I wrote at the top of this thread: the first step is to first adopt the work item -- then we can discuss design choices (with dedicated issues to discuss them). I'm taking your comment as a support for the work item, right? |
@pchampin Sorry for the noise. Concerning the actual work item of having two crates:
Isn't this covered by semantic versioning? While the validation code could change faster, the API should be very stable, so dependents wouldn't have to do anything if a change in the validation code causes a minor API bump.
Why would I want to implement my own validation code? Also, I think this is problematic: r2c2/statement_validation/src/_iri.rs Line 7 in 11692f9
If there is a trait An |
I'm responding to your questions in a different order, hopefully to makes things clearer.
You need to remember that this group does not aim to provide "yet another RDF implementation in Rust", but to provide a thin interoperability layer between existing implementations (such as Oxigraph, rdf-types, Rudof or Sophia). Each of these implementation already has its own code to validate IRIs, language tags, etc. Now, if you wanted to make your own RDF implementation, and make it compliant with R2C2 from the start, then of course, you might want to reuse the validation code provided by R2C2. But our goal is not to erase diversity of implementations, merely to help interoperability between them.
Semantic versioning dictates that any breaking change bumps the major version number. So if we make breaking change in the validation code, this will create a new major version, even if the "pure API" (i.e. the traits and types currently in the
granted, that name is confusing and should be changed. But again, I'd rather table this kind of discussion once we accept the work item. |
I follow your reasoning that follows from the goal of diversity, and I am absolutely on board for a diversity of implementations for concepts like a graph or a dataset. But do we want diversity in IRI, Subject, Triple etc. implementations? My instinct would be to get rid of multiple implementations of the same validation logic. First because of duplicated effort, second because of better interoperability, the same implementation is guaranteed to be interoperable.
My assumption was that breaking changes to the API of the validation code are infrequent, because it is very well-defined what a valid x is it won't change what a valid x is, and the API is basically a single function for each x. Please feel free to ignore my comment in favor of GSD. I want just to give input, not block progress with my opinions. So, thumbs up to the proposal. |
mandatory XKCD reference: https://xkcd.com/927/ 😈 More seriously:
Fair enough. One might argue, still, that "it is very well-defined what a valid x is" is overly optimistic, because standards are sometimes interpreted differently by different people, and because some of those standards are moving targets (see the history of BCP47, for example).
Great, thank you 👍 |
Thank you for all the work! Given that RickView is based on Sophia, I can't use those traits directly until they get adopted by Sophia, the only candidate where I can test it is HDT, I will try to find the time to check if it is possible to refactor it using those traits in an experimental branch to find problems adopting them. However first I hope it's OK if I state my general preferences, past experiences with Rust RDF APIs (mostly Sophia) and requirements:
Similar to @GordianDziwis I do not prefer diversity of implementations at all, my preferred state would be to just have one that everybody uses. So I would rather have an opinionated compromise that every implementation can achieve than support everything that already exists. |
As for the String type Cow<'a, str> used for the IRI and literals (and I guess by extension then for triples), I'm generally a fan of immutable variables so for example in a Java implementation I would not want a method I call with my IRI changing that IRI without me knowing. However I have used your MownStr in the past, which is described as more efficient but I guess this would not work on a high level interoperable trait level? Also I have problems finding the right String type when querying RDF.
The current state has Iri defined as |
@KonradHoeffner thanks for this feedback :) The discussion about the general goal of the crate, however, is more fundamental. The goal is deliberately to provide a thin (or as thin as possible) interface between different implementations, not a new implementation to replace them. In other words, the goal is more to embrace diversity than to try to eliminate it. Note that this work item does not prevent the group to also propose a "reference implementation" aiming to replace the others. But his is not the goal that this work item aims to achieve. |
Ah, sorry if I'm misunderstanding, feel free to go ahead with this PR if this is out of scope. I'm in favor of the idea to define traits for RDF terms and triples (quads I don't need but maybe others do). |
The idea of this crate is to be the first component of the "common API".
It would focus on RDF terms, triples and quads, and would provide
Subject
,Predicate
,Object
,GraphName
)MaybeIri
,MaybeLiteral
...)Also, since triple terms will force use to define a notion of Triple, it might make sense to also define Quad in this crate, although this stretches the scope of the crate a little bit. Should we name it instead
r2c2_term_statement
, which is more accurate, but a little verbose...edited the current proposal contains some code, mostly to illustrate the intended content proposed Work Item -- please do not focus on the code itself, but on the general spirit.
Also following discussions with @Tpt, the proposal is now to have two complementary crates