Schema analysis and shape extraction #2

@jmillanacosta

Description

Identify which schema extractor to use at the start of the workflow.

Identified strengths and limitations:

| | VoID generator | RDF-config | sheXer |
| --- | --- | --- | --- |
| Overview of the tool | Extracts statistics from an RDF endpoint or a file | Automates SPARQL and schema diagram generation | Extracts ShEx and SHACL structure from an RDF graph |
| Documentation | Incomplete | Present | Present |
| Minimal requirement | Requires an RDF file or SPARQL endpoint; graph must have triples with `rdf:type` predicates | Requires manually curated `model.yaml` and `prefix.yaml` files | Requires a Turtle file or a SPARQL endpoint |
| Data model representation | Object class-based file detailing all classes and properties | A set of tree structures representing classes and properties in YAML, which can be exported as SVG | A SHACL or ShEx schema, optionally a UML diagram in PNG representing the shape graph |
| Process | Automated | Semi-automated | Automated; optionally, one can change the I/O settings and the threshold used to accept patterns in the graph as shapes |
| Interpretability | Difficult; requires programming knowledge to understand organization of classes and properties | Easy; human-readable terms and graphical representation make the graph structure easily understandable | Easy; graphical representation facilitates quick interpretation |
| Readability | Human-readable terms (mapping ontology to labels); more compatible with programming language formats | Human-readable terms assigned for classes and properties | The diagrams and graphs are rendered with their URIs and no labels |
| Error reporting | Through Git issues | Through Git issues | Through Git issues |
| Implementation | Java (native binary) | Ruby | Python |
| Limitation | Quadratic runtime for generating files (e.g., IDSM, OrthoDB); not applicable for shape subclasses (e.g., in Rhea, where compounds are sub-classified into products, reactants, etc.) | Requires manual curation of input (potential solution: integrate with VoID generators for semi-automation, for which a prototype exists) | The current algorithm is unable to detect specific patterns within the data; however, the developers are aware of this limitation |
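
To make the "threshold used to accept patterns in the graph as shapes" more concrete, here is a minimal, self-contained sketch of the general idea. It is not sheXer's actual algorithm or API: the function name `extract_shapes`, the tuple-based triple format, and the `ex:` terms are all hypothetical, chosen only for illustration. It also shows why `rdf:type` triples matter for class-based extraction: subjects without a type are not assigned to any shape.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumption: prefixed names stand in for full IRIs


def extract_shapes(triples, threshold=0.5):
    """Toy shape extraction (hypothetical helper, not a real library call).

    For every class, keep the predicates used by at least `threshold`
    (a fraction between 0 and 1) of that class's instances.
    Returns {class: {predicate: fraction_of_instances}}.
    """
    classes_of = defaultdict(set)     # subject -> classes it is typed with
    predicates_of = defaultdict(set)  # subject -> predicates other than rdf:type
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes_of[s].add(o)
        else:
            predicates_of[s].add(p)

    instance_count = defaultdict(int)             # class -> number of instances
    usage = defaultdict(lambda: defaultdict(int))  # class -> predicate -> instances using it
    for s, classes in classes_of.items():
        for c in classes:
            instance_count[c] += 1
            for p in predicates_of[s]:
                usage[c][p] += 1

    shapes = {}
    for c, n in instance_count.items():
        shapes[c] = {p: k / n for p, k in usage[c].items() if k / n >= threshold}
    return shapes


# Made-up example data: three compounds, only `ex:name` is used by all of them.
triples = [
    ("ex:c1", RDF_TYPE, "ex:Compound"),
    ("ex:c1", "ex:name", "water"),
    ("ex:c1", "ex:formula", "H2O"),
    ("ex:c2", RDF_TYPE, "ex:Compound"),
    ("ex:c2", "ex:name", "ethanol"),
    ("ex:c3", RDF_TYPE, "ex:Compound"),
    ("ex:c3", "ex:name", "methane"),
    ("ex:c3", "ex:charge", "0"),
]

shapes = extract_shapes(triples, threshold=0.5)
```

With a threshold of 0.5, only `ex:name` survives for `ex:Compound`; lowering the threshold to 0.3 would also admit `ex:formula` and `ex:charge`, each used by one instance in three. This trade-off between strict and permissive shapes is what the threshold setting is meant to control.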

Metadata

Assignees: No one assigned

Labels: help wanted (Extra attention is needed)

Status: In Progress

Milestone: No milestone
