Merged
35 commits
0068c37
add bring_remaining to process in get_rest_mapping
syphax-bouazzouni Sep 23, 2022
117c31b
split mappings_ontologies method to small functions
syphax-bouazzouni Sep 23, 2022
29862df
fix rest mappings tests
syphax-bouazzouni Dec 17, 2022
363fcb5
Merge branch 'pr/fix/get-rest-mapping-with-prefix-url' into development
syphax-bouazzouni Dec 17, 2022
3d502b7
remove SAME_URI filter exception
syphax-bouazzouni Dec 17, 2022
f403571
fix miss typing variable
syphax-bouazzouni Dec 17, 2022
ad092d5
Merge branch 'upstream' into pr/refactor/simple-refactor-mapping-onto…
syphax-bouazzouni Dec 17, 2022
07eccad
add internal_mapping_predicates
syphax-bouazzouni Dec 17, 2022
a67f638
add internal_mapping_predicates to mappings_ont_build_query
syphax-bouazzouni Dec 17, 2022
b7fd016
if internal_mapping_predicates extract external_ontology from class id
syphax-bouazzouni Dec 17, 2022
769e92f
exrtact ontology mapping only if sub2 is nil
syphax-bouazzouni Sep 26, 2022
44d8506
fix mappings rest tests
syphax-bouazzouni Dec 17, 2022
7fbe880
Merge branch 'pr/refactor/simple-refactor-mapping-ontologies' into pr…
syphax-bouazzouni Dec 17, 2022
7f0e096
Merge remote-tracking branch 'ontoportal/master' into pr/feature/add-…
syphax-bouazzouni Mar 10, 2025
ba9b7a3
update submission extaction to fetch all mod metadata with extraction…
syphax-bouazzouni Mar 10, 2025
72e5f81
add metadata extraction test
syphax-bouazzouni Mar 10, 2025
28e729c
replace URI by uri in the yaml schema file
syphax-bouazzouni Mar 10, 2025
8bb09be
fix URI validation issue on saving
syphax-bouazzouni Mar 20, 2025
84a323f
Merge remote-tracking branch 'ncbo/master' into pr/feature/extract-in…
syphax-bouazzouni May 9, 2025
95e6afa
Merge remote-tracking branch 'ncbo/master' into pr/feature/add-metada…
syphax-bouazzouni May 9, 2025
52b8cfa
added #158
mdorf May 12, 2025
d1c220d
Merge branch 'ontoportal-lirmm-pr/feature/add-metadata-extraction' in…
mdorf May 12, 2025
a171413
commented out some tests for Agent properties
mdorf May 12, 2025
193b791
Merge pull request #158 from ontoportal-lirmm/pr/fix/bring-remaing-ma…
mdorf May 13, 2025
2df7463
resolved the failing test test_submission_parse
mdorf May 15, 2025
af6d584
Revert "added #158"
syphax-bouazzouni May 28, 2025
368f975
fix the new metadata extraction tests
syphax-bouazzouni May 28, 2025
95cb115
pointed to owlapi-wrapper-1.4.3.jar so unit tests pass
mdorf Jun 10, 2025
2d1019f
unit tests pass
mdorf Jun 10, 2025
4ceaa65
tests pass
mdorf Jun 10, 2025
5dcfdd3
fixed unit tests
mdorf Jun 19, 2025
751ac09
Merge branch 'develop' of github.com:ncbo/ontologies_linked_data into…
mdorf Jun 19, 2025
fe963da
fixed an error that was introduced by #158 and returned after the merge
mdorf Jun 20, 2025
ac63d2b
fixed an error that was introduced by #158 and returned after the merge
mdorf Jun 20, 2025
67169f3
change version of the owlapi-wrapper from v1.4.3 to v1.5.0
alexskr Jun 28, 2025
4 changes: 2 additions & 2 deletions Gemfile
@@ -36,8 +36,8 @@ group :development do
gem 'rubocop', require: false
end
# NCBO gems (can be from a local dev path or from rubygems/git)
-gem 'goo', github: 'ncbo/goo', branch: 'master'
-gem 'sparql-client', github: 'ncbo/sparql-client', branch: 'master'
+gem 'goo', github: 'ncbo/goo', branch: 'develop'
+gem 'sparql-client', github: 'ncbo/sparql-client', branch: 'develop'

gem 'public_suffix', '~> 5.1.1'
gem 'net-imap', '~> 0.4.18'
8 changes: 4 additions & 4 deletions Gemfile.lock
@@ -1,7 +1,7 @@
GIT
remote: https://github.com/ncbo/goo.git
-revision: b9019ad9e1eb78c74105fc6c6a879085066da17d
-branch: master
+revision: c3f9a7f789bf2f52ed31f0272d2725117d3ac04e
+branch: develop
specs:
goo (0.0.2)
addressable (~> 2.8)
@@ -16,8 +16,8 @@ GIT

GIT
remote: https://github.com/ncbo/sparql-client.git
-revision: e89c26aa96f184dbe9b52d51e04fb3d9ba998dbc
-branch: master
+revision: 1657f0dd69fd4b522d3549a6848670175f5e98cc
+branch: develop
specs:
sparql-client (1.0.1)
json_pure (>= 1.4)
Binary file renamed bin/owlapi-wrapper-1.4.2.jar → bin/owlapi-wrapper-1.5.0.jar
100755 → 100644
Binary file not shown.
2 changes: 1 addition & 1 deletion config/schemes/ontology_submission.yml
@@ -56,7 +56,7 @@ version:
"PAV: The version number of a resource.",
"DOAP: A project release",
"SCHEMA: The version of the CreativeWork embodied by a specified resource."]
-extractedMetadata: true
+extractedMetadata: false
metadataMappings: [ "omv:version", "mod:version", "owl:versionInfo", "pav:version", "doap:release", "schema:version", "oboInOwl:data-version", "oboInOwl:version" ]

#Status
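Note on the scheme change: the hunk above turns `extractedMetadata` off for `version`, presumably because `extract_metadata` fills `version` and `uri` itself and passes both in `skip_attrs`. As a rough illustration of how such a flag can gate extraction, here is a minimal sketch. The YAML excerpt, the `description` entry with its mappings, and the helper name `extractable_attributes` are assumptions made for the example, not part of this PR.

```ruby
require 'yaml'

# Hypothetical excerpt modeled on the scheme file; the real file has more keys.
SCHEME = YAML.safe_load(<<~YML)
  version:
    extractedMetadata: false
    metadataMappings: ["omv:version", "owl:versionInfo"]
  description:
    extractedMetadata: true
    metadataMappings: ["dct:description", "rdfs:comment"]
YML

# Hypothetical helper: an attribute is eligible when its flag is on
# and the user has not supplied a non-empty value for it.
def extractable_attributes(scheme, user_params = {})
  scheme.select do |attr, settings|
    user_value = user_params[attr]
    settings['extractedMetadata'] && (user_value.nil? || user_value.to_s.empty?)
  end.keys
end

p extractable_attributes(SCHEME)
# prints ["description"]
p extractable_attributes(SCHEME, 'description' => 'Provided by the submitter')
# prints []
```

With the flag off, `version` is never picked up by the generic loop, so only the dedicated version query can set it.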
@@ -3,34 +3,283 @@ module Concerns
module OntologySubmission
module MetadataExtractor

def extract_metadata
def extract_metadata(logger = nil, heavy_extraction = true, user_params = nil)
logger ||= Logger.new(STDOUT)
logger.info('Extracting metadata from the ontology submission.')

@submission = self
version_info = extract_version
ontology_iri = extract_ontology_iri
@submission.version = version_info if version_info
@submission.uri = ontology_iri if ontology_iri
@submission.save

self.version = version_info if version_info
self.uri = RDF::URI.new(ontology_iri) if ontology_iri
if heavy_extraction
begin
# Extract metadata directly from the ontology
extract_ontology_metadata(logger, user_params, skip_attrs: [:version, :uri])
logger.info('Additional metadata extracted.')
rescue StandardError => e
logger.error("Error while extracting additional metadata: #{e}")
logger.error(e.backtrace.join("\n")) if e.backtrace
end
end

if @submission.valid?
@submission.save
else
logger.error("Error while extracting additional metadata: #{@submission.errors}")
@submission = LinkedData::Models::OntologySubmission.find(@submission.id).first.bring_remaining
end
end

def extract_version

query = Goo.sparql_query_client.select(:versionInfo).distinct
.from(self.id)
.where([RDF::URI.new('http://bioportal.bioontology.org/ontologies/versionSubject'),
RDF::URI.new('http://www.w3.org/2002/07/owl#versionInfo'),
:versionInfo])
.from(@submission.id)
.where([RDF::URI.new('http://bioportal.bioontology.org/ontologies/versionSubject'),
RDF::URI.new('http://www.w3.org/2002/07/owl#versionInfo'),
:versionInfo])

sol = query.each_solution.first || {}
sol[:versionInfo]&.to_s
end

def extract_ontology_iri
query = Goo.sparql_query_client.select(:uri).distinct
.from(self.id)
.from(@submission.id)
.where([:uri,
RDF::URI.new('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
RDF::URI.new('http://www.w3.org/2002/07/owl#Ontology')])
sol = query.each_solution.first || {}
sol[:uri]&.to_s
RDF::URI.new(sol[:uri]) if sol[:uri]
end

# Extract additional metadata about the ontology
# First it extracts the main metadata, then the mapped metadata
def extract_ontology_metadata(logger, user_params, skip_attrs: [])
user_params = {} if user_params.nil? || !user_params
ontology_uri = @submission.uri
logger.info("Extracting metadata from ontology #{ontology_uri}")

# Go through all OntologySubmission attributes (returned as symbols)
LinkedData::Models::OntologySubmission.attributes(:all).each do |attr|
next if skip_attrs.include? attr
# Only consider attributes that have the :extractedMetadata setting on and that the user has not already set
attr_settings = LinkedData::Models::OntologySubmission.attribute_settings(attr)

attr_not_excluded = user_params && !(user_params.key?(attr) && !user_params[attr].nil? && !user_params[attr].empty?)

next unless attr_settings[:extractedMetadata] && attr_not_excluded

# Tracks whether a single-valued attribute has already been extracted
single_extracted = false
type = enforce?(attr, :list) ? :list : :string
old_value = value(attr, type)

unless attr_settings[:namespace].nil?
property_to_extract = "#{attr_settings[:namespace].to_s}:#{attr.to_s}"
hash_results = extract_each_metadata(ontology_uri, attr, property_to_extract, logger)
single_extracted = send_value(attr, hash_results, logger) unless hash_results.empty?
end

# extracts attribute value from metadata mappings
attr_settings[:metadataMappings] ||= []

attr_settings[:metadataMappings].each do |mapping|
break if single_extracted

hash_mapping_results = extract_each_metadata(ontology_uri, attr, mapping.to_s, logger)
single_extracted = send_value(attr, hash_mapping_results, logger) unless hash_mapping_results.empty?
end

new_value = value(attr, type)

send_value(attr, old_value, logger) if empty_value?(new_value) && !empty_value?(old_value)
end
end

def empty_value?(value)
value.nil? || (value.is_a?(Array) && value.empty?) || value.to_s.strip.empty?
end

def value(attr, type)
val = @submission.send(attr.to_s)
type.eql?(:list) ? Array(val) || [] : val || ''
end

def send_value(attr, new_value, logger)
old_val = nil
single_extracted = false

if enforce?(attr, :list)
old_val = value(attr, :list)
old_values = old_val.dup
new_values = new_value.values
new_values = new_values.map { |v| find_or_create_agent(attr, v, logger) }.compact if enforce?(attr, :Agent)

old_values.push(*new_values)

@submission.send("#{attr}=", old_values.uniq)
elsif enforce?(attr, :concatenate)
# If there are multiple values for this attribute, concatenate them.
# The join happens at the very end so the array contents can be merged and deduplicated first.
old_val = value(attr, :string)
metadata_values = old_val.split(', ')
new_values = new_value.values.map { |x| x.to_s.split(', ') }.flatten

@submission.send("#{attr}=", (metadata_values + new_values).uniq.join(', '))
else
new_value = new_value.values.first

new_value = find_or_create_agent(attr, new_value, logger) if enforce?(attr, :Agent)

@submission.send("#{attr}=", new_value)
single_extracted = true
end

unless @submission.valid?
logger.error("Error while extracting metadata for the attribute #{attr}: #{@submission.errors[attr] || @submission.errors}")
new_value&.delete if enforce?(attr, :Agent) && new_value.respond_to?(:delete)
@submission.send("#{attr}=", old_val)
end

single_extracted
end

# Returns a hash holding the best literal value for each URI.
# Literals are ranked by language: no language > English > French > other languages
def select_metadata_literal(metadata_uri, metadata_literal, hash)
return unless metadata_literal.is_a?(RDF::Literal)

if hash.key?(metadata_uri)
if metadata_literal.has_language?
if !hash[metadata_uri].has_language?
return hash
else
case metadata_literal.language
when :en, :eng
# Take the value with english language over other languages
hash[metadata_uri] = metadata_literal
return hash
when :fr, :fre
# If no english, take french
if hash[metadata_uri].language == :en || hash[metadata_uri].language == :eng
return hash
else
hash[metadata_uri] = metadata_literal
return hash
end
else
return hash
end
end
else
# Take the value with no language in priority (considered as a default)
hash[metadata_uri] = metadata_literal
return hash
end
else
hash[metadata_uri] = metadata_literal
hash
end
end

# Extracts the values of one metadata property.
# If the property points to a literal, the literal is used directly.
# If it points to a URI, the object's "omv:name" is used first, then its "rdfs:label",
# then "omv:firstName + omv:lastName" (for "omv:Person"), and finally the URI itself.
# hash_results maps each metadata URI (the object the metadata property points to) to the value taken from it
def extract_each_metadata(ontology_uri, attr, prop_to_extract, logger)

query_metadata = <<eos

SELECT DISTINCT ?extractedObject ?omvname ?omvfirstname ?omvlastname ?rdfslabel
FROM #{@submission.id.to_ntriples}
WHERE {
<#{ontology_uri}> #{prop_to_extract} ?extractedObject .
OPTIONAL { ?extractedObject omv:name ?omvname } .
OPTIONAL { ?extractedObject omv:firstName ?omvfirstname } .
OPTIONAL { ?extractedObject omv:lastName ?omvlastname } .
OPTIONAL { ?extractedObject rdfs:label ?rdfslabel } .
}
eos
Goo.namespaces.each do |prefix, uri|
query_metadata = "PREFIX #{prefix}: <#{uri}>\n" + query_metadata
end

# logger.info(query_metadata)
# This hash will contain the "literal" metadata for each object (uri or literal) pointed by the metadata predicate
hash_results = {}
Goo.sparql_query_client.query(query_metadata).each_solution do |sol|
value = sol[:extractedObject]
if enforce?(attr, :uri)
# If the attr is enforced as URI then it directly takes the URI
uri_value = value ? RDF::URI.new(value.to_s.strip) : nil
hash_results[value] = uri_value if uri_value&.valid?
elsif enforce?(attr, :date_time)
begin
hash_results[value] = DateTime.iso8601(value.to_s)
rescue StandardError => e
logger.error("Impossible to extract DateTime metadata for #{attr}: #{value}. It should follow iso8601 standards. Error message: #{e}")
end
elsif enforce?(attr, :integer)
begin
hash_results[value] = Integer(value.to_s.strip, 10)
rescue StandardError => e
logger.error("Impossible to extract integer metadata for #{attr}: #{value}. Error message: #{e}")
end
elsif enforce?(attr, :boolean)
case value.to_s.downcase
when 'true'
hash_results[value] = true
when 'false'
hash_results[value] = false
else
logger.error("Impossible to extract boolean metadata for #{attr}: #{value}. Expected 'true' or 'false'.")
end
elsif value.is_a?(RDF::URI)
hash_results = find_object_label(hash_results, sol, value)
else
# If this is directly a literal
hash_results = select_metadata_literal(value, value, hash_results)
end
end
hash_results
end

def find_object_label(hash_results, sol, value)
if !sol[:omvname].nil?
hash_results = select_metadata_literal(value, sol[:omvname], hash_results)
elsif !sol[:rdfslabel].nil?
hash_results = select_metadata_literal(value, sol[:rdfslabel], hash_results)
elsif !sol[:omvfirstname].nil?
hash_results = select_metadata_literal(value, sol[:omvfirstname], hash_results)
# if first and last name are defined (for omv:Person)
hash_results[value] = "#{hash_results[value]} #{sol[:omvlastname]}" unless sol[:omvlastname].nil?
elsif !sol[:omvlastname].nil?
# if only last name is defined
hash_results = select_metadata_literal(value, sol[:omvlastname], hash_results)
else
# If the object is a URI with no usable label, fall back to the URI string
hash_results[value] = value.to_s
end
hash_results
end

def enforce?(attr, type)
LinkedData::Models::OntologySubmission.attribute_settings(attr)[:enforce].include?(type)
end

def find_or_create_agent(attr, old_val, logger)
agent = LinkedData::Models::Agent.where(agentType: 'person', name: old_val).first
begin
agent ||= LinkedData::Models::Agent.new(name: old_val, agentType: 'person', creator: @submission.ontology.administeredBy.first).save
rescue StandardError => e
logger.error("Error while extracting metadata for the attribute #{attr}: Can't create Agent: #{e.message}")
agent = nil
end
agent
end
end
end
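Note on `select_metadata_literal`: the language priority described in its comment (no language, then English, then French, then anything else) can also be expressed as a numeric rank. The sketch below is a simplified rank-based variant, not the code in this PR: on ties it keeps the first literal seen, whereas the method above lets a later English or untagged literal overwrite an earlier one. `Literal` is a hypothetical stand-in for `RDF::Literal`, and `language_rank` and `select_best_literal` are names invented for the example.

```ruby
# Hypothetical stand-in for RDF::Literal: a value plus an optional language tag.
Literal = Struct.new(:value, :language)

# Lower rank wins: untagged first, then English, then French, then anything else.
def language_rank(literal)
  case literal.language
  when nil then 0
  when :en, :eng then 1
  when :fr, :fre then 2
  else 3
  end
end

# Keep the best-ranked literal seen so far for each object URI.
def select_best_literal(best, uri, candidate)
  current = best[uri]
  best[uri] = candidate if current.nil? || language_rank(candidate) < language_rank(current)
  best
end

best = {}
[
  Literal.new('Ontologie des maladies', :fr),
  Literal.new('Disease Ontology', :en),
  Literal.new('Krankheitsontologie', :de)
].each { |lit| select_best_literal(best, 'http://example.org/onto', lit) }

puts best['http://example.org/onto'].value
# prints "Disease Ontology"
```

A rank table keeps the priority order in one place, which may be easier to extend than nested conditionals if more languages need explicit handling.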