@201 Add RO-Crate entity filters for targeted lookups #273
| @@ -0,0 +1,112 @@ | |||
| # RO-Crate Filter Staged Review | |||
| configuration, | ||
| workflow_id, | ||
| Some(0), | ||
| Some(2), |
So if, for whatever reason, multiple docs with the same file_id exist, we can log a warning.
| { | ||
| Ok(result) => result, | ||
| Err(e) => { | ||
| let error_string = e.to_string(); |
It's not an explicit need; it just covers an error case. This bit of code converts the error message into something more human-readable.
| "message": message | ||
| })); | ||
| return Ok( | ||
| CreateRoCrateEntityResponse::UnprocessableContentErrorResponse( |
Should the same error be applied to the update-crate function?
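One way to reuse the same mapping for both create and update would be a small shared classifier. The sketch below is only an illustration: `CreateError` and `classify_create_error` are hypothetical names, and matching on the text `UNIQUE constraint failed` assumes a SQLite-style error message, which this thread does not confirm.

```rust
/// Hypothetical outcome type; the real handlers return generated response enums
/// such as `CreateRoCrateEntityResponse`.
#[derive(Debug, PartialEq)]
enum CreateError {
    /// Duplicate entity: reported to the caller as a structured 422.
    UnprocessableContent { status: u16, message: String },
    /// Anything else: still surfaced as a generic server error.
    Internal { status: u16, message: String },
}

/// Converts a raw database error string into a human-readable outcome.
/// ASSUMPTION: unique-index violations contain "UNIQUE constraint failed".
fn classify_create_error(error_string: &str) -> CreateError {
    if error_string.contains("UNIQUE constraint failed") {
        CreateError::UnprocessableContent {
            status: 422,
            message: "RO-Crate entity already exists".to_string(),
        }
    } else {
        CreateError::Internal {
            status: 500,
            message: error_string.to_string(),
        }
    }
}
```

Because the classifier only looks at the error string, an update handler could call it unchanged, which would keep the two code paths reporting duplicates the same way.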
| @@ -0,0 +1,13 @@ | |||
| DELETE FROM ro_crate_entity | |||
What is going on here? Deleting rows without warning?
It is just there to de-duplicate rows that have the same (workflow_id, file_id) so the index creation below runs successfully. No information is lost; if there are two rows with the same file_id, both rows point to the same file.
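To make the de-duplication rule concrete, here is a small in-memory Rust model of it. It is only a sketch: the actual migration is SQL, the `Row` struct and `dedupe_keep_newest` are hypothetical, and taking "newest" to mean the highest `id` is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down view of a `ro_crate_entity` row.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    workflow_id: i64,
    file_id: Option<i64>,
}

/// Keeps one row per (workflow_id, file_id) pair and leaves rows without a
/// file_id untouched, mirroring an index that only covers non-NULL file_id.
/// ASSUMPTION: the row with the highest id is the newest one.
fn dedupe_keep_newest(rows: Vec<Row>) -> Vec<Row> {
    // First pass: record the highest id seen for each (workflow_id, file_id).
    let mut newest: HashMap<(i64, i64), i64> = HashMap::new();
    for row in &rows {
        if let Some(file_id) = row.file_id {
            let entry = newest.entry((row.workflow_id, file_id)).or_insert(row.id);
            if row.id > *entry {
                *entry = row.id;
            }
        }
    }
    // Second pass: drop every keyed row that is not the recorded newest one.
    rows.into_iter()
        .filter(|row| match row.file_id {
            Some(file_id) => newest[&(row.workflow_id, file_id)] == row.id,
            None => true,
        })
        .collect()
}
```

Rows whose `file_id` is `None` are never merged, since they would not collide under an index restricted to non-NULL values.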
| ("RO-Crate entity already exists", "unknown") | ||
| }; | ||
|
|
||
| warn!( |
From Claude:
Log format inconsistency. src/server/api/ro_crate.rs:390 uses file_id={:?} for Option<i64>, yielding file_id=Some(123) or file_id=None. CLAUDE.md asks for file_id=123 style for parser scripts. Easy fix: split into two log statements based on body.file_id.is_some(), or format as file_id={} after unwrap_or(-1) + a separate has_file=true/false.
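The second option could look roughly like the sketch below. `file_id_log_fields` is a hypothetical helper, not code from this PR; it only shows the `unwrap_or(-1)` plus `has_file` formatting so that parser scripts never see `Some(...)` or `None`.

```rust
/// Renders an optional file id as flat key=value fields.
/// Hypothetical helper: -1 is used as the "no file" sentinel, as suggested above.
fn file_id_log_fields(file_id: Option<i64>) -> String {
    format!(
        "file_id={} has_file={}",
        file_id.unwrap_or(-1),
        file_id.is_some()
    )
}
```

The returned string can be embedded in the existing `warn!` call in place of the `{:?}` field.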
Summary
This MR fixes the RO-Crate existence-check path by extending the existing
`GET /workflows/{id}/ro_crate_entities` API with optional filters for:
- `file_id`
- `entity_id`

Instead of repeatedly fetching the full RO-Crate entity list and scanning it
client-side, callers can now perform targeted lookups against the server.
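For illustration, a filtered request path could be built as below. This is a minimal sketch, not the PR's client code: `ro_crate_entities_path` and `percent_encode` are hypothetical helpers, and treating `entity_id` as a string is an assumption.

```rust
/// Percent-encodes every byte outside the RFC 3986 unreserved set.
fn percent_encode(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for byte in raw.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{other:02X}")),
        }
    }
    out
}

/// Builds the list path, appending only the filters that are set.
/// ASSUMPTION: entity_id is a string; file_id is an integer as in the review above.
fn ro_crate_entities_path(
    workflow_id: i64,
    file_id: Option<i64>,
    entity_id: Option<&str>,
) -> String {
    let mut path = format!("/workflows/{workflow_id}/ro_crate_entities");
    let mut params: Vec<String> = Vec::new();
    if let Some(id) = file_id {
        params.push(format!("file_id={id}"));
    }
    if let Some(id) = entity_id {
        params.push(format!("entity_id={}", percent_encode(id)));
    }
    if !params.is_empty() {
        path.push('?');
        path.push_str(&params.join("&"));
    }
    path
}
```

With no filters the path is unchanged, so existing callers that list every entity would keep working.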
Problem
The previous implementation relied on listing all RO-Crate entities for a
workflow and then filtering in client code to determine whether a matching
record already existed.
That has two problems:
What Changed
Server / API
- `file_id` and `entity_id` query parameters to `GET /workflows/{id}/ro_crate_entities`
- and RO-Crate transport consistent
Client
- `list_ro_crate_entities_with_filters`
- `find_ro_crate_entity_by_file_id`
- `find_ro_crate_entity_by_entity_id`
- `find_entity_for_file()` to use a targeted filtered lookup instead of full-list fetch + scan
- `find_*` helpers observe multiple matches and return the first result
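The "first result, with a warning on multiple matches" behaviour can be sketched as follows. The names here are hypothetical and the struct is trimmed down; the real helpers presumably log through a macro such as `warn!`, whereas this sketch returns the message so it stays dependency-free.

```rust
/// Hypothetical, trimmed-down RO-Crate entity record.
#[derive(Debug, Clone, PartialEq)]
struct RoCrateEntity {
    id: i64,
    workflow_id: i64,
    file_id: Option<i64>,
}

/// Returns the first match, plus a warning message when the filtered lookup
/// unexpectedly produced more than one row.
fn first_match_with_warning(
    workflow_id: i64,
    file_id: i64,
    matches: Vec<RoCrateEntity>,
) -> (Option<RoCrateEntity>, Option<String>) {
    // Build the warning before the vector is consumed below.
    let warning = if matches.len() > 1 {
        Some(format!(
            "multiple RO-Crate entities matched workflow_id={} file_id={} count={}",
            workflow_id,
            file_id,
            matches.len()
        ))
    } else {
        None
    };
    (matches.into_iter().next(), warning)
}
```

A caller would log the warning if present and carry on with the returned entity, which matches the "return the first result" rule above.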
Data Integrity
- `(workflow_id, file_id)` where `file_id IS NOT NULL`
- `(workflow_id, file_id)` rows before creating the unique index
Error Handling
- structured `422` response instead of falling through as a generic server error

Tests
Added and strengthened RO-Crate coverage for:
- `file_id`
- `entity_id`
- `422` assertion

Validation
Ran successfully:
- `cargo fmt --all --check`
- `cargo check --tests`
- `cargo clippy --all --all-targets --all-features -- -D warnings`
- `dprint check`

Note: full integration test execution is still limited in this environment
because sandboxed test server port binding is restricted.
Notes / Risk
- `(workflow_id, file_id)` rows already exist, it keeps the newest row before adding the unique index.
- any out-of-tree implementors will need to update accordingly.