Conversation

yarikoptic
Member

@yarikoptic yarikoptic commented Aug 22, 2024

A design doc composed with @djarecka to avoid dummy DOIs for dandisets

refs:

TODOs

  • complete initial pass
  • seek review
  • make explicit that DOI generation is optional overall, since LINC does not even need it (ref)
  • potentially add a sequence diagram of interactions between user, archive, datacite across different stages of embargoed -> public dandisets

but it could already be checked out by the @dandi/archive-maintainers folks, since the overall idea is already formulated and some early concerns/questions could already be asked/answered

@djarecka
Member

djarecka commented Jan 19, 2025

I created some tests to simulate the workflow in dandi/dandi-schema#275

- We might want a dedicated 404 page for deleted dandisets, or at least a message that the dandiset was deleted, and ideally describe the reason why it was deleted ("Upon request of maintainer", "Due to violation of terms of service", etc.)
- Then we adjust DOI record to point to that page.

- Should we do anything at dandischema level?
Member Author

@djarecka I think we need to answer those before we can call this "done" ;-)

- Upon changes to a non-embargoed, draft dandiset metadata record:
- If `Draft DOI`, attempt to "promote" it to `Findable`.
- If validation fails - keep `Draft DOI` (very limited validation), attempt to update datacite metadata record while keeping the same target URL.
- **Question to clear up**: what happens to `Draft DOI` if metadata record is invalid? It seems to create one with no metadata, but does it update only the fields it knows about?
Member Author

@djarecka I feel like you clarified on this but we did not put it "in writing" here. What do you remember on this aspect?

Member Author

ping @djarecka

Member

yes, sorry, I just gave a "thumbs up", since you were right: a Draft DOI doesn't have to have metadata. It can have only a URL, and if it has more, that will be added.
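
To put that "in writing", here is a minimal sketch of the two request bodies, assuming the DataCite REST API's JSON:API format for `/dois`, where a DOI created without an `event` stays in the Draft state and `event: publish` requests Findable. The helper `build_doi_payload` and the example URL are hypothetical and are not code from dandi-archive or dandi-schema.

```python
from typing import Any, Optional


def build_doi_payload(
    doi: str,
    url: str,
    metadata: Optional[dict[str, Any]] = None,
    publish: bool = False,
) -> dict[str, Any]:
    """Build a JSON:API request body for the DataCite /dois endpoint (sketch)."""
    # A Draft DOI can carry nothing but the DOI itself and its target URL.
    attributes: dict[str, Any] = {"doi": doi, "url": url}
    if metadata:
        # Any additional metadata fields are simply sent along.
        attributes.update(metadata)
    if publish:
        # "publish" asks DataCite to move the DOI to the Findable state.
        attributes["event"] = "publish"
    return {"data": {"type": "dois", "attributes": attributes}}


# Draft DOI with a URL only (hypothetical target URL).
draft = build_doi_payload(
    "10.80507/dandi.000004", "https://example.org/dandiset/000004"
)

# Same DOI later, with more metadata and a request to promote it.
promoted = build_doi_payload(
    "10.80507/dandi.000004",
    "https://example.org/dandiset/000004",
    metadata={"titles": [{"title": "Example dandiset"}]},
    publish=True,
)
```

Whether DataCite accepts the promotion still depends on its own validation of the metadata, which this sketch does not model.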

@asmacdo asmacdo mentioned this pull request Apr 22, 2025
19 tasks

A django-admin script should be created and executed to create a `Dandiset DOI` for all existing dandisets.

**Question to address**: Will adding a `Dandiset DOI` in addition to `Version DOI` require a db migration?
Member

In the POC I've added a doi field to the Dandiset model which does add a migration.

Member Author

isn't there also a "draft" Version with DOI (and that's where I guess we inject a fake one), i.e. could we avoid changing DB model?

Member

I have to admit that I "just don't like that", since the Dandiset DOI semantically belongs to the Dandiset. I can't think of a reason this wouldn't work, but it just feels messy.

Retrieving the Dandiset DOI via Django as someversion.dandiset.draft_version.doi IMO violates the "principle of least surprise".

Prior to publication the Dandiset DOI will point to the draft version (via the DLP), but after publication the Dandiset DOI will point to the latest publication, so that would also be surprising.

- **Follow up concern**: after the dandiset and its DOI are published, metadata of the Draft version of the dandiset could still be changed.
This could potentially make the changed record "invalid" again.
Should be Ok'ish
- Test site of datacite had a different validation result than the primary one
Member

Do we have any more information about this?

Member Author

It might have been a bad memory, @djarecka do you have any information on this or should we just remove this?

Suggested change
- Test site of datacite had a different validation result than the primary one

Member

@djarecka djarecka Apr 25, 2025

let's remove it. It was based on my memory of some old issue, but I wasn't able to reproduce it or find old records.

@asmacdo
Member

asmacdo commented Apr 24, 2025

For dandi-schema, we may also want to pull the validation out of to_datacite. This validation does not use the datacite API, rather validation occurs against the schema which has been pre-fetched and committed into the dandi-schema repo.

Currently to_datacite accepts an optional arg validate, which defaults to False -- the only use of to_datacite in dandi-archive DOES NOT enable validation! I'm curious what problems this could cause, and whether those problems have actually occurred -- maybe there are more Dandisets with DOIs stored in the data model without a DOI minted? Or maybe they were created but set to Draft due to validation errors? We would have logged exceptions if the datacite API did not accept a new DOI.

From this design doc only, I gather that we believe Draft DOIs have less stringent validation than Findable, but I have not found any upstream documentation that confirms this. If it is the case, I suggest we pull the validation out of the to_datacite function, which would then only be responsible for constructing the API payload. Then, in dandi-archive, we perform the validation: if valid, we publish a Findable DOI; if invalid, we fall back to a Draft DOI.

Either way, we need to consider using some kind of validation. @djarecka @yarikoptic I suggest this as topic for our discussion tomorrow.
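
As a starting point for that discussion, here is a minimal sketch of the split suggested above: payload construction stays separate, and the caller validates and decides between Findable and Draft. All names (`validate_datacite`, `plan_doi_request`, `REQUIRED_FOR_FINDABLE`) are hypothetical, and the required-field check is only a simplified stand-in for the schema committed in dandi-schema, not a copy of it.

```python
from typing import Any, Callable

# Simplified stand-in for real schema validation (assumption for illustration).
REQUIRED_FOR_FINDABLE = ("creators", "titles", "publisher", "publicationYear", "types")


def validate_datacite(attributes: dict[str, Any]) -> None:
    """Raise ValueError if a required attribute is missing or empty."""
    missing = [key for key in REQUIRED_FOR_FINDABLE if not attributes.get(key)]
    if missing:
        raise ValueError(f"missing required attributes: {', '.join(missing)}")


def plan_doi_request(
    attributes: dict[str, Any],
    validate: Callable[[dict[str, Any]], None] = validate_datacite,
) -> dict[str, Any]:
    """Return the attributes to send: Findable if valid, otherwise Draft."""
    planned = dict(attributes)  # leave the caller's dict untouched
    try:
        validate(planned)
    except ValueError:
        # Fall back to a Draft DOI: send the record without a publish event.
        planned.pop("event", None)
        return planned
    planned["event"] = "publish"
    return planned
```

Passing the validator in as an argument keeps the decision logic testable without touching the datacite API, and lets the real schema check replace the stand-in later.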

@djarecka
Member

> For dandi-schema, we may also want to pull the validation out of to_datacite. This validation does not use the datacite API, rather validation occurs against the schema which has been pre-fetched and committed into the dandi-schema repo.

correct!

> Currently to_datacite accepts an optional arg validate, which defaults to False -- the only use of to_datacite in dandi-archive DOES NOT enable validation! I'm curious what problems this could cause, and whether those problems have actually occurred -- maybe there are more Dandisets with DOIs stored in the data model without a DOI minted? Or maybe they were created but set to Draft due to validation errors? We would have logged exceptions if the datacite API did not accept a new DOI.

That's probably not good if the validation is not used before publishing. I was not aware.

> From this design doc only, I gather that we believe Draft DOIs have less stringent validation than Findable, but I have not found any upstream documentation that confirms this. If it is the case, I suggest we pull the validation out of the to_datacite function, which would then only be responsible for constructing the API payload. Then, in dandi-archive, we perform the validation: if valid, we publish a Findable DOI; if invalid, we fall back to a Draft DOI.

Yes, no documentation, but that's the case...

## Concerns to keep in mind/address

- **Question to clear up**: what happens to `Draft DOI` if metadata record is invalid?
- It seems to create one with no metadata, but does it update only the fields it knows about?
Member Author

@djarecka did you check in your experiments?

@yarikoptic
Member Author

@asmacdo -- do you think this one is good enough and reflects everything you learned from implementing #2350? Then we likely should just merge it. Any objections? @djarecka what are your thoughts?

@asmacdo
Member

asmacdo commented Jun 4, 2025

@yarikoptic I think it's good enough to merge. There are remaining "unknowns" relating to how datacite will respond to requests for Findable DOIs that might not have fully valid metadata, which will hopefully be illuminated by incorporating NSKEY so I can create permanent Findable DOIs that will not cause collisions. Whatever comes of that, I think it's fine to open a new PR to the design doc if necessary.

@yarikoptic yarikoptic added the documentation (Changes only affect the documentation) and design-doc (Involves creating or discussing a design document) labels Jun 5, 2025
@yarikoptic yarikoptic enabled auto-merge June 5, 2025 18:21
@yarikoptic yarikoptic disabled auto-merge June 5, 2025 18:21
@yarikoptic yarikoptic enabled auto-merge June 5, 2025 18:22
@yarikoptic
Member Author

@asmacdo then please approve this PR "officially" -- I can't since I am the author. I have enabled auto-merge though ;)

Member

@asmacdo asmacdo left a comment

Oh right, forgot this wasn't mine :)

@yarikoptic yarikoptic merged commit 3a9a346 into master Jun 5, 2025
7 of 11 checks passed
@yarikoptic yarikoptic deleted the enh-doi-draft branch June 5, 2025 18:24
@dandibot
Member

dandibot commented Jun 5, 2025

🚀 PR was released in v0.11.1 🚀

@dandibot dandibot added the released (This issue/pull request has been released.) label Jun 5, 2025
asmacdo added a commit to asmacdo/dandi-archive that referenced this pull request Jun 16, 2025
- Dandiset DOI will redirect to the DLP
- Example: 10.80507/dandi.000004
- Dandiset DOI is stored in the doi field of the draft version
- Dandiset DOI metadata (on Datacite) will match the draft version until
  first publication
- Once a Dandiset is published, the Dandiset DOI metadata will match the
  latest publication

See the design document for more details: dandi#2012
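
The rules in this commit message can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, the DOI format is inferred from the single example above (`10.80507/dandi.000004`), and the list of published versions is assumed to be ordered from oldest to newest.

```python
def dandiset_doi(prefix: str, identifier: str) -> str:
    """Compose a Dandiset DOI such as 10.80507/dandi.000004."""
    return f"{prefix}/dandi.{identifier}"


def metadata_source(published_versions: list[str]) -> str:
    """Pick the version whose metadata the Dandiset DOI record should match.

    `published_versions` is assumed to be ordered oldest to newest.
    """
    if not published_versions:
        # Nothing published yet: the DOI record follows the draft version.
        return "draft"
    # After the first publication: follow the latest published version.
    return published_versions[-1]


print(dandiset_doi("10.80507", "000004"))  # 10.80507/dandi.000004
print(metadata_source([]))  # draft
```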
jjnesbitt pushed a commit to asmacdo/dandi-archive that referenced this pull request Sep 3, 2025
jjnesbitt pushed a commit to asmacdo/dandi-archive that referenced this pull request Sep 4, 2025
jjnesbitt pushed a commit to asmacdo/dandi-archive that referenced this pull request Sep 4, 2025