[FEATURE] Reference Authenticity & Retraction Check (Lite) #4

@HzaCode

Problem

AI-generated or user-supplied citations can be fake, non-resolving, or retracted.
OneCite currently cleans and normalizes references but does not verify authenticity.
Users need a quick way to ensure their references are real and safe to cite.

Goal

Provide a lightweight verification step that confirms each reference:

  • Exists (DOI or URL resolves),
  • Comes from a legitimate venue/domain,
  • Is not retracted, withdrawn, or flagged,

and returns a simple pass/fail with minimal evidence.

Out of Scope (v1)

  • PDF provenance or content forensics
  • Citation quality or impact scoring
  • Complex confidence weighting

Proposed Approach (v1)

  • Data sources: Crossref (required), PubMed (optional, for biomedical references), OpenAlex (supplementary), Unpaywall (OA domain sanity check).

  • Checks:

    1. DOI/URL resolution
    2. Crossref relations or updates (retraction, withdrawal, correction, expression of concern)
    3. Optional PubMed pubtype verification
    4. Optional OpenAlex is_retracted flag

  • Decision: status = valid | needs_review | blocked

  • Output: simple JSON report + optional filtered .bib for valid entries.
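
The first two checks could be prototyped roughly as follows. This is a minimal sketch, not OneCite's implementation: `normalize_doi` and `collect_update_types` are hypothetical helpers, and the `updated-by` key with a `type` field is an assumption about how update notices appear on a Crossref work record, so it should be checked against the live API before relying on it. Both functions work on already-fetched data and need no network access.

```python
from typing import Any, Dict, Set

# Common ways a DOI is written in reference lists.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip a leading resolver URL or 'doi:' label and lowercase the result.

    DOIs are case-insensitive, so lowercasing is safe for comparison.
    """
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip().lower()


def collect_update_types(crossref_work: Dict[str, Any]) -> Set[str]:
    """Return normalized update types found on a Crossref-style work record.

    ASSUMPTION: updates are listed under "updated-by", each with a "type".
    """
    found: Set[str] = set()
    for update in crossref_work.get("updated-by") or []:
        kind = str(update.get("type", "")).strip().lower()
        kind = kind.replace("-", "_").replace(" ", "_")
        if kind:
            found.add(kind)
    return found
```

Normalizing the update type to lowercase with underscores keeps the later rule matching independent of how a given source spells "expression of concern".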

UX (CLI & API)

CLI

onecite verify refs.txt \
  --report verify.json \
  --emit-bib good.bib \
  --fail-on blocked
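
One way the proposed command line could be parsed is sketched below with the standard-library `argparse`. How OneCite's existing CLI is wired is not stated in this issue, so `build_parser` and the restriction of `--fail-on` to `needs_review` or `blocked` are assumptions made for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the proposed `onecite verify` subcommand (sketch)."""
    parser = argparse.ArgumentParser(prog="onecite")
    sub = parser.add_subparsers(dest="command", required=True)

    verify = sub.add_parser("verify", help="check references for authenticity")
    verify.add_argument("input", help="file containing the references to check")
    verify.add_argument("--report", help="path for the JSON report")
    verify.add_argument("--emit-bib", help="path for a .bib of valid entries")
    # ASSUMPTION: only these two statuses make sense as failure thresholds.
    verify.add_argument("--fail-on", choices=["needs_review", "blocked"])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```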

Python

from onecite.verify import verify_references
report = verify_references(items, fail_on="blocked")

Minimal Rules (v1)

  • R1 DOI must resolve (2xx/3xx) to a known publisher; otherwise → blocked
  • R2 Crossref shows retraction, withdrawal, or expression of concern (EoC) → blocked; correction/erratum → needs_review
  • R3 (optional) PubMed pubtype includes retraction → blocked
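
Rules R1 to R3 can be read as a small decision function over evidence that has already been gathered. The sketch below is one possible reading, not a settled design: the `Evidence` fields, the `decide` name, and the substring match on "retract" for PubMed publication types are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

BLOCKING_UPDATES = {"retraction", "withdrawal", "expression_of_concern"}
REVIEW_UPDATES = {"correction", "erratum"}


@dataclass
class Evidence:
    """Hypothetical container for what the checks found for one reference."""
    doi_http_status: Optional[int]  # None if the DOI could not be requested
    known_publisher: bool
    crossref_updates: Set[str] = field(default_factory=set)
    pubmed_pubtypes: Set[str] = field(default_factory=set)  # empty if PubMed is off


def decide(ev: Evidence) -> str:
    """Map evidence to valid | needs_review | blocked following R1-R3."""
    # R1: the DOI must resolve (2xx/3xx) to a known publisher.
    resolved = ev.doi_http_status is not None and 200 <= ev.doi_http_status < 400
    if not resolved or not ev.known_publisher:
        return "blocked"
    # R2 (blocking part): retraction, withdrawal, or expression of concern.
    if ev.crossref_updates & BLOCKING_UPDATES:
        return "blocked"
    # R3 (optional): any PubMed publication type mentioning a retraction.
    if any("retract" in pubtype.lower() for pubtype in ev.pubmed_pubtypes):
        return "blocked"
    # R2 (review part): corrections and errata need a human look.
    if ev.crossref_updates & REVIEW_UPDATES:
        return "needs_review"
    return "valid"
```

Blocking conditions are evaluated before the review condition so that a record carrying both a correction and a retraction ends up blocked.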

Acceptance Criteria

  • Retracted DOI → status=blocked, CLI exits non-zero when --fail-on blocked is set.
  • Valid DOI → status=valid, includes publisher URL in report.
  • Unmatched title → status=needs_review, excluded from good.bib.
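
The exit-code and filtering behavior in these criteria might look like the sketch below. It assumes that `--fail-on` acts as a severity threshold (so `needs_review` would also fail on `blocked`), which the issue does not specify, and that each result is a mapping with a `status` key. `exit_code` and `entries_for_bib` are hypothetical names.

```python
from typing import Iterable, List, Mapping, Optional

# ASSUMPTION: statuses are ordered by severity for threshold comparison.
SEVERITY = {"valid": 0, "needs_review": 1, "blocked": 2}


def exit_code(statuses: Iterable[str], fail_on: Optional[str] = None) -> int:
    """Return 1 if any status is at or above the fail_on threshold, else 0."""
    if fail_on is None:
        return 0
    threshold = SEVERITY[fail_on]
    return 1 if any(SEVERITY[s] >= threshold for s in statuses) else 0


def entries_for_bib(results: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Keep only entries whose status is valid, for the filtered .bib."""
    return [r for r in results if r["status"] == "valid"]
```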

Open Questions

  • What should the default behavior for needs_review be (warn or fail)?
  • Maintain a built-in trusted-domain list or fetch it dynamically?
  • Which data sources should be enabled by default?

Next Steps

  • Implement onecite verify skeleton using Crossref.
  • Define minimal JSON schema for verification results.
  • Add optional PubMed and OpenAlex adapters later.
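
As a starting point for the JSON schema discussion, a per-reference result could be modeled along these lines. Every field name here is a suggestion rather than an agreed schema, and the DOI and URL in the usage example are placeholders, not real records.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VerificationResult:
    """Hypothetical shape of one entry in the verification report."""
    reference: str                        # the input as supplied by the user
    status: str                           # valid | needs_review | blocked
    doi: Optional[str] = None
    publisher_url: Optional[str] = None   # included for valid entries
    evidence: List[str] = field(default_factory=list)  # short human-readable notes


def to_report(results: List[VerificationResult]) -> str:
    """Serialize results into a JSON document with a top-level 'results' list."""
    return json.dumps({"results": [asdict(r) for r in results]}, indent=2)


if __name__ == "__main__":
    example = VerificationResult(
        reference="Placeholder reference string",
        status="valid",
        doi="10.5555/placeholder",
        publisher_url="https://publisher.example/article",
        evidence=["DOI resolved", "no Crossref updates found"],
    )
    print(to_report([example]))
```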

Labels: enhancement (New feature or request)
