
<doi>+<doi> is a common pattern #4

@halfak

Description


Another type of failure I see looks like this: 10.1086/591526+10.1088/0004-637X/706/1/L203

I'm not sure how we'd be able to tell that a "+" is not part of the DOI.
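To illustrate the ambiguity: "+" can appear inside a DOI suffix, so a permissive pattern that reads until the next whitespace treats the whole string as one DOI. This is a minimal sketch, assuming a hypothetical pattern NAIVE_DOI; it is not the project's actual parser.

```python
import re

# Hypothetical permissive pattern (not the project's parser):
# "10." + a 4-9 digit registrant code + "/" + any run of non-whitespace.
NAIVE_DOI = re.compile(r"10\.\d{4,9}/\S+")

text = "10.1086/591526+10.1088/0004-637X/706/1/L203"
naive_matches = NAIVE_DOI.findall(text)

# The greedy \S+ runs straight through the "+", so both DOIs
# come back as a single match.
print(naive_matches)
```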

When I searched for this exact string, I found this listing: http://arxiv.org/abs/0805.4758 It seems that both DOIs are associated with the same paper: one is for the paper itself and the other is for an erratum to it!

I'm thinking that we might get a good fit by adding a special rule to the parser for splitting characters like "+", "&" and "?": if we see one right before whitespace or a new DOI_START, stop reading the DOI there.
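The proposed rule can be sketched as a small scanner. This is a minimal sketch under stated assumptions, not the project's parser: the regular expression used for DOI_START and the names SPLIT_CHARS and extract_dois are hypothetical, chosen for illustration.

```python
import re

# Assumed definition of DOI_START: "10." + 4-9 digit registrant code + "/".
DOI_START = re.compile(r"10\.\d{4,9}/")
# Hypothetical set of characters that may end a DOI early.
SPLIT_CHARS = set("+&?")


def extract_dois(text):
    """Return the DOIs found in text.

    Reading stops at whitespace, or at a character in SPLIT_CHARS that is
    immediately followed by whitespace, the end of the text, or a new
    DOI_START. The splitting character itself is not kept.
    """
    dois = []
    pos = 0
    while True:
        start = DOI_START.search(text, pos)
        if start is None:
            break
        i = start.end()
        while i < len(text) and not text[i].isspace():
            if text[i] in SPLIT_CHARS:
                nxt = i + 1
                if (nxt == len(text) or text[nxt].isspace()
                        or DOI_START.match(text, nxt)):
                    break  # treat this character as a separator
            i += 1
        dois.append(text[start.start():i])
        pos = i  # resume scanning from where this DOI ended
    return dois


print(extract_dois("10.1086/591526+10.1088/0004-637X/706/1/L203"))
```

Because the split only happens when the character is followed by whitespace, the end of the text, or a new DOI_START, a "+" in the middle of a suffix (e.g. the made-up 10.1234/abc+def) is still kept as part of the DOI.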
