Duplicate Samples in Dev Set #32

@luomancs

Hi,
I found duplicated samples in the development set of Natural Questions: some examples share exactly the same context and question. For example, question 6357c3655b524feb8d0e398ff61dfabf and question 44e059927ac841d489d580a29222683b are identical. If the duplicated questions are removed, the NQ development set shrinks from 12836 to 5529 examples. Could you please check whether this finding is correct, or whether I am missing something? Thank you.
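For reference, here is a minimal sketch of the kind of check described above: group examples by their exact (context, question) pair and count the distinct pairs. The field names `id`, `context`, and `question` and the toy records are assumptions for illustration only; they are not taken from the actual dataset files, so the keys would need to be adapted to the real format.

```python
from collections import defaultdict


def group_duplicates(examples):
    """Map each exact (context, question) pair to the ids that share it.

    Assumes each example is a dict with hypothetical keys
    "id", "context", and "question".
    """
    groups = defaultdict(list)
    for ex in examples:
        key = (ex["context"], ex["question"])
        groups[key].append(ex["id"])
    return groups


def summarize(examples):
    """Return (total examples, distinct pairs, id groups with more than one member)."""
    groups = group_duplicates(examples)
    duplicated = [ids for ids in groups.values() if len(ids) > 1]
    return len(examples), len(groups), duplicated


# Toy records, made up for illustration (not real dataset content).
toy = [
    {"id": "a1", "context": "Paris is the capital of France.",
     "question": "what is the capital of france"},
    {"id": "b2", "context": "Paris is the capital of France.",
     "question": "what is the capital of france"},
    {"id": "c3", "context": "Berlin is the capital of Germany.",
     "question": "what is the capital of germany"},
]

total, unique, duplicated = summarize(toy)
print(f"{total} examples, {unique} distinct (context, question) pairs")
for ids in duplicated:
    print("duplicate group:", ids)
```

Exact string matching is the strictest possible criterion; it would not catch near-duplicates that differ only in whitespace or casing, and it says nothing about whether the answers attached to the duplicated examples also match.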
