
refs field in body_text is empty when using --json parameter #93

@jjbes


Command Used

grobid_client --input ./artifacts/pdf --output ./artifacts/tei --json --n 10 processFulltextDocument

Environment

  • Client: grobid-client-python v0.1.0

Example Files

2021.naacl-main.224.json
2021.naacl-main.224.grobid.tei.xml

Observation

In the generated JSON output, the refs field within each body_text section is empty.

Example:

<div xmlns="http://www.tei-c.org/ns/1.0">
<head n="1">Introduction</head>
<p> Natural Language Inference (NLI) is the task [...] <ref type="bibr" target="#b5">(Dagan et al., 2013)</ref> [...] <ref type="bibr" target="#b6">(Devlin et al., 2019)</ref></p>
</div>

{
  "id": 0,
  "text": "Natural Language Inference (NLI) is the task [...]",
  "coords": [],
  "refs": []
}
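
To check how widespread the problem is in a generated file, a small script along these lines could count the sections whose refs list is empty. This is only a sketch: it assumes `body_text` is a top-level list in the JSON output, and `count_empty_refs` is a hypothetical helper, not part of grobid-client-python.

```python
import json


def count_empty_refs(doc):
    """Return (sections with empty refs, total sections).

    Assumes the document has a top-level "body_text" list whose items
    each carry a "refs" list, as in the example above.
    """
    body = doc.get("body_text", [])
    empty = sum(1 for section in body if not section.get("refs"))
    return empty, len(body)


# Stand-in for a generated file; with a real file, load it using
# json.load(open(path)) instead. The text value is a placeholder.
sample = json.loads(
    '{"body_text": [{"id": 0, "text": "...", "coords": [], "refs": []}]}'
)
empty, total = count_empty_refs(sample)
print(f"{empty} of {total} body_text sections have empty refs")
```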

Expected Behavior

Each body_text section’s refs field should include reference information (e.g., citation markers or linked bibliography entries).
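
To make the expectation concrete, here is a minimal sketch of how the `<ref>` elements in a TEI paragraph could be mapped to refs entries with character offsets into the flattened text. It is not the client's actual conversion code. The function name `paragraph_with_refs` and the entry fields (`type`, `target`, `text`, `offset_start`, `offset_end`) are assumptions made for illustration, and the sample is a shortened version of the TEI above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def paragraph_with_refs(p):
    """Flatten a TEI <p> element to text and record each direct <ref> child.

    The field names in the returned refs entries are hypothetical.
    """
    parts = []
    refs = []
    pos = 0  # current character offset in the flattened text

    def emit(s):
        nonlocal pos
        if s:
            parts.append(s)
            pos += len(s)

    emit(p.text)
    for child in p:
        inner = "".join(child.itertext())
        if child.tag == TEI_NS + "ref":
            refs.append({
                "type": child.get("type"),
                "target": child.get("target"),
                "text": inner,
                "offset_start": pos,
                "offset_end": pos + len(inner),
            })
        emit(inner)
        emit(child.tail)
    return {"text": "".join(parts), "refs": refs}


sample = (
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    '<head n="1">Introduction</head>'
    '<p>NLI is the task <ref type="bibr" target="#b5">(Dagan et al., 2013)</ref>'
    ' and <ref type="bibr" target="#b6">(Devlin et al., 2019)</ref></p>'
    '</div>'
)
div = ET.fromstring(sample)
result = paragraph_with_refs(div.find(TEI_NS + "p"))
for ref in result["refs"]:
    print(ref["target"], ref["offset_start"], ref["offset_end"], ref["text"])
```

With entries like these, each citation marker in `text` can be linked back to its bibliography entry through `target`.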
