Bug: page_indices in extraction results always starts at [0] instead of preserving document position #136

@mmiakashs

Problem

The page_indices field in the extraction service output (split_document.page_indices) is always renumbered from zero ([0, 1, 2, ...]) for each section, regardless of which pages of the original document packet the section contains. The section's position within the source document is lost as a result.

Current Behavior

  • Section with pages [5, 6, 7] outputs page_indices: [0, 1, 2]
  • Section with pages [10, 11, 12] outputs page_indices: [0, 1, 2]

Expected Behavior

  • Section with pages [5, 6, 7] should output page_indices: [4, 5, 6]
  • Section with pages [10, 11, 12] should output page_indices: [9, 10, 11]
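
The difference between the two behaviors can be sketched as follows. This is a minimal illustration, not the service's actual code: both function names are hypothetical, and it assumes the section's pages are given as 1-based page numbers in the original packet while page_indices is 0-based, which is what the examples above imply.

```python
from typing import List


def reindexed_from_zero(page_numbers: List[int]) -> List[int]:
    """Hypothetical reproduction of the reported behavior.

    Indices restart at 0 for every section, so the section's
    position in the original packet is discarded.
    """
    return list(range(len(page_numbers)))


def preserved_page_indices(page_numbers: List[int]) -> List[int]:
    """Hypothetical sketch of the expected behavior.

    Assumes 1-based page numbers and converts each one to its
    0-based index in the original packet.
    """
    return [page - 1 for page in page_numbers]


if __name__ == "__main__":
    for pages in ([5, 6, 7], [10, 11, 12]):
        print(pages, reindexed_from_zero(pages), preserved_page_indices(pages))
```

With this sketch, two different sections produce identical output under the current behavior, so downstream consumers cannot tell them apart by page_indices alone.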
