
@jamesbraza
Collaborator

Since all pages are read concurrently, memory usage is bursty. This PR adds an optional concurrency limit to cap that burst on a per-page basis.

@jamesbraza jamesbraza self-assigned this Nov 22, 2025
@jamesbraza jamesbraza added the enhancement New feature or request label Nov 22, 2025
Copilot AI review requested due to automatic review settings November 22, 2025 18:58
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Nov 22, 2025

dosubot bot commented Nov 22, 2025

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds an optional concurrency parameter to the Nemotron PDF reader to control memory usage during page processing. Since all pages are currently processed concurrently, this can cause memory bursts. The new parameter allows users to limit concurrency on a page-by-page basis.

  • Added concurrency parameter to parse_pdf_to_pages function accepting int | asyncio.Semaphore | None
  • Implemented conditional logic to use gather_with_concurrency when concurrency limit is specified, otherwise use asyncio.gather
  • Enhanced NemotronLengthError in the Nvidia API path to include response details for debugging

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/paper-qa-nemotron/src/paperqa_nemotron/reader.py | Added optional concurrency parameter to control concurrent page processing and prevent memory bursts |
| packages/paper-qa-nemotron/src/paperqa_nemotron/api.py | Enhanced NemotronLengthError exception to include response choice data for caller debugging |

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 22, 2025
@jamesbraza jamesbraza changed the title Opt-in for concurrency on Nemotron PDF reader Concurrency on Nemotron PDF reader Nov 22, 2025
@jamesbraza jamesbraza changed the title Concurrency on Nemotron PDF reader Concurrency limits for Nemotron PDF reader Nov 22, 2025
@jamesbraza jamesbraza merged commit 437bbf6 into main Nov 22, 2025
12 of 14 checks passed
@jamesbraza jamesbraza deleted the nemotron-concurrency branch November 22, 2025 19:44
