Add sysreq doc #1163
base: main
- The pipeline allocates parallel resources at runtime based on the system configuration.
- Memory usage can reach the full system capacity when processing large documents.
- CPU utilization scales with the number of concurrent processing tasks.
- A GPU is required for image-processing NIMs, embeddings, and other GPU-accelerated tasks.
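The first bullet says resources are allocated at runtime from the system configuration. The second says memory can become the limit. The sketch below shows one simple way those two constraints could interact. It is a hypothetical sizing rule, not NV-Ingest's actual allocation logic. `plan_workers` and `mem_per_task_gib` are illustrative names, and the memory figures in the usage line are assumed.

```python
import os


def plan_workers(cpu_count: int, total_mem_gib: float, mem_per_task_gib: float) -> int:
    """Hypothetical rule: one worker per CPU core, capped by how many
    tasks fit in memory. Not NV-Ingest's real allocation logic."""
    if mem_per_task_gib <= 0:
        raise ValueError("mem_per_task_gib must be positive")
    by_cpu = max(1, cpu_count)                               # CPU-bound limit
    by_mem = max(1, int(total_mem_gib // mem_per_task_gib))  # memory-bound limit
    return min(by_cpu, by_mem)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Assumed figures, for illustration only.
    print(plan_workers(cores, total_mem_gib=64.0, mem_per_task_gib=4.0))
```

Under this rule a 32-core host with 64 GiB and 4 GiB per task would be memory-bound at 16 workers. That is one reason CPU and memory requirements rise together.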
I think there should be an additional section at the bottom, cross-linked from here.
We should explain why the CPU and memory requirements are high.
Something like:
For a representative set of 1,000 PDFs, NV-Ingest renders 54,000 JPEG images, one per PDF page. On average we extract N sub-page JPEGs (one each per table, chart, header, footer, section title, and text paragraph). Downstream of each content type, we extract smaller bounding-boxed JPEGs for every chart element and every table cell (hundreds to thousands per table).
This can be a follow-up, but it needs to give the user the tl;dr of why we use so many resources.
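The fan-out described in the comment can be made concrete with a back-of-the-envelope estimate. The only figures taken from the comment are 1,000 PDFs and 54,000 page images, which works out to 54 pages per PDF. The comment leaves N unspecified. This sketch assumes N is a per-page average and adds a hypothetical per-page count of element crops (chart elements and table cells). The values 10 and 100 in the usage line are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class ImageEstimate:
    page_images: int     # one rendered JPEG per PDF page
    subpage_crops: int   # tables, charts, headers, footers, titles, paragraphs
    element_crops: int   # chart elements and table cells (hypothetical rate)

    @property
    def total(self) -> int:
        return self.page_images + self.subpage_crops + self.element_crops


def estimate_images(num_pdfs: int, avg_pages_per_pdf: float,
                    subpage_crops_per_page: float,
                    element_crops_per_page: float) -> ImageEstimate:
    """Rough image count; per-page rates are assumptions, not NV-Ingest data."""
    pages = round(num_pdfs * avg_pages_per_pdf)
    return ImageEstimate(
        page_images=pages,
        subpage_crops=round(pages * subpage_crops_per_page),
        element_crops=round(pages * element_crops_per_page),
    )


if __name__ == "__main__":
    # 1,000 PDFs x 54 pages = 54,000 page images (from the comment);
    # 10 and 100 are placeholder rates standing in for N and the element fan-out.
    est = estimate_images(1000, 54, 10, 100)
    print(est.page_images, est.subpage_crops, est.element_crops, est.total)
```

Even with modest placeholder rates, the element-level crops dominate the total. That dominance is the tl;dr the comment asks the doc to convey.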
Please also link to whatever public materials we have on DC767; @sosahi will have them.
Co-authored-by: Randy Gelhausen <[email protected]>
…qs' into devin_doc_add_sysreqs