# Add sysreq doc #1163

@@ -0,0 +1,91 @@
# NV-Ingest resource scaling modes

How NV-Ingest scales work across pipeline stages, and how to configure scaling with docker-compose.

- **Static scaling**: Each pipeline stage runs a fixed number of replicas based on memory-aware heuristics. Good for consistent latency; higher steady-state memory usage.
- **Dynamic scaling**: Only the source stage is fixed; other stages scale up and down based on observed resource pressure. Better memory efficiency; may pause briefly to spin replicas back up after idle periods.

## When to choose which

- **Choose Static** when latency consistency and warm pipelines matter more than memory minimization.
- **Choose Dynamic** when memory headroom is constrained or workloads are bursty or idle for long periods.

## Configure (docker-compose)

Edit `services > nv-ingest-ms-runtime > environment` in `docker-compose.yaml`.

### Select mode

- **Dynamic (default)**
  - `INGEST_DISABLE_DYNAMIC_SCALING=false`
  - `INGEST_DYNAMIC_MEMORY_THRESHOLD=0.80` (fraction of memory; worker scaling reacts around this level)
- **Static**
  - `INGEST_DISABLE_DYNAMIC_SCALING=true`
  - Optionally set a static memory threshold:
    - `INGEST_STATIC_MEMORY_THRESHOLD=0.85` (fraction of total memory reserved for static replicas)

Example (Static):

```yaml
services:
  nv-ingest-ms-runtime:
    environment:
      - INGEST_DISABLE_DYNAMIC_SCALING=true
      - INGEST_STATIC_MEMORY_THRESHOLD=0.85
```

Example (Dynamic):

```yaml
services:
  nv-ingest-ms-runtime:
    environment:
      - INGEST_DISABLE_DYNAMIC_SCALING=false
      - INGEST_DYNAMIC_MEMORY_THRESHOLD=0.80
```

### Pipeline config mapping

- `pipeline.disable_dynamic_scaling` ⇐ `INGEST_DISABLE_DYNAMIC_SCALING`
- `pipeline.dynamic_memory_threshold` ⇐ `INGEST_DYNAMIC_MEMORY_THRESHOLD`
- `pipeline.static_memory_threshold` ⇐ `INGEST_STATIC_MEMORY_THRESHOLD`

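As a rough illustration of this mapping, the sketch below reads the three environment variables into a small config object. The `PipelineScalingConfig` dataclass and the `scaling_config_from_env` helper are hypothetical stand-ins for illustration only, not the actual NV-Ingest implementation; the defaults mirror the values shown above.

```python
# Illustrative sketch only: the names and defaults here are assumptions, not NV-Ingest code.
import os
from dataclasses import dataclass


@dataclass
class PipelineScalingConfig:
    disable_dynamic_scaling: bool = False   # pipeline.disable_dynamic_scaling
    dynamic_memory_threshold: float = 0.80  # pipeline.dynamic_memory_threshold
    static_memory_threshold: float = 0.85   # pipeline.static_memory_threshold


def scaling_config_from_env() -> PipelineScalingConfig:
    """Map the INGEST_* environment variables onto the pipeline.* fields."""
    return PipelineScalingConfig(
        disable_dynamic_scaling=os.getenv(
            "INGEST_DISABLE_DYNAMIC_SCALING", "false").lower() in {"1", "true", "yes"},
        dynamic_memory_threshold=float(os.getenv("INGEST_DYNAMIC_MEMORY_THRESHOLD", "0.80")),
        static_memory_threshold=float(os.getenv("INGEST_STATIC_MEMORY_THRESHOLD", "0.85")),
    )


if __name__ == "__main__":
    print(scaling_config_from_env())
```
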
## Trade-offs recap

- **Dynamic**
  - Pros: Better memory efficiency; stages scale down when idle; can force scale-down under spikes.
  - Cons: After long idle periods, stages may scale to zero replicas, causing brief warm-up latency when work resumes.
- **Static**
  - Pros: Stable, predictable latency; stages remain hot.
  - Cons: Higher baseline memory usage over time.

## Sources of memory utilization

- **Workload size and concurrency**
  - More in-flight jobs create more objects (pages, images, tables, charts) and large artifacts (for example, embeddings).
  - Example: 1 MB text file → paragraphs with 20% overlap → 4k-dim embeddings base64-encoded to JSON (this and the next example are reproduced in the sketch after this list)
    - Assumptions: ~600 bytes per paragraph. 20% overlap ⇒ effective step ≈ 480 bytes. Chunks ≈ 1,000,000 / 480 ≈ 2,083.
    - Per-embedding size: 4,096 dims × 4 bytes (float32) = 16,384 bytes; base64 expansion × 4/3 ≈ 21,845 bytes (≈21.3 KB).
    - Total embeddings payload: ≈ 2,083 × 21.3 KB ≈ 45 MB, excluding JSON keys/metadata.
    - Takeaway: a 1 MB source can yield ≳40× memory just for embeddings, before adding extracted text, images, or other artifacts.
  - Example: PDF rendering and extracted images (A4 @ 72 DPI)
    - Rendering a page produces a large in-memory buffer; each extracted sub-image adds more, and base64 inflates size.
    - Page pixels ≈ 8.27×72 by 11.69×72 ≈ 595×842 ≈ 0.50 MP.
    - RGB (3 bytes/pixel) ≈ 1.5 MB per page buffer; RGBA (4 bytes/pixel) ≈ 2.0 MB.
    - Ten 1024×1024 RGB crops ≈ 3.0 MB each in memory → base64 (+33%) ≈ 4.0 MB each ⇒ ~40 MB just for crops (JSON not included).
    - If you also base64 the full page image, expect another ~33% over the raw byte size (compression varies by format).
- **Library behavior**
  - Components like PyArrow may retain memory longer than expected (delayed free).
- **Queues and payloads**
  - Base64-encoded, fragmented documents in Redis consume memory proportional to concurrent jobs, clients, and drain speed.

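The back-of-envelope estimates above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the chunk size, overlap, page dimensions, and crop count are the assumptions stated in the examples, not values measured from NV-Ingest.

```python
# Illustrative back-of-envelope estimates; all parameters are the assumptions
# stated in the examples above, not measurements from NV-Ingest.

def embedding_payload_bytes(source_bytes=1_000_000, chunk_bytes=600,
                            overlap=0.20, dims=4096, dtype_bytes=4):
    """Estimate the base64-encoded embedding payload for a chunked text file."""
    step = chunk_bytes * (1 - overlap)       # effective stride: ~480 bytes
    chunks = source_bytes / step             # ~2,083 chunks
    raw = dims * dtype_bytes                 # 16,384 bytes per float32 embedding
    b64 = raw * 4 / 3                        # ~21,845 bytes after base64 expansion
    return chunks * b64                      # ~45 MB total, excluding JSON keys


def page_and_crop_bytes(width_in=8.27, height_in=11.69, dpi=72,
                        channels=3, crops=10, crop_px=1024):
    """Estimate the in-memory size of a rendered A4 page plus extracted crops."""
    page_raw = (width_in * dpi) * (height_in * dpi) * channels  # ~1.5 MB RGB buffer
    crop_raw = crop_px * crop_px * channels                     # ~3.0 MB per crop
    crops_b64 = crops * crop_raw * 4 / 3                        # ~40 MB for all crops
    return page_raw, crops_b64


if __name__ == "__main__":
    print(f"embeddings payload: {embedding_payload_bytes() / 1e6:.1f} MB")
    page, crops = page_and_crop_bytes()
    print(f"page buffer: {page / 1e6:.2f} MB, crops (base64): {crops / 1e6:.1f} MB")
```
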
## Where to look in docker-compose

Open `docker-compose.yaml` and locate:

- `services > nv-ingest-ms-runtime > environment`:
  - `INGEST_DISABLE_DYNAMIC_SCALING`
  - `INGEST_DYNAMIC_MEMORY_THRESHOLD`
  - `INGEST_STATIC_MEMORY_THRESHOLD`

@@ -0,0 +1,35 @@
# System Requirements

## Hardware Requirements

The NV-Ingest full ingestion pipeline is designed to consume significant CPU and memory resources to achieve maximal parallelism. Resource usage will scale up to the limits of your deployed system.

### Recommended Production Deployment Specifications

- **System Memory**: At least 256 GB RAM
- **CPU Cores**: At least 32 CPU cores
- **GPU**: NVIDIA GPU with at least 24 GB VRAM (e.g., A100, V100, or equivalent)

**Note:** Using less powerful systems or lower resource limits is still viable, but performance will suffer.

### Resource Consumption Notes

- The pipeline performs runtime allocation of parallel resources based on system configuration (a rough illustration follows this list)
- Memory usage can reach up to full system capacity when processing large documents
- CPU utilization scales with the number of concurrent processing tasks
- A GPU is required for image-processing NIMs, embeddings, and other GPU-accelerated tasks

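As a rough illustration of that first bullet, the sketch below shows one way a memory-aware cap on parallel replicas could be derived from the host's cores and RAM. The heuristic, the per-replica memory figure, and the use of `psutil` are assumptions for illustration, not NV-Ingest's actual allocation logic.

```python
# Illustrative only: a memory-aware cap on parallel replicas. The heuristic and
# the per-replica memory figure are placeholders, not NV-Ingest's real sizing logic.
import os

import psutil  # assumed available; used only to read the host's total memory


def replica_cap(per_replica_gb: float = 4.0, memory_fraction: float = 0.80) -> int:
    """Cap replica count by CPU cores and by a fraction of system memory."""
    cores = os.cpu_count() or 1
    total_gb = psutil.virtual_memory().total / 1024**3
    memory_cap = int((total_gb * memory_fraction) // per_replica_gb)
    return max(1, min(cores, memory_cap))


if __name__ == "__main__":
    print(f"replica cap on this host: {replica_cap()}")
```
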
**Collaborator:** Think there should be an additional section at the bottom, but xlinked here. We should say why the CPU and mem requirements are high. Something like: "Can be followup, but needs to tell the user the tl;dr of why we use so many resources."

**Collaborator:** Please also link to whatever public materials we have on DC767 - @sosahi will have this.

### Scaling Considerations

For production deployments processing large volumes of documents, consider:

- Higher memory configurations for processing large PDF files or image collections
- Additional CPU cores for improved parallel processing
- Multiple GPUs for distributed processing workloads

For guidance on choosing between static and dynamic scaling modes, and how to configure them in `docker-compose.yaml`, see:

- [Scaling Modes](./scaling_modes.md)

### Environment Requirements

Ensure your deployment environment meets these specifications before running the full NV-Ingest pipeline. Resource-constrained environments may experience performance degradation.