91 changes: 91 additions & 0 deletions docs/docs/extraction/scaling_modes.md
# NV-Ingest resource scaling modes

This page explains how NV-Ingest scales work across pipeline stages and how to configure scaling with docker-compose.

- **Static scaling**: Each pipeline stage runs a fixed number of replicas chosen by memory-aware heuristics. Good for consistent latency, at the cost of higher steady-state memory usage.
- **Dynamic scaling**: Only the source stage runs a fixed number of replicas; other stages scale up and down based on observed resource pressure. Better memory efficiency, but work may pause briefly while replicas spin back up after an idle period.

## When to choose which

- **Choose Static** when latency consistency and warm pipelines matter more than memory minimization.
- **Choose Dynamic** when memory headroom is constrained or workloads are bursty/idle for long periods.

## Configure (docker-compose)

Edit `services > nv-ingest-ms-runtime > environment` in `docker-compose.yaml`.

### Select mode

- **Dynamic (default)**
  - `INGEST_DISABLE_DYNAMIC_SCALING=false`
  - `INGEST_DYNAMIC_MEMORY_THRESHOLD=0.80` (fraction of total system memory; worker scaling reacts as utilization approaches this level; see the sketch after the examples below)

- **Static**
  - `INGEST_DISABLE_DYNAMIC_SCALING=true`
  - Optionally set a static memory threshold:
    - `INGEST_STATIC_MEMORY_THRESHOLD=0.85` (fraction of total memory reserved for static replicas)

Example (Static):

```yaml
services:
  nv-ingest-ms-runtime:
    environment:
      - INGEST_DISABLE_DYNAMIC_SCALING=true
      - INGEST_STATIC_MEMORY_THRESHOLD=0.85
```

Example (Dynamic):

```yaml
services:
  nv-ingest-ms-runtime:
    environment:
      - INGEST_DISABLE_DYNAMIC_SCALING=false
      - INGEST_DYNAMIC_MEMORY_THRESHOLD=0.80
```
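
To make the dynamic threshold concrete, the sketch below shows one way a memory-fraction threshold can drive scale-up/scale-down decisions. It is illustrative only and is not the NV-Ingest controller: the `psutil` reading and the half-of-threshold scale-up point are assumptions made for the example.

```python
import psutil

# Mirrors INGEST_DYNAMIC_MEMORY_THRESHOLD (fraction of total system memory).
DYNAMIC_MEMORY_THRESHOLD = 0.80

def scaling_decision(threshold: float = DYNAMIC_MEMORY_THRESHOLD) -> str:
    """Illustrative threshold check; not the actual NV-Ingest scaling logic."""
    used_fraction = psutil.virtual_memory().percent / 100.0
    if used_fraction >= threshold:
        return "scale_down"   # memory pressure: shed worker replicas
    if used_fraction < threshold / 2:
        return "scale_up"     # ample headroom: add replicas for throughput
    return "hold"             # near the threshold: keep current replica count
```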

### Pipeline config mapping

- `pipeline.disable_dynamic_scaling` ⇐ `INGEST_DISABLE_DYNAMIC_SCALING`
- `pipeline.dynamic_memory_threshold` ⇐ `INGEST_DYNAMIC_MEMORY_THRESHOLD`
- `pipeline.static_memory_threshold` ⇐ `INGEST_STATIC_MEMORY_THRESHOLD`
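
Conceptually, this mapping amounts to reading the environment variables and filling in the corresponding pipeline settings. The snippet below is a minimal illustration with hypothetical helper names and the defaults shown above; the actual NV-Ingest configuration loader may differ.

```python
import os

def _env_flag(name: str, default: bool) -> bool:
    # Parse a boolean-style environment variable ("true"/"false", "1"/"0").
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

def _env_fraction(name: str, default: float) -> float:
    # Parse a memory threshold expressed as a fraction between 0.0 and 1.0.
    return float(os.environ.get(name, default))

# Hypothetical view of the resulting pipeline settings.
pipeline_config = {
    "disable_dynamic_scaling": _env_flag("INGEST_DISABLE_DYNAMIC_SCALING", False),
    "dynamic_memory_threshold": _env_fraction("INGEST_DYNAMIC_MEMORY_THRESHOLD", 0.80),
    "static_memory_threshold": _env_fraction("INGEST_STATIC_MEMORY_THRESHOLD", 0.85),
}
```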

## Trade-offs recap

- **Dynamic**
  - Pros: Better memory efficiency; stages scale down when idle; the pipeline can force a scale-down under memory pressure spikes.
  - Cons: After long idle periods, stages may scale to zero replicas, causing brief warm-up latency when work resumes.

- **Static**
  - Pros: Stable, predictable latency; stages remain hot.
  - Cons: Higher steady-state memory usage, even when the pipeline is idle.

## Sources of memory utilization

- **Workload size and concurrency**
  - More in‑flight jobs create more objects (pages, images, tables, charts) and large artifacts (for example, embeddings).
  - Example: 1 MB text file → paragraphs with 20% overlap → 4k‑dim embeddings base64‑encoded to JSON.
    - Assumptions: ~600 bytes per paragraph. 20% overlap ⇒ effective step ≈ 480 bytes. Chunks ≈ 1,000,000 / 480 ≈ 2,083.
    - Per‑embedding size: 4,096 dims × 4 bytes (float32) = 16,384 bytes; base64 expansion × 4/3 ≈ 21,845 bytes (≈21.3 KB).
    - Total embeddings payload: ≈ 2,083 × 21.3 KB ≈ 45 MB, excluding JSON keys/metadata.
    - Takeaway: a 1 MB source can yield ≳40× its size in memory just for embeddings, before adding extracted text, images, or other artifacts (see the arithmetic sketch after this list).
  - Example: PDF rendering and extracted images (A4 @ 72 DPI).
    - Rendering a page produces a large in‑memory buffer; each extracted sub‑image adds more, and base64 inflates size.
    - Page pixels ≈ 8.27×72 by 11.69×72 ≈ 595×842 ≈ 0.50 MP.
    - RGB (3 bytes/pixel) ≈ 1.5 MB per page buffer; RGBA (4 bytes/pixel) ≈ 2.0 MB.
    - Ten 1024×1024 RGB crops ≈ 3.0 MB each in memory → base64 (+33%) ≈ 4.0 MB each ⇒ ~40 MB just for crops (JSON not included).
    - If you also base64 the full page image, expect another ~33% over the raw byte size (compression varies by format).
- **Library behavior**
  - Components like PyArrow may retain memory longer than expected (delayed free).
- **Queues and payloads**
  - Base64‑encoded, fragmented documents in Redis consume memory in proportion to concurrent jobs and clients, and inversely to how quickly the queues drain.
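
The back-of-envelope figures in the examples above can be reproduced with a few lines of arithmetic. The sketch below only restates the estimates; the paragraph size, overlap, crop size, and crop count are the assumptions from the examples, not measurements of NV-Ingest.

```python
# Embeddings from a 1 MB text file (assumptions from the example above).
source_bytes = 1_000_000
paragraph_bytes = 600                      # assumed average paragraph size
step = int(paragraph_bytes * 0.8)          # 20% overlap -> effective step of ~480 bytes
chunks = source_bytes // step              # ~2,083 chunks

embedding_bytes = 4096 * 4                 # 4k-dim float32 -> 16,384 bytes
embedding_b64 = embedding_bytes * 4 // 3   # base64 expansion -> ~21,845 bytes (~21.3 KB)
embeddings_mb = chunks * embedding_b64 / 1e6
print(f"{chunks} chunks -> ~{embeddings_mb:.1f} MB of base64 embeddings")  # ~45 MB

# Page render and image crops (A4 at 72 DPI, from the second example).
page_px = round(8.27 * 72) * round(11.69 * 72)   # ~595 x 842 -> ~0.50 MP
page_rgb_mb = page_px * 3 / 1e6                  # ~1.5 MB raw RGB page buffer
crop_rgb_mb = 1024 * 1024 * 3 / 1e6              # ~3 MB per 1024x1024 RGB crop
crops_b64_mb = 10 * crop_rgb_mb * 4 / 3          # ten crops, base64-encoded: ~40 MB or more
print(f"page buffer ~{page_rgb_mb:.1f} MB, ten base64 crops ~{crops_b64_mb:.0f} MB")
```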

## Where to look in docker-compose

Open `docker-compose.yaml` and locate:

- `services > nv-ingest-ms-runtime > environment`:
  - `INGEST_DISABLE_DYNAMIC_SCALING`
  - `INGEST_DYNAMIC_MEMORY_THRESHOLD`
  - `INGEST_STATIC_MEMORY_THRESHOLD`
35 changes: 35 additions & 0 deletions docs/docs/extraction/system_requirements.md
# System Requirements

## Hardware Requirements

The NV-Ingest full ingestion pipeline is designed to consume significant CPU and memory resources to achieve maximal parallelism. Resource usage will scale up to the limits of your deployed system.

### Recommended Production Deployment Specifications

- **System Memory**: At least 256 GB RAM
- **CPU Cores**: At least 32 CPU cores
- **GPU**: NVIDIA GPU with at least 24 GB VRAM (e.g., A100, V100, or equivalent)

**Note:** Using less powerful systems or lower resource limits is still viable, but performance will suffer.

### Resource Consumption Notes

- The pipeline allocates parallel resources at runtime based on the system configuration.
- Memory usage can approach full system capacity when processing large documents.
- CPU utilization scales with the number of concurrent processing tasks.
- A GPU is required for image-processing NIMs, embeddings, and other GPU-accelerated tasks.
Collaborator comment:

Think there should be an additional section at the bottom, but xlinked here.

We should say why the CPU and mem requirements are high.

Something like:

For a representative set of 1000 PDFs, NV-Ingest renders 54,000 jpeg images, one per PDF page. We extract on average N sub-page jpegs (one each per table, chart, header, footer, section title, and text paragraphs). Downstream of each content type, we extract smaller bounding boxed jpegs for every chart element and every table cell (hundreds to thousands per table).

Can be followup, but needs to tell the user the tl;dr of why we use so many resources.

Collaborator comment:

Please also link to whatever public materials we have on DC767 - @sosahi will have this


### Scaling Considerations

For production deployments processing large volumes of documents, consider:
- Higher memory configurations for processing large PDF files or image collections
- Additional CPU cores for improved parallel processing
- Multiple GPUs for distributed processing workloads

For guidance on choosing between static and dynamic scaling modes, and how to configure them in `docker-compose.yaml`, see:

- [Scaling Modes](./scaling_modes.md)

### Environment Requirements

Ensure your deployment environment meets these specifications before running the full NV-Ingest pipeline. Resource-constrained environments may experience performance degradation.