91 changes: 91 additions & 0 deletions docs/docs/extraction/scaling_modes.md
# NV-Ingest resource scaling modes

This page explains how NV-Ingest scales work across pipeline stages and how to configure scaling with docker-compose.

- **Static scaling**: Each pipeline stage runs a fixed number of replicas chosen by memory-aware heuristics. Good for consistent latency, at the cost of higher steady-state memory usage.
- **Dynamic scaling**: Only the source stage runs a fixed number of replicas; other stages scale up and down based on observed resource pressure. Better memory efficiency, but work may pause briefly while replicas spin back up after an idle period.

## When to choose which

- **Choose Static** when latency consistency and warm pipelines matter more than memory minimization.
- **Choose Dynamic** when memory headroom is constrained or workloads are bursty/idle for long periods.

## Configure (docker-compose)

Edit `services > nv-ingest-ms-runtime > environment` in `docker-compose.yaml`.

### Select mode

- **Dynamic (default)**
  - `INGEST_DISABLE_DYNAMIC_SCALING=false`
  - `INGEST_DYNAMIC_MEMORY_THRESHOLD=0.80` (fraction of total system memory; worker scaling reacts as utilization approaches this level; see the sketch after the examples below)

- **Static**
  - `INGEST_DISABLE_DYNAMIC_SCALING=true`
  - Optionally set a static memory threshold:
    - `INGEST_STATIC_MEMORY_THRESHOLD=0.85` (fraction of total memory reserved for static replicas)

Example (Static):

```yaml
services:
  nv-ingest-ms-runtime:
    environment:
      - INGEST_DISABLE_DYNAMIC_SCALING=true
      - INGEST_STATIC_MEMORY_THRESHOLD=0.85
```

Example (Dynamic):

```yaml
services:
  nv-ingest-ms-runtime:
    environment:
      - INGEST_DISABLE_DYNAMIC_SCALING=false
      - INGEST_DYNAMIC_MEMORY_THRESHOLD=0.80
```
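
To make the dynamic threshold concrete, the sketch below shows one way a memory-fraction threshold can drive scale-up/scale-down decisions. It is illustrative only and is not the NV-Ingest controller: the `psutil` reading and the half-of-threshold scale-up point are assumptions made for the example.

```python
import psutil

# Mirrors INGEST_DYNAMIC_MEMORY_THRESHOLD (fraction of total system memory).
DYNAMIC_MEMORY_THRESHOLD = 0.80

def scaling_decision(threshold: float = DYNAMIC_MEMORY_THRESHOLD) -> str:
    """Illustrative threshold check; not the actual NV-Ingest scaling logic."""
    used_fraction = psutil.virtual_memory().percent / 100.0
    if used_fraction >= threshold:
        return "scale_down"   # memory pressure: shed worker replicas
    if used_fraction < threshold / 2:
        return "scale_up"     # ample headroom: add replicas for throughput
    return "hold"             # near the threshold: keep current replica count
```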

### Pipeline config mapping

- `pipeline.disable_dynamic_scaling` ⇐ `INGEST_DISABLE_DYNAMIC_SCALING`
- `pipeline.dynamic_memory_threshold` ⇐ `INGEST_DYNAMIC_MEMORY_THRESHOLD`
- `pipeline.static_memory_threshold` ⇐ `INGEST_STATIC_MEMORY_THRESHOLD`
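
Conceptually, this mapping amounts to reading the environment variables and filling in the corresponding pipeline settings. The snippet below is a minimal illustration with hypothetical helper names and the defaults shown above; the actual NV-Ingest configuration loader may differ.

```python
import os

def _env_flag(name: str, default: bool) -> bool:
    # Parse a boolean-style environment variable ("true"/"false", "1"/"0").
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

def _env_fraction(name: str, default: float) -> float:
    # Parse a memory threshold expressed as a fraction between 0.0 and 1.0.
    return float(os.environ.get(name, default))

# Hypothetical view of the resulting pipeline settings.
pipeline_config = {
    "disable_dynamic_scaling": _env_flag("INGEST_DISABLE_DYNAMIC_SCALING", False),
    "dynamic_memory_threshold": _env_fraction("INGEST_DYNAMIC_MEMORY_THRESHOLD", 0.80),
    "static_memory_threshold": _env_fraction("INGEST_STATIC_MEMORY_THRESHOLD", 0.85),
}
```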

## Trade-offs recap

- **Dynamic**
  - Pros: Better memory efficiency; stages scale down when idle; the pipeline can force a scale-down under memory pressure spikes.
  - Cons: After long idle periods, stages may scale to zero replicas, causing brief warm-up latency when work resumes.

- **Static**
  - Pros: Stable, predictable latency; stages remain hot.
  - Cons: Higher steady-state memory usage, even when the pipeline is idle.

## Sources of memory utilization

- **Workload size and concurrency**
  - More in‑flight jobs create more objects (pages, images, tables, charts) and large artifacts (for example, embeddings).
  - Example: 1 MB text file → paragraphs with 20% overlap → 4k‑dim embeddings base64‑encoded to JSON.
    - Assumptions: ~600 bytes per paragraph. 20% overlap ⇒ effective step ≈ 480 bytes. Chunks ≈ 1,000,000 / 480 ≈ 2,083.
    - Per‑embedding size: 4,096 dims × 4 bytes (float32) = 16,384 bytes; base64 expansion × 4/3 ≈ 21,845 bytes (≈21.3 KB).
    - Total embeddings payload: ≈ 2,083 × 21.3 KB ≈ 45 MB, excluding JSON keys/metadata.
    - Takeaway: a 1 MB source can yield ≳40× its size in memory just for embeddings, before adding extracted text, images, or other artifacts (see the arithmetic sketch after this list).
  - Example: PDF rendering and extracted images (A4 @ 72 DPI).
    - Rendering a page produces a large in‑memory buffer; each extracted sub‑image adds more, and base64 inflates size.
    - Page pixels ≈ 8.27×72 by 11.69×72 ≈ 595×842 ≈ 0.50 MP.
    - RGB (3 bytes/pixel) ≈ 1.5 MB per page buffer; RGBA (4 bytes/pixel) ≈ 2.0 MB.
    - Ten 1024×1024 RGB crops ≈ 3.0 MB each in memory → base64 (+33%) ≈ 4.0 MB each ⇒ ~40 MB just for crops (JSON not included).
    - If you also base64 the full page image, expect another ~33% over the raw byte size (compression varies by format).
- **Library behavior**
  - Components like PyArrow may retain memory longer than expected (delayed free).
- **Queues and payloads**
  - Base64‑encoded, fragmented documents in Redis consume memory in proportion to concurrent jobs and clients, and inversely to how quickly the queues drain.
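
The back-of-envelope figures in the examples above can be reproduced with a few lines of arithmetic. The sketch below only restates the estimates; the paragraph size, overlap, crop size, and crop count are the assumptions from the examples, not measurements of NV-Ingest.

```python
# Embeddings from a 1 MB text file (assumptions from the example above).
source_bytes = 1_000_000
paragraph_bytes = 600                      # assumed average paragraph size
step = int(paragraph_bytes * 0.8)          # 20% overlap -> effective step of ~480 bytes
chunks = source_bytes // step              # ~2,083 chunks

embedding_bytes = 4096 * 4                 # 4k-dim float32 -> 16,384 bytes
embedding_b64 = embedding_bytes * 4 // 3   # base64 expansion -> ~21,845 bytes (~21.3 KB)
embeddings_mb = chunks * embedding_b64 / 1e6
print(f"{chunks} chunks -> ~{embeddings_mb:.1f} MB of base64 embeddings")  # ~45 MB

# Page render and image crops (A4 at 72 DPI, from the second example).
page_px = round(8.27 * 72) * round(11.69 * 72)   # ~595 x 842 -> ~0.50 MP
page_rgb_mb = page_px * 3 / 1e6                  # ~1.5 MB raw RGB page buffer
crop_rgb_mb = 1024 * 1024 * 3 / 1e6              # ~3 MB per 1024x1024 RGB crop
crops_b64_mb = 10 * crop_rgb_mb * 4 / 3          # ten crops, base64-encoded: ~40 MB or more
print(f"page buffer ~{page_rgb_mb:.1f} MB, ten base64 crops ~{crops_b64_mb:.0f} MB")
```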

## Where to look in docker-compose

Open `docker-compose.yaml` and locate:

- `services > nv-ingest-ms-runtime > environment`:
  - `INGEST_DISABLE_DYNAMIC_SCALING`
  - `INGEST_DYNAMIC_MEMORY_THRESHOLD`
  - `INGEST_STATIC_MEMORY_THRESHOLD`
35 changes: 35 additions & 0 deletions docs/docs/extraction/system_requirements.md
# System Requirements

## Hardware Requirements

The NV-Ingest full ingestion pipeline is designed to consume significant CPU and memory resources to achieve maximal parallelism. Resource usage will scale up to the limits of your deployed system.

### Recommended Production Deployment Specifications

- **System Memory**: At least 256 GB RAM
- **CPU Cores**: At least 32 CPU cores
- **GPU**: NVIDIA GPU with at least 24 GB VRAM (e.g., A100, V100, or equivalent)

**Note:** Using less powerful systems or lower resource limits is still viable, but performance will suffer.

### Resource Consumption Notes

- The pipeline allocates parallel resources at runtime based on the system configuration.
- Memory usage can approach full system capacity when processing large documents.
- CPU utilization scales with the number of concurrent processing tasks.
- A GPU is required for image-processing NIMs, embeddings, and other GPU-accelerated tasks.
Collaborator comment:

Think there should be an additional section at the bottom, but xlinked here.

We should say why the CPU and mem requirements are high.

Something like:

For a representative set of 1000 PDFs, NV-Ingest renders 54,000 jpeg images, one per PDF page. We extract on average N sub-page jpegs (one each per table, chart, header, footer, section title, and text paragraphs). Downstream of each content type, we extract smaller bounding boxed jpegs for every chart element and every table cell (hundreds to thousands per table).

Can be followup, but needs to tell the user the tl;dr of why we use so many resources.

Collaborator comment:

Please also link to whatever public materials we have on DC767 - @sosahi will have this


### Scaling Considerations

For production deployments processing large volumes of documents, consider:
- Higher memory configurations for processing large PDF files or image collections
- Additional CPU cores for improved parallel processing
- Multiple GPUs for distributed processing workloads

For guidance on choosing between static and dynamic scaling modes, and how to configure them in `docker-compose.yaml`, see:

- [Scaling Modes](./scaling_modes.md)

### Environment Requirements

Ensure your deployment environment meets these specifications before running the full NV-Ingest pipeline. Resource-constrained environments may experience performance degradation.