ONNX models generated by llm_export.py are missing some input and output nodes #1147
Description
Describe the bug
I am using the Model-Optimizer/examples/torch_onnx/llm_export.py script to convert a .safetensors LLM to the ONNX format and quantize it. The ONNX model is then meant to be converted into a TensorRT engine for use with TensorRT. The produced ONNX model has "input_ids", "logits", and "present_key_values*" nodes, but is missing the "position_ids", "attention_mask", and "past_kv*" nodes.
Steps/Code to reproduce bug
Install packages
python -m pip install nvidia-modelopt[all]
python -m pip install onnx==1.18.0
python -m pip install onnxruntime[gpu]==1.23.0
and install any other packages requested while running llm_export.py. Set up paths:
export LD_LIBRARY_PATH=<path/to/cuda/libs>:<path/to/cudnn/lib>
export PATH=<path/to/cuda/bin>:$PATH
Clone the Model-Optimizer repo to use the example scripts:
git clone https://github.com/NVIDIA/Model-Optimizer.git
Navigate to the torch_onnx example:
cd Model-Optimizer/examples/torch_onnx
and launch conversion of the HF model to ONNX with INT4 quantization:
python llm_export.py --hf_model_path=meta-llama/Llama-3.1-8B-Instruct --dtype=int4_awq --calib_size=512 --output_dir=models/Llama-3.1-8B-Instruct-ONNX-INT4
Result: the produced ONNX model is missing the "position_ids", "attention_mask", and "past_kv*" nodes.
Expected behavior
A typical LLM ONNX graph should expose "input_ids", "attention_mask", "position_ids", "logits", and both past and present kv-cache nodes. Here several of them are missing.
System information
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 20.04
- CPU architecture (x86_64, aarch64): x86_64
- GPU memory size: enough
- Library versions (if applicable):
- Python: 3.12
- ModelOpt version or commit hash: >=0.39
- CUDA: 12.3
- PyTorch: 2.7.1+cu118
- Transformers: 4.57.3
- onnxruntime-gpu: 1.23.0