Examples of TensorRT models using ONNX

A collection of useful sample code for building TensorRT models from ONNX.

0. Development Environment

  • RTX 3060 (laptop)
  • WSL
  • Ubuntu 22.04.5 LTS
  • CUDA 12.8

conda deactivate
conda env remove -n trte -y

conda create -n trte python=3.11 --yes 
conda activate trte

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
pip install cuda-python==12.9.2
pip install tensorrt-cu12
pip install onnx
pip install opencv-python
pip install timm
pip install matplotlib

pip install -U "nvidia-modelopt[all]"

# Check installation 
python -c "import modelopt; print(modelopt.__version__)"
python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"

1. TensorRT Model Conversion and Extension: A Practical Tutorial

  1. Generating a TensorRT model from ONNX
    1.1 TensorRT C++ API
    1.2 TensorRT Python API (a minimal sketch follows this list)
    1.3 Polygraphy

  2. Dynamic shapes for TensorRT
    2.1 Dynamic batch (covered by the sketch below)
    2.2 Dynamic input size

  3. Custom Plugin
    3.1 Adding a pre-processing layer with CUDA

  4. Modifying an ONNX graph with ONNX GraphSurgeon
    4.1 Extracting the feature map of the last Conv layer for Grad-CAM
    4.2 Generating a TensorRT model with a custom plugin and ONNX
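
The sketch below illustrates items 1.2 and 2.1: parsing an ONNX file with the TensorRT Python API, enabling FP16, and adding a dynamic-batch optimization profile. It is a minimal sketch, not the repository's scripts: the file names and the 3x224x224 input shape are assumptions, the ONNX model is assumed to have been exported with a dynamic batch axis, and the code targets the TensorRT 10 Python package installed above.

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in TensorRT 10
parser = trt.OnnxParser(network, logger)

with open("resnet18.onnx", "rb") as f:  # assumed ONNX file name
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX file")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # build an FP16 engine

# Dynamic batch (item 2.1): min/opt/max shapes for the first input tensor
profile = builder.create_optimization_profile()
input_name = network.get_input(0).name
profile.set_shape(input_name, (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
with open("resnet18.trt", "wb") as f:
    f.write(engine_bytes)

Polygraphy (item 1.3) wraps this same build flow behind a single command, and the C++ API (item 1.1) mirrors these calls one-to-one.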

2. Advanced Optimization Techniques for TensorRT Inference

  1. Base model training & conversion
    1.1 Train Base Model (resnet18)
    1.2 Base TensorRT (fp16)
  2. Quantization
    2.1 Explicit Quantization (PTQ) (see the sketch after the results table)
    2.2 Explicit Quantization (QAT)
    2.3 Explicit Quantization (ONNX PTQ)
    2.4 Implicit Quantization (TensorRT PTQ)
  3. Sparsity
    3.1 Sparsity (2:4 sparsity)
  4. Pruning
    4.1 Pruning
  5. NAS
    5.1 NAS (work in progress)
  6. Multiple Optimizations
    6.1 Pruning + Sparsity
    6.2 Pruning + Sparsity + Quantization (QAT)
| Framework | Optimization Technique  | Precision | Top-1 Acc [%] | Top-5 Acc [%] | FPS [frames/s] | Avg Latency [ms] | GPU Mem [MB] |
|-----------|-------------------------|-----------|---------------|---------------|----------------|------------------|--------------|
| PyTorch   | -                       | fp16      | 84.58         | 97.2          | 406.27         | 2.46             | 286          |
| TensorRT  | -                       | fp16      | 84.54         | 97.2          | 1463.45        | 0.68             | 138          |
| TensorRT  | trt ptq (implicit)      | int8      | 84.34         | 97.1          | 1915.04        | 0.52             | 124          |
| TensorRT  | onnx ptq (explicit)     | int8      | 84.5          | 97            | 1897.46        | 0.53             | 124          |
| TensorRT  | tmo ptq (explicit)      | int8      | 84.2          | 97.06         | 1542.34        | 0.65             | 124          |
| TensorRT  | tmo qat (explicit)      | int8      | 84.42         | 97.1          | 1572.81        | 0.64             | 138          |
| TensorRT  | tmo sparsity            | fp16      | 83.28         | 96.72         | 1483.85        | 0.67             | 138          |
| TensorRT  | tmo pruning (FLOPs 80%) | fp16      | 82.76         | 96.42         | 1573.2         | 0.64             | 130          |
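
As referenced in item 2.1, explicit PTQ with the TensorRT Model Optimizer ("tmo" in the table) roughly follows the sketch below. This is a hedged outline, not the repository's script: `model` is assumed to be the trained ResNet-18 already on the GPU, `calib_loader` is an assumed calibration DataLoader, and the config name comes from the nvidia-modelopt package installed above.

import modelopt.torch.quantization as mtq

def forward_loop(m):
    # Run a few calibration batches so the inserted quantizers can collect
    # activation statistics (calib_loader is an assumed DataLoader).
    for images, _ in calib_loader:
        m(images.cuda())

# Insert Q/DQ (fake-quantization) modules and calibrate them for INT8
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

The calibrated model is then exported to ONNX with Q/DQ nodes and built into an engine with the same flow as in section 1; QAT (item 2.2) instead continues training for a few epochs after mtq.quantize before exporting.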

3. Conversion of General Deep Learning Models to TensorRT

  1. Super Resolution
    1.1 Real-ESRGAN
  2. Object Detection
    2.1 yolo11
  3. Instance Segmentation
  4. Semantic Segmentation
    4.1 U-2-Net (Sky Segmentation)
    4.2 BEN2 (Background Erase Network)
    4.3 MODNet
    4.4 ormbg (Open Remove Background Model)
  5. Depth Estimation
    5.1 Depth Pro
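
Every conversion in this section starts from the same step: export the PyTorch module to ONNX, then build the engine as in section 1. The sketch below uses a timm classifier purely as a stand-in; the model name, file name, and input shape are placeholders rather than the repository's actual export scripts.

import timm
import torch

model = timm.create_model("resnet18", pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)  # placeholder input shape

torch.onnx.export(
    model, dummy, "model.onnx",
    opset_version=17,
    input_names=["input"], output_names=["output"],
    # keep the batch dimension dynamic so the engine can use an optimization profile
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)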

4. References