Inquiry About INT Weight and Bias Quantization Methods

Hello,

I would like to ask which method you use for full-integer quantization of the weights and biases before hardware deployment (during high-level inference). Do you use post-training quantization (PTQ) or quantization-aware training (QAT)? Additionally, how do you ensure that the integer quantization does not degrade inference accuracy? Could you share the details of your method?

Thank you!