What is new in 0.2.0
Bug/Issue Fixes
- Fixed incorrect use of double constants in some operators
- Fixed crash when loading models that were saved on OCL devices
- Fixed default parameter of torch.ocl.synchronize
- Fixed printing failure on Intel devices lacking fp64 support
New nets validated
- Vision transformers (vit_x_NN nets) validated
New operators implemented:
- resize_, arange, mm, bmm, amin, amax, addmm, _native_multi_head_attention and transform_bias_rescale_qkv, round, maximum, minimum, prod, atan, native_dropout
- lt, le, gt, ge, eq, ne for tensors
- bitwise ^, |, &, ~
- upsample_2d: bilinear, nearest and nearest-exact, forward and backward
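A quick smoke test of a few of the newly implemented operators, run here on CPU for illustration only; on an OpenCL build the same calls would target an OpenCL device instead (the device name used there is up to the backend, not shown here):

```python
import torch

# Small tensors exercising some of the newly implemented operators.
a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

mm = torch.mm(a, b)                               # 2x2 matrix product -> [[19, 22], [43, 50]]
bmm = torch.bmm(a.unsqueeze(0), b.unsqueeze(0))   # batched matmul, batch size 1
amin = torch.amin(a, dim=1)                       # per-row minimum -> [1, 3]
amax = torch.amax(a, dim=0)                       # per-column maximum -> [3, 4]
prod = torch.prod(a)                              # product of all elements -> 24
mx = torch.maximum(a, b)                          # elementwise maximum -> b here

print(mm, amin, prod, sep="\n")
```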
Fixed operators
- Fixed softmax and log softmax support for dim that is not the last dim
- Fixed view operator and set_ storage handling
- cat now supports mixed types
- Fixed handling of empty tensors with non-empty storage
- Very limited half tensor handling
- Fixed tensor >, <, ==, != scalar ops
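The tensor-vs-scalar comparison fix covers the standard operators; here is what they return, demonstrated on CPU (on an OpenCL build the same expressions would run on the backend's device):

```python
import torch

t = torch.tensor([1., 2., 3.])

# Tensor-vs-scalar comparison operators: >, <, ==, !=
# Each returns a bool tensor of the same shape.
print(t > 2.0)    # values: [False, False, True]
print(t < 2.0)    # values: [True, False, False]
print(t == 2.0)   # values: [False, True, False]
print(t != 2.0)   # values: [True, False, True]
```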
New features:
- Added support of profiling via torch.ocl.profile API
- Improved benchmark scripts
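The changelog names a torch.ocl.profile API but does not show its signature; the sketch below assumes a context-manager-style interface (that shape is a guess, not the library's documented API) and falls back to a no-op when no OpenCL build is present, so the same script runs on stock PyTorch:

```python
import contextlib
import torch

@contextlib.contextmanager
def maybe_profile(name):
    """Wrap a region in torch.ocl.profile when available, else do nothing.

    The context-manager signature of torch.ocl.profile is an assumption;
    check the backend's documentation for the real interface.
    """
    ocl = getattr(torch, "ocl", None)
    if ocl is not None and hasattr(ocl, "profile"):
        with ocl.profile(name):  # hypothetical signature
            yield
    else:
        yield  # no OpenCL build: run unprofiled

with maybe_profile("matmul"):
    out = torch.mm(torch.eye(2), torch.eye(2))
print(out)
```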
Performance improvements
- Intel Arc, UHD - enabled winograd convolution, support of OpenCL 3.0 floating point add atomics, enabled k-reduction for GEMM operators
- NVIDIA - added use of native atomic float add (via PTX assembly)
- GELU - major speedup after fixing faulty use of double instead of float
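As a sanity check on the GELU change (this is not the project's kernel code, just an illustration of why the fix is safe): single-precision GELU agrees with a double-precision reference to within a few float32 ulps, so dropping the accidental double arithmetic buys speed at essentially no accuracy cost.

```python
import torch

x32 = torch.linspace(-4, 4, 101, dtype=torch.float32)
x64 = x32.double()

g32 = torch.nn.functional.gelu(x32)  # single-precision GELU
g64 = torch.nn.functional.gelu(x64)  # double-precision reference

# Maximum absolute deviation of float32 GELU from the float64 reference.
err = (g32.double() - g64).abs().max().item()
print(f"max abs error, float32 vs float64 GELU: {err:.2e}")
```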