💖 If you find `ort` useful, please consider sponsoring us on Open Collective 💖
🤔 Need help upgrading? Ask questions in GitHub Discussions or in the pyke.io Discord server!
🔗 Tensor Array Views
You can now create a `TensorRef` directly from an `ArrayView`. Previously, tensors could only be created via `Tensor::from_array` (which, in many cases, performed a copy if borrowed data was provided). The new `TensorRef::from_array_view` (and the complementary `TensorRefMut::from_array_view_mut`) methods allow for the zero-copy creation of tensors directly from an `ArrayView`.

`Tensor::from_array` now only accepts owned data, so you should either refactor your code to use `TensorRef`s or pass ownership of the array to the `Tensor`.

⚠️ `ndarray`s must be in standard/contiguous memory layout to be converted to a `TensorRef(Mut)`; see `.as_standard_layout()`.
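Below is a minimal sketch of the zero-copy path; the import paths and exact signatures shown here are illustrative and may differ slightly from the released API:

```rust
use ndarray::Array4;
use ort::value::TensorRef;

// An owned ndarray whose data we want to feed to a session without copying.
let array = Array4::<f32>::zeros((1, 3, 224, 224));

// Borrow the array's buffer as a tensor - no copy is made.
let input = TensorRef::from_array_view(array.view())?;

// If a view isn't contiguous, bring it into standard layout first:
// let contiguous = strided_view.as_standard_layout();
// let input = TensorRef::from_array_view(contiguous.view())?;
```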
↔️ Copy Tensors
rc.10 now allows you to manually copy tensors between devices using `Tensor::to`!
```rust
// Create our tensor in CUDA memory
let cuda_allocator = Allocator::new(
    &session,
    MemoryInfo::new(AllocationDevice::CUDA, 0, AllocatorType::Device, MemoryType::Default)?
)?;
let cuda_tensor = Tensor::<f32>::new(&cuda_allocator, [1_usize, 3, 224, 224])?;

// Copy it back to CPU
let cpu_tensor = cuda_tensor.to(AllocationDevice::CPU, 0)?;
```
There's also `Tensor::to_async`, which replicates the functionality of PyTorch's `non_blocking=True`. Additionally, `Tensor`s now implement `Clone`.
⚙️ Alternative Backends
`ort` is no longer just a wrapper for ONNX Runtime; it's a one-stop shop for inferencing ONNX models in Rust thanks to the addition of the alternative backend API.

Alternative backends wrap other inference engines behind ONNX Runtime's API, which means they can simply be dropped in and used in `ort` - all it takes is one line of code:
```rust
fn main() {
    ort::set_api(ort_tract::api()); // <- magic!

    let session = Session::builder()?
        ...
}
```
Two alternative backends are shipping alongside rc.10 - `ort-tract`, powered by tract, and `ort-candle`, powered by candle - with more to come in the future.

Outside of the Rust ecosystem, these alternative backends can also be compiled as standalone libraries that can be dropped directly into applications as a replacement for `libonnxruntime`. 🦀🦠
✏️ Model Editor
Models can be created entirely programmatically, or edited from an existing ONNX model via the new Model Editor API.
See `src/editor/tests.rs` for an example of how an ONNX model can be created programmatically. You can combine the Model Editor API with `SessionBuilder::with_optimized_model_path` to export the model outside Rust.
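As a rough sketch of the export half (the editing itself is shown in the linked test file), committing a model through a builder configured with an optimized-model path writes the final graph back to disk; the filenames here are placeholders:

```rust
// Sketch: commit an (edited) model through a SessionBuilder that persists the
// optimized graph, producing an ONNX file you can ship or inspect elsewhere.
let _session = Session::builder()?
    .with_optimized_model_path("model.optimized.onnx")?
    .commit_from_file("model.onnx")?;
```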
⚛️ Compiler
Many execution providers internally convert ONNX graphs to a framework-specific graph representation, like CoreML networks/TensorRT engines. This process can take a long time, especially for larger and more complex models. Since these generated artifacts aren't persisted between runs, they have to be created every time a session is loaded.
The new Compiler API allows you to compile an optimized, EP-ready graph ahead of time, so subsequent loads are lightning fast! ⚡
```rust
ModelCompiler::new(
    Session::builder()?
        .with_execution_providers([
            TensorRTExecutionProvider::default().build()
        ])?
)?
.with_model_from_file("model.onnx")?
.compile_to_file("compiled_trt_model.onnx")?;
```
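Subsequent runs then skip the expensive conversion step - a sketch, assuming the compiled artifact is loaded like any other ONNX model:

```rust
// Later loads use the precompiled, EP-ready graph directly.
let session = Session::builder()?
    .with_execution_providers([TensorRTExecutionProvider::default().build()])?
    .commit_from_file("compiled_trt_model.onnx")?;
```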
🪶 #![no_std]
🚨 BREAKING: If you previously used `ort` with `default-features = false`...

That will now disable `ort`'s `std` feature, which means you don't get to use APIs that interact with the operating system, like `SessionBuilder::commit_from_file` - APIs you probably need! To minimize breakage, manually enable the `std` feature:

```toml
[dependencies]
ort = { version = "=2.0.0-rc.10", default-features = false, features = [ "std", ... ] }
```

`ort` no longer depends on `std` (but does still depend on `alloc`) - `default-features = false` will enable `#![no_std]` for `ort`.
⚡ Execution Providers
🚨 BREAKING: Boolean options for ArmNN, CANN, CoreML, CPU, CUDA, MIGraphX, NNAPI, OpenVINO, & ROCm...

If you previously used an option setter on one of these EPs that took no parameters (i.e. a boolean option that was `false` by default), note that these functions now take a boolean parameter to align with Rust idiom. Migrating is as simple as passing `true` to these functions (see the sketch after the list below). Affected functions include:
- `ArmNNExecutionProvider::with_arena_allocator`
- `CANNExecutionProvider::with_dump_graphs`
- `CPUExecutionProvider::with_arena_allocator`
- `CUDAExecutionProvider::with_cuda_graph`
- `CUDAExecutionProvider::with_skip_layer_norm_strict_mode`
- `CUDAExecutionProvider::with_prefer_nhwc`
- `MIGraphXExecutionProvider::with_fp16`
- `MIGraphXExecutionProvider::with_int8`
- `NNAPIExecutionProvider::with_fp16`
- `NNAPIExecutionProvider::with_nchw`
- `NNAPIExecutionProvider::with_disable_cpu`
- `NNAPIExecutionProvider::with_cpu_only`
- `OpenVINOExecutionProvider::with_opencl_throttling`
- `OpenVINOExecutionProvider::with_dynamic_shapes`
- `OpenVINOExecutionProvider::with_npu_fast_compile`
- `ROCmExecutionProvider::with_exhaustive_conv_search`
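For example, using one of the affected setters (any of the functions above migrate the same way):

```rust
// Before rc.10, enabling CUDA graph capture was a parameterless call:
//     CUDAExecutionProvider::default().with_cuda_graph()
// Now the flag is passed explicitly:
let cuda = CUDAExecutionProvider::default()
    .with_cuda_graph(true)
    .build();
```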
🚨 BREAKING: Renamed enum options for CANN, CUDA, QNN...
The following EP option enums have been renamed to reduce verbosity:
- `CANNExecutionProviderPrecisionMode` -> `CANNPrecisionMode`
- `CANNExecutionProviderImplementationMode` -> `CANNImplementationMode`
- `CUDAExecutionProviderAttentionBackend` -> `CUDAAttentionBackend`
- `CUDAExecutionProviderCuDNNConvAlgoSearch` -> `CuDNNConvAlgorithmSearch`
- `QNNExecutionProviderPerformanceMode` -> `QNNPerformanceMode`
- `QNNExecutionProviderProfilingLevel` -> `QNNProfilingLevel`
- `QNNExecutionProviderContextPriority` -> `QNNContextPriority`
🚨 BREAKING: Updated CoreML options...
`CoreMLExecutionProvider` has been updated to use a new registration API, unlocking more options. To migrate old options (see the sketch after the list below):
- `.with_cpu_only()` -> `.with_compute_units(CoreMLComputeUnits::CPUOnly)`
- `.with_ane_only()` -> `.with_compute_units(CoreMLComputeUnits::CPUAndNeuralEngine)`
- `.with_subgraphs()` -> `.with_subgraphs(true)`
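A sketch of the migrated registration (import paths are illustrative):

```rust
use ort::execution_providers::{CoreMLComputeUnits, CoreMLExecutionProvider};

let coreml = CoreMLExecutionProvider::default()
    // replaces the old `.with_ane_only()`
    .with_compute_units(CoreMLComputeUnits::CPUAndNeuralEngine)
    // boolean options now take an explicit flag
    .with_subgraphs(true)
    .build();
```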
rc.10 adds support for 3 execution providers:
- Azure allows you to call Azure AI models like GPT-4 directly from `ort`.
- WebGPU is powered by Dawn, an implementation of the WebGPU standard, allowing accelerated inference with almost any D3D12/Metal/Vulkan/OpenGL-supported GPU. Binaries with the WebGPU EP are available on Windows & Linux, so you can start testing it straight away (see the sketch after this list)!
- NV TensorRT RTX is a new execution provider purpose-built for NVIDIA RTX GPUs running with ONNX Runtime on Windows. It's powered by TensorRT for RTX, a specially-optimized inference library built upon TensorRT, releasing in June.
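The new providers register through the session builder like any other EP - a sketch, assuming the WebGPU provider follows `ort`'s usual `*ExecutionProvider` naming:

```rust
// Hypothetical registration of the WebGPU EP; the struct name is assumed
// to follow ort's execution provider naming convention.
let session = Session::builder()?
    .with_execution_providers([WebGPUExecutionProvider::default().build()])?
    .commit_from_file("model.onnx")?;
```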
All binaries are now statically linked! This means the `cuda` and `tensorrt` features no longer use `onnxruntime.dll`/`libonnxruntime.so`. The EPs themselves do still require separate DLLs - like `libonnxruntime_providers_cuda` - but this change should make it significantly easier to set up and use `ort` with CUDA/TRT.
🧩 Custom Operator Improvements
🚨 BREAKING: Migrating your custom operators...
- All methods under `Operator` now take `&self`.
- The operator's kernel is no longer an associated type - `create_kernel` is instead expected to return a `Box<dyn Kernel>` (which can now be created directly from a function!)

```diff
 impl Operator for MyCustomOp {
-    type Kernel = MyCustomOpKernel;
-    fn name() -> &'static str {
+    fn name(&self) -> &str {
         "MyCustomOp"
     }
-    fn inputs() -> Vec<OperatorInput> {
+    fn inputs(&self) -> Vec<OperatorInput> {
         vec![OperatorInput::required(TensorElementType::Float32)]
     }
-    fn outputs() -> Vec<OperatorOutput> {
+    fn outputs(&self) -> Vec<OperatorOutput> {
         vec![OperatorOutput::required(TensorElementType::Float32)]
     }
-    fn create_kernel(_: &KernelAttributes) -> ort::Result<Self::Kernel> {
-        Ok(MyCustomOpKernel)
-    }
+    fn create_kernel(&self, _: &KernelAttributes) -> ort::Result<Box<dyn Kernel>> {
+        Ok(Box::new(|ctx: &KernelContext| {
+            ...
+        }))
+    }
 }
```

To add an operator to an `OperatorDomain`, you now pass the operator by value instead of as a type parameter:

```diff
 let mut domain = OperatorDomain::new("io.pyke")?;
-domain = domain.add::<MyCustomOp>()?;
+domain = domain.add(MyCustomOp)?;
```
Custom operators have been internally revamped to reduce code size & compilation time, and allow operators to be `Sized`.
🔷 Miscellaneous changes
- Updated to ONNX Runtime v1.22.0.
- The minimum supported Rust version (MSRV) is now 1.81.0.
- The `tracing` dependency is now optional (but enabled by default).
  - To keep using `tracing` with `default-features = false`, enable the `tracing` feature.
  - When disabled, ONNX Runtime will log its messages directly to stdout. The log level defaults to `WARN` but can be controlled at runtime via the `ORT_LOG` environment variable by setting it to one of `verbose`, `info`, `warning`, `error`, or `fatal`.
- The domain serving prebuilt binaries has moved from `parcel.pyke.io` to `cdn.pyke.io`, so make sure to update firewall exclusions.
- The `build.rs` hack for Apple platforms is no longer required. (9b31680)
- The `ureq` dependency (used by `download-binaries`/`fetch-models`) has been upgraded to v3.0. `ort` with the `fetch-models` feature will use `rustls` as the TLS provider. `ort-sys` with the `download-binaries` feature will use `native-tls` since that pulls in fewer dependencies (it previously used `rustls`). No prerequisites are required when building on Windows & macOS, but other platforms now require OpenSSL to be installed.
- All ONNX Runtime tensor types are now supported - including `Complex64` & `Complex128`, 4-bit integers, and 8-bit floats!
  - Tensors of these types cannot be created from an array or extracted since they don't have de facto Rust equivalents, but you can use `DynTensor::new` to allocate a tensor and `DynTensor::data_ptr` to access its data.
- Reduce allocations (e136869)
  - `Session::run` can now be zero-alloc (on the Rust side)!
- Prebuilt binaries are now powered by KleidiAI on ARM64 - this should make them a fair bit faster!
⚠️ Breaking
- 🚨 `Session::run` now takes `&mut self`.
  - 💡 Tip when using mutexes: You can use `SessionOutputs::remove` to get an owned session output.
- 🚨 `ort::inputs!` no longer outputs a `Result`, so remove the trailing `?` from any invocations of the macro (see the migration sketch after this list).
- 🚨 `extract_tensor` to extract a tensor to an `ndarray` has been renamed to `extract_array`, with `extract_raw_tensor` now taking the place of `extract_tensor`:
  - `DynValue::try_extract_tensor(_mut)` -> `DynValue::try_extract_array(_mut)`
  - `Tensor::extract_tensor(_mut)` -> `Tensor::extract_array(_mut)`
  - `DynValue::try_extract_raw_tensor(_mut)` -> `DynValue::try_extract_tensor(_mut)`
  - `Tensor::extract_raw_tensor(_mut)` -> `Tensor::extract_tensor(_mut)`
- `Session::run_async` now always takes `&RunOptions`; `Session::run_async_with_options` has been removed.
- Most instances of "dimensions" (i.e. in `ValueType::tensor_dimensions`) have been replaced with "shape" (so `ValueType::tensor_shape`) for consistency.
- Tensor shapes now use a custom struct, `ort::tensor::Shape`, instead of a `Vec<i64>` directly.
  - Similarly, `ValueType::Tensor.dimension_symbols` is its own struct, `SymbolicDimensions`.
  - Both can be converted from their prior forms via `::from()`/`.into()`.
- `SessionBuilder::with_execution_providers` now takes `AsRef<[EP]>` instead of any iterable type.
- `SessionBuilder::with_external_initializer_file_in_memory` requires a `Path` for the `path` parameter instead of a regular `&str`.
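For instance, a typical call site migrates roughly like this (a sketch combining several of the changes above; input/output names and the model path are placeholders):

```rust
// `Session::run` now takes `&mut self`, so the session binding must be mutable.
let mut session = Session::builder()?.commit_from_file("model.onnx")?;

// Before rc.10:
//     let outputs = session.run(ort::inputs!["input" => tensor]?)?;
//     let array = outputs["output"].try_extract_tensor::<f32>()?;

// With rc.10: `inputs!` is infallible, and ndarray extraction is `try_extract_array`.
let outputs = session.run(ort::inputs!["input" => tensor])?;
let array = outputs["output"].try_extract_array::<f32>()?;
```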
🪲 Fixes
- Zero out tensors created on the CPU via `Tensor::new`. (7a95f98)
  - In some cases, the memory allocated by ONNX Runtime for new tensors was not initially zeroed. Now, any tensors created in CPU-accessible memory via `Tensor::new` will be manually zeroed on the Rust side.
- `IoBinding::synchronize_*` now takes `&self` so `synchronize_outputs` can actually be used as intended (e8d873a)
- Fix `XNNPACKExecutionProvider::is_available` always returning `false` (5ad997c)
- Fix a memory lifetime issue with `AllocationDevice` & `MemoryInfo` (3ca14c2)
- Fix OpenVINO EP registration failures by ensuring an environment is available (3e7e8fe)
  - and use the new registration API for OpenVINO (5661450)
- `ort-sys` crate now specifies `links`, hopefully preventing linking conflicts (d2dc7c8)
- Correct the internal device name for the DirectML `AllocationDevice` (46c3376)
- `ort-sys` no longer tries to download binaries when building with `--offline` (d7d4493)
- Dylib symlinks are now properly renewed when the library updates (4b6b163)
- ONNX Runtime log levels are now mapped directly to their corresponding `tracing` level instead of being knocked down a level (d8bcfd7)
- Fixed the name of the flag set by `TensorRTExecutionProvider::with_context_memory_sharing` (#327)
  - ...and `with_build_heuristics` & `with_sparsity` (b6ddfd8)
- Fixed concurrent downloads from `commit_from_url` or `ort-sys` (eb51646/#323)
- Fix linking XNNPACK on ARM64. (#384)