
Commit eb5da13

Authored by stevenlix, Ryan Lai, Prabhat, jeffbloo, and Pranav Sharma

Cherry pick fixes to release branch rel-1.3.0 (#3936)

* Fix DirectML nuget creation in Nuget pipeline (#3929)
* Added onnxruntime aarch64 wheel to pypi publishing pipeline (#3903)
  * Added onnxruntime aarch64 wheel to pypi publishing pipeline
  * Support nightly build flag
  * Add support for nightly build
* Fix error handling in LearningModelSession.cpp (#3920)
* Update DML Nuget version and DML EP Doc (#3945)
* Fix ordering of APIs. (#3951)

Co-authored-by: Ryan Lai <[email protected]>
Co-authored-by: Prabhat <[email protected]>
Co-authored-by: Jeff Bloomfield <[email protected]>
Co-authored-by: Pranav Sharma <[email protected]>
1 parent d80e15f commit eb5da13

File tree

8 files changed: +95 / -27 lines changed

cmake/external/dml.cmake

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ if (NOT onnxruntime_USE_CUSTOM_DIRECTML)
   set(NUGET_CONFIG ${PROJECT_SOURCE_DIR}/../NuGet.config)
   set(PACKAGES_CONFIG ${PROJECT_SOURCE_DIR}/../packages.config)
   get_filename_component(PACKAGES_DIR ${CMAKE_CURRENT_BINARY_DIR}/../packages ABSOLUTE)
-  set(DML_PACKAGE_DIR ${PACKAGES_DIR}/DirectML.0.0.4)
+  set(DML_PACKAGE_DIR ${PACKAGES_DIR}/DirectML.2.1.0)
 
   # Restore nuget packages, which will pull down the DirectML redist package
   add_custom_command(

csharp/src/Microsoft.ML.OnnxRuntime/NativeMethods.cs

Lines changed: 4 additions & 2 deletions
@@ -13,6 +13,8 @@ public struct OrtApiBase
     public IntPtr GetVersionString;
 };
 
+// NOTE: The order of the APIs in this struct should match exactly that in
+// OrtApi ort_api_1_to_3 (onnxruntime_c_api.cc)
 [StructLayout(LayoutKind.Sequential)]
 public struct OrtApi
 {
@@ -38,8 +40,8 @@ public struct OrtApi
     public IntPtr EnableCpuMemArena;
     public IntPtr DisableCpuMemArena;
     public IntPtr SetSessionLogId;
-    public IntPtr SetSessionLogSeverityLevel;
     public IntPtr SetSessionLogVerbosityLevel;
+    public IntPtr SetSessionLogSeverityLevel;
     public IntPtr SetSessionGraphOptimizationLevel;
     public IntPtr SetIntraOpNumThreads;
     public IntPtr SetInterOpNumThreads;
@@ -59,8 +61,8 @@ public struct OrtApi
     public IntPtr SessionGetOutputName;
     public IntPtr SessionGetOverridableInitializerName;
     public IntPtr CreateRunOptions;
-    public IntPtr RunOptionsSetRunLogSeverityLevel;
     public IntPtr RunOptionsSetRunLogVerbosityLevel;
+    public IntPtr RunOptionsSetRunLogSeverityLevel;
     public IntPtr RunOptionsSetRunTag;
     public IntPtr RunOptionsGetRunLogVerbosityLevel;
     public IntPtr RunOptionsGetRunLogSeverityLevel;
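The reordering above matters because the C# `OrtApi` struct is a positional mirror of the native function-pointer table: each `IntPtr` field binds to whatever function sits at the same index in `ort_api_1_to_3`. A minimal Python sketch (illustrative only, not ONNX Runtime code; all names hypothetical) of how a swapped declaration order silently dispatches to the wrong function:

```python
# A "native" API exposed as an ordered function table, mirrored by a client
# that resolves functions by position, as the C# OrtApi struct mirrors the
# native OrtApi table. Names are hypothetical stand-ins.

native_table = [
    ("SetSessionLogId", lambda: "log id set"),
    ("SetSessionLogVerbosityLevel", lambda: "verbosity set"),
    ("SetSessionLogSeverityLevel", lambda: "severity set"),
]

# A client binding that declares the same entries but in the wrong order
# (severity before verbosity) resolves slot 1 to the wrong native function.
wrong_order = ["SetSessionLogId", "SetSessionLogSeverityLevel", "SetSessionLogVerbosityLevel"]
right_order = [name for name, _ in native_table]

def call(binding_order, wanted):
    # The client only knows the index at which it declared `wanted`.
    slot = binding_order.index(wanted)
    return native_table[slot][1]()  # dispatch by position, vtable-style

assert call(right_order, "SetSessionLogSeverityLevel") == "severity set"
assert call(wrong_order, "SetSessionLogSeverityLevel") == "verbosity set"  # wrong function
```

This is exactly the failure mode the added NOTE comment guards against: no compiler error, just a wrong call.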

docs/execution_providers/DirectML-ExecutionProvider.md

Lines changed: 8 additions & 7 deletions
@@ -1,16 +1,16 @@
-# DirectML Execution Provider (Preview)
+# DirectML Execution Provider
 
 DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning on Windows. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers.
 
 When used standalone, the DirectML API is a low-level DirectX 12 library and is suitable for high-performance, low-latency applications such as frameworks, games, and other real-time applications. The seamless interoperability of DirectML with Direct3D 12 as well as its low overhead and conformance across hardware makes DirectML ideal for accelerating machine learning when both high performance is desired, and the reliability and predictabiltiy of results across hardware is critical.
 
 The *DirectML Execution Provider* is an optional component of ONNX Runtime that uses DirectML to accelerate inference of ONNX models. The DirectML execution provider is capable of greatly improving evaluation time of models using commodity GPU hardware, without sacrificing broad hardware support or requiring vendor-specific extensions to be installed.
 
-The DirectML Execution Provider is currently in preview.
+The DirectML Execution Provider currently uses DirectML version 2.1.0.
 
 ## Table of contents
 
-- [DirectML Execution Provider (Preview)](#directml-execution-provider-preview)
+- [DirectML Execution Provider](#directml-execution-provider)
 - [Table of contents](#table-of-contents)
 - [Minimum requirements](#minimum-requirements)
 - [Building from source](#building-from-source)
@@ -48,7 +48,7 @@ To build onnxruntime with the DML EP included, supply the `--use_dml` parameter
 
 The DirectML execution provider supports building for both x64 (default) and x86 architectures.
 
-Note that building onnxruntime with the DirectML execution provider enabled causes the the DirectML redistributable package to be automatically downloaded as part of the build. This package contains a pre-release version of DirectML, and its use is governed by a license whose text may be found as part of the NuGet package.
+Note that building onnxruntime with the DirectML execution provider enabled causes the the DirectML redistributable package to be automatically downloaded as part of the build. Its use is governed by a license whose text may be found as part of the NuGet package.
 
 
@@ -83,7 +83,7 @@ Creates a DirectML Execution Provider using the given DirectML device, and which
 
 ### ONNX opset support
 
-The DirectML execution provider currently supports ONNX opset 9 ([ONNX v1.4](https://github.com/onnx/onnx/releases/tag/v1.4.0)). Evaluating models which require a higher opset version is not supported, and may produce unexpected results.
+The DirectML execution provider currently supports ONNX opset 11 ([ONNX v1.6](https://github.com/onnx/onnx/releases/tag/v1.6.0)). Evaluating models which require a higher opset version is not supported, and may produce unexpected results.
 
 ### Multi-threading and supported session options
 
@@ -114,8 +114,9 @@ The DirectML execution provider works most efficiently when tensor shapes are kn
 
 Normally when the shapes of model inputs are known during session creation, the shapes for the rest of the model are inferred by OnnxRuntime when a session is created. However if a model input contains a free dimension (such as for batch size), steps must be taken to retain the above performance benefits.
 
-In this case, there are two options:
-- Edit the model to replace an input's free dimension (specified through ONNX using "dim_param") with a fixed size.
+In this case, there are three options:
+- Edit the model to replace an input's free dimension (specified through ONNX using "dim_param") with a fixed size (specified through ONNX using "dim_value").
+- Specify values of named dimensions within model inputs when creating the session using the OnnxRuntime *AddFreeDimensionOverrideByName* ABI.
 - Edit the model to ensure that an input's free dimension has a [denotation](https://github.com/onnx/onnx/blob/master/docs/DimensionDenotation.md) (such as "DATA_BATCH," or a custom denotation). Then when creating the session, specify the dimension size for each denotation. This can be done using the OnnxRuntime *AddFreeDimensionOverride* ABI.
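All three options in the doc change above amount to turning a named free dimension (a "dim_param" such as "batch") into a concrete size (a "dim_value"). A schematic Python sketch of the first option, using a simplified stand-in for an ONNX tensor shape rather than the real protobuf types (the function and shape encoding are illustrative, not an onnxruntime API):

```python
# Schematic sketch: replace free dimensions (dim_param strings such as
# "batch") with fixed dim_value sizes. A real implementation would walk
# model.graph.input[...].type.tensor_type.shape.dim in the onnx protobuf.

def fix_free_dimensions(shape, overrides):
    """shape: list mixing ints (dim_value) and strings (dim_param).
    overrides: mapping from dim_param name to a fixed size."""
    fixed = []
    for dim in shape:
        if isinstance(dim, str):            # a free dimension
            if dim not in overrides:
                raise ValueError(f"no override for free dimension {dim!r}")
            fixed.append(overrides[dim])    # pin it to a concrete size
        else:
            fixed.append(dim)               # already a fixed dim_value
    return fixed

# An input of shape ["batch", 3, 224, 224] pinned to batch size 1:
assert fix_free_dimensions(["batch", 3, 224, 224], {"batch": 1}) == [1, 3, 224, 224]
```

The second and third options achieve the same pinning at session-creation time, via *AddFreeDimensionOverrideByName* (by dimension name) or *AddFreeDimensionOverride* (by denotation), without editing the model file.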

packages.config

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="utf-8"?>
 <packages>
-  <package id="DirectML" version="0.0.4" targetFramework="native" />
+  <package id="DirectML" version="2.1.0" targetFramework="native" />
   <package id="GoogleTestAdapter" version="0.17.1" targetFramework="net46" />
 </packages>

tools/ci_build/github/azure-pipelines/azure-pipelines-py-packaging.yml

Lines changed: 62 additions & 0 deletions
@@ -343,3 +343,65 @@ jobs:
     ArtifactName: onnxruntime
 
 - template: templates/component-governance-component-detection-steps.yml
+
+- job: Linux_ARM_py_Wheels
+  timeoutInMinutes: 60
+  pool: 'Linux-CPU'
+  strategy:
+    matrix:
+      Py37:
+        python.include: '3.7m'
+        cp.tag: 'cp37-cp37m'
+      Py36:
+        python.include: '3.6m'
+        cp.tag: 'cp36-cp36m'
+      Py35:
+        python.include: '3.5m'
+        cp.tag: 'cp35-cp35m'
+  steps:
+  - task: CmdLine@2
+    inputs:
+      script: |
+        set -e -x
+        sudo rm -rf *
+        cd $(Build.SourcesDirectory)
+        git submodule update --init --recursive
+        cd -
+        sudo apt-get install -y qemu-user-static
+        sudo chmod a+x /usr/bin/azcopy
+
+        cat << EOF > tool-chain.cmake
+        SET(CMAKE_SYSTEM_NAME Linux)
+        SET(CMAKE_SYSTEM_VERSION 1)
+        SET(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
+        SET(CMAKE_C_FLAGS "-march=armv8-a -mtune=generic -Wno-unused-parameter -Wno-type-limits")
+        SET(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)
+        SET(CMAKE_CXX_FLAGS "-march=armv8-a -mtune=generic -Wno-unused-parameter -Wno-type-limits")
+        SET(CMAKE_FIND_ROOT_PATH /mnt/toolchains/manylinux2014_aarch64)
+        SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
+        SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
+        SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
+        SET(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
+        EOF
+        export PATH=/mnt/toolchains/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin:$PATH
+        azcopy cp https://onnxruntimetestdata.blob.core.windows.net/models/toolchains.tar.xz $(Build.BinariesDirectory)/toolchains.tar.xz
+        sudo rm -rf /mnt/toolchains
+        mkdir /mnt/toolchains
+        tar -Jxf $(Build.BinariesDirectory)/toolchains.tar.xz -C /mnt/toolchains
+        aria2c -q https://github.com/protocolbuffers/protobuf/releases/download/v3.11.1/protoc-3.11.1-linux-x86_64.zip
+        unzip protoc-3.11.1-linux-x86_64.zip
+        aria2c -q https://github.com/Kitware/CMake/releases/download/v3.17.1/cmake-3.17.1-Linux-x86_64.tar.gz
+        tar --strip=1 -zxf cmake-3.17.1-Linux-x86_64.tar.gz
+        sudo cp /mnt/toolchains/manylinux2014_aarch64/usr/include/stdlib.h /mnt/toolchains/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/include/
+        bin/cmake -Donnxruntime_GCC_STATIC_CPP_RUNTIME=ON -DCMAKE_BUILD_TYPE=Release -Dprotobuf_WITH_ZLIB=OFF -DCMAKE_TOOLCHAIN_FILE=tool-chain.cmake -Donnxruntime_ENABLE_PYTHON=ON -DPYTHON_LIBRARY=dl -DPYTHON_EXECUTABLE=/mnt/toolchains/manylinux2014_aarch64/opt/python/'$(cp.tag)'/bin/python3 -Donnxruntime_BUILD_SHARED_LIB=OFF -Donnxruntime_RUN_ONNX_TESTS=OFF -Donnxruntime_DEV_MODE=ON -DONNX_CUSTOM_PROTOC_EXECUTABLE=$(Build.BinariesDirectory)/bin/protoc "-DPYTHON_INCLUDE_DIR=/mnt/toolchains/manylinux2014_aarch64/usr/include;/mnt/toolchains/manylinux2014_aarch64/opt/python/$(cp.tag)/include/python$(python.include)" -DNUMPY_INCLUDE_DIR=/mnt/toolchains $(Build.SourcesDirectory)/cmake
+        make -j$(getconf _NPROCESSORS_ONLN)
+        case $NIGHTLY_BUILD in
+          1) docker run -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static -v $(Build.BinariesDirectory):/tmp/a -v $(Build.SourcesDirectory):/tmp/b -w /tmp/a --rm quay.io/pypa/manylinux2014_aarch64 /opt/python/'$(cp.tag)'/bin/python3 /tmp/b/setup.py bdist_wheel --nightly_build;;
+          *) docker run -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static -v $(Build.BinariesDirectory):/tmp/a -v $(Build.SourcesDirectory):/tmp/b -w /tmp/a --rm quay.io/pypa/manylinux2014_aarch64 /opt/python/'$(cp.tag)'/bin/python3 /tmp/b/setup.py bdist_wheel;;
+        esac
+      workingDirectory: $(Build.BinariesDirectory)
+  - task: PublishBuildArtifacts@1
+    displayName: 'Publish Artifact: ONNXRuntime python wheel'
+    inputs:
+      PathtoPublish: '$(Build.BinariesDirectory)/dist'
+      ArtifactName: onnxruntime
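The `case $NIGHTLY_BUILD in` branch at the end of the pipeline script picks the wheel-build invocation: nightly builds append `--nightly_build` to `setup.py bdist_wheel`, everything else builds a plain wheel. The same selection logic can be sketched in Python (a hypothetical helper, not part of the pipeline):

```python
import os

def wheel_args(env=os.environ):
    # Mirror of the pipeline's NIGHTLY_BUILD case statement: only when the
    # variable is exactly "1" does the wheel get the --nightly_build flag.
    args = ["setup.py", "bdist_wheel"]
    if env.get("NIGHTLY_BUILD") == "1":
        args.append("--nightly_build")
    return args

assert wheel_args({"NIGHTLY_BUILD": "1"}) == ["setup.py", "bdist_wheel", "--nightly_build"]
assert wheel_args({}) == ["setup.py", "bdist_wheel"]
```

This matches the shell `case` semantics, where any value other than `1` (including unset) falls through to the default `*)` branch.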

tools/ci_build/github/azure-pipelines/linux-arm-ci-pipeline.yml

Lines changed: 2 additions & 2 deletions
@@ -29,9 +29,9 @@ jobs:
         SET(CMAKE_SYSTEM_NAME Linux)
         SET(CMAKE_SYSTEM_VERSION 1)
         SET(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
-        set(CMAKE_C_FLAGS "-march=armv8-a -mtune=generic -Wno-unused-parameter -Wno-type-limits")
+        SET(CMAKE_C_FLAGS "-march=armv8-a -mtune=generic -Wno-unused-parameter -Wno-type-limits")
         SET(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)
-        set(CMAKE_CXX_FLAGS "-march=armv8-a -mtune=generic -Wno-unused-parameter -Wno-type-limits")
+        SET(CMAKE_CXX_FLAGS "-march=armv8-a -mtune=generic -Wno-unused-parameter -Wno-type-limits")
         SET(CMAKE_FIND_ROOT_PATH /mnt/toolchains/manylinux2014_aarch64)
         SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
         SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)

tools/nuget/generate_nuspec_for_native_nuget.py

Lines changed: 7 additions & 4 deletions
@@ -148,9 +148,12 @@ def generate_files(list, args):
         files_list.append('<file src=' + '"' + os.path.join(args.native_build_path, 'onnxruntime.pdb') + '" target="runtimes\\win-' + args.target_architecture + '\\native" />')
 
     if includes_directml:
-        files_list.append('<file src=' + '"' + os.path.join(args.native_build_path, 'DirectML.dll') + '" target="runtimes\\win-' + args.target_architecture + '\\native" />')
-        files_list.append('<file src=' + '"' + os.path.join(args.native_build_path, 'DirectML.pdb') + '" target="runtimes\\win-' + args.target_architecture + '\\native" />')
-        files_list.append('<file src=' + '"' + os.path.join(args.packages_path, 'DirectML.0.0.2\\LICENSE.txt') + '" target="DirectML_LICENSE.txt" />')
+        files_list.append('<file src=' + '"' + os.path.join(args.native_build_path, 'DirectML.dll') +
+                          '" target="runtimes\\win-' + args.target_architecture + '\\native" />')
+        files_list.append('<file src=' + '"' + os.path.join(args.native_build_path, 'DirectML.pdb') +
+                          '" target="runtimes\\win-' + args.target_architecture + '\\native" />')
+        files_list.append('<file src=' + '"' + os.path.join(args.packages_path, 'DirectML.2.1.0\\LICENSE.txt') +
+                          '" target="DirectML_LICENSE.txt" />')
 
     if includes_winml:
         # Process microsoft.ai.machinelearning import lib, dll, and pdb
@@ -251,4 +254,4 @@ def main():
     f.write('\n')
 
 if __name__ == "__main__":
-    sys.exit(main())
+    sys.exit(main())
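The diff above wraps the long `files_list.append` lines and bumps the license path from the DirectML 0.0.2 package to 2.1.0 (matching the version change in packages.config and dml.cmake). A simplified, self-contained sketch of the entries this branch of `generate_files` emits, not the actual script, with the version as a parameter so the bump is visible:

```python
import os.path

def directml_file_entries(native_build_path, packages_path, target_architecture,
                          dml_version="2.1.0"):
    # Simplified sketch of the nuspec <file> entries generated when the
    # package includes DirectML; helper name and signature are illustrative.
    runtimes = 'runtimes\\win-' + target_architecture + '\\native'
    return [
        '<file src="' + os.path.join(native_build_path, 'DirectML.dll') +
        '" target="' + runtimes + '" />',
        '<file src="' + os.path.join(native_build_path, 'DirectML.pdb') +
        '" target="' + runtimes + '" />',
        '<file src="' + os.path.join(packages_path, 'DirectML.' + dml_version, 'LICENSE.txt') +
        '" target="DirectML_LICENSE.txt" />',
    ]

entries = directml_file_entries('build', 'packages', 'x64')
assert 'DirectML.2.1.0' in entries[2]   # license now sourced from the 2.1.0 package
assert all(e.endswith('/>') for e in entries)
```

Note the bug this commit fixes: the old code referenced `DirectML.0.0.2` while the build restored `DirectML.0.0.4`, so the nuspec pointed at a package directory that did not exist.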

winml/lib/Api/LearningModelSession.cpp

Lines changed: 10 additions & 10 deletions
@@ -103,16 +103,16 @@ void LearningModelSession::Initialize() {
   engine_factory_.copy_from(model_impl->GetEngineFactory());
 
   com_ptr<_winml::IEngineBuilder> engine_builder;
-  engine_factory_->CreateEngineBuilder(engine_builder.put());
+  WINML_THROW_IF_FAILED(engine_factory_->CreateEngineBuilder(engine_builder.put()));
 
   if (device_impl->IsCpuDevice() == false) {
-    engine_builder->SetD3D12Resources(device_impl->GetD3DDevice(), device_impl->GetDeviceQueue());
-    engine_builder->SetMetacommandsEnabled(device_impl->MetacommandsEnabled());
+    WINML_THROW_IF_FAILED(engine_builder->SetD3D12Resources(device_impl->GetD3DDevice(), device_impl->GetDeviceQueue()));
+    WINML_THROW_IF_FAILED(engine_builder->SetMetacommandsEnabled(device_impl->MetacommandsEnabled()));
   }
 
   // Make onnxruntime apply the batch size override, if any
   if (session_options_ && session_options_.BatchSizeOverride() != 0) {
-    engine_builder->SetBatchSizeOverride(session_options_.BatchSizeOverride());
+    WINML_THROW_IF_FAILED(engine_builder->SetBatchSizeOverride(session_options_.BatchSizeOverride()));
   }
 
   com_ptr<_winml::IEngine> engine;
@@ -123,7 +123,7 @@ void LearningModelSession::Initialize() {
   WINML_THROW_IF_FAILED(engine->RegisterCustomRegistry(operator_registry_.get()));
 
   // Register transformers - this should probably not be exposed on IEngine, but an internal call as this configuration step is ort specific.
-  engine->RegisterGraphTransformers();
+  WINML_THROW_IF_FAILED(engine->RegisterGraphTransformers());
 
   // Load the model into the session
   WINML_THROW_IF_FAILED(engine->LoadModel(model.get()));
@@ -229,17 +229,17 @@ uint64_t LearningModelSession::Run(winrt::com_ptr<winmlp::LearningModelBinding>
                  std::back_inserter(outputs_raw),
                  [&](auto& input) { return input.get(); });
 
-  engine_->Run(input_names_raw.data(),
+  WINML_THROW_IF_FAILED(engine_->Run(input_names_raw.data(),
                inputs_raw.data(),
                input_names_raw.size(),
                output_names_raw.data(),
               outputs_raw.data(),
-               output_names_raw.size());
+               output_names_raw.size()));
 
   if (!device->IsCpuDevice()) {
     // Flush the D3D12 work from the DML execution provider and queue a fence before we release the lock.
     // This allows us to wait without holding onto the lock in GetResults.
-    engine_->FlushContext();
+    WINML_THROW_IF_FAILED(engine_->FlushContext());
     return device->GetD3DDeviceCache()->QueueFenceToD3D12();
   }
 
@@ -268,10 +268,10 @@ LearningModelSession::GetResults(
   if (is_gpu_evaluation) {
     // For DML we aren't using the Sync function because we want to make fencing the
    // completed frame thread safe while not holding the lock while waiting for the gpu.
-    engine_->ReleaseCompletedReferences();
+    WINML_THROW_IF_FAILED(engine_->ReleaseCompletedReferences());
   } else {
     // For CPU call the standard Sync function
-    engine_->Sync();
+    WINML_THROW_IF_FAILED(engine_->Sync());
   }
 
   // This isn't the best we are holding the lock while we wait for detensorize on the GPU.
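Every change in this file is the same fix: a call that returns a COM-style `HRESULT` is wrapped in `WINML_THROW_IF_FAILED`, so failures surface as exceptions instead of being silently discarded. A Python analogue of that pattern (illustrative only; the error code and helper names are stand-ins, not WinML code):

```python
# Illustrative analogue of WINML_THROW_IF_FAILED: COM-style APIs report
# errors through an HRESULT return value (negative as a signed 32-bit int
# when FAILED), which callers can all too easily ignore.

E_FAIL = -2147467259  # 0x80004005 as a signed 32-bit int

class HResultError(RuntimeError):
    def __init__(self, hr):
        super().__init__(f"call failed with HRESULT {hr & 0xFFFFFFFF:#010x}")
        self.hr = hr

def throw_if_failed(hr):
    # FAILED(hr) in COM means hr < 0; succeed silently otherwise.
    if hr < 0:
        raise HResultError(hr)
    return hr

def flaky_engine_call(ok):
    # Stand-in for an IEngine method returning S_OK (0) or E_FAIL.
    return 0 if ok else E_FAIL

throw_if_failed(flaky_engine_call(True))         # S_OK passes through
try:
    throw_if_failed(flaky_engine_call(False))    # E_FAIL now raises
except HResultError as e:
    assert e.hr == E_FAIL
```

Without the wrapper, a failed `CreateEngineBuilder` or `Run` would leave the session in a bad state and the failure would only show up later, far from its cause, which is the bug class #3920 addresses.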
