20 Apr 18:25

vraspar

7a71bc5

ONNX Runtime v1.25.0 Latest

📢 Announcements & Breaking Changes

Build & Platform

C++20 is now required to build ONNX Runtime from source. Minimum toolchains: MSVC 19.29+, GCC 10+, Clang 10+. Users of prebuilt packages are unaffected. (#27178)
CUDA minimum version raised to 12.0 — CUDA 11.x is no longer supported. Users pinned to CUDA 11.x should stay on ORT 1.24.x or upgrade their CUDA toolkit/driver. (#27570)
ONNX upgraded to 1.21.0 (#27601)
sympy is now an optional dependency for Python builds. (#27200)
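
The CUDA minimum-version note above implies a simple pinning rule for GPU users. The sketch below is one way to express it in a deployment script. The helper name `ort_gpu_requirement` is hypothetical, and the use of the `onnxruntime-gpu` package name is an assumption about how the wheels are installed; only the version boundaries come from these notes.

```python
def ort_gpu_requirement(cuda_version: str) -> str:
    """Return a pip requirement string for a CUDA toolkit version.

    Hypothetical helper. Assumes the GPU wheels are installed from the
    `onnxruntime-gpu` package; only the 1.24.x / 1.25 boundary is taken
    from the release notes.
    """
    major = int(cuda_version.split(".")[0])
    if major < 11:
        raise ValueError(f"CUDA {cuda_version} is not covered by these notes")
    if major == 11:
        # CUDA 11.x is no longer supported in 1.25.0: stay on the 1.24.x line.
        return "onnxruntime-gpu>=1.24,<1.25"
    # CUDA 12.0 or newer meets the 1.25.0 minimum.
    return "onnxruntime-gpu>=1.25"


if __name__ == "__main__":
    for version in ("11.8", "12.4"):
        print(version, "->", ort_gpu_requirement(version))
```

The alternative named in the note, upgrading the CUDA toolkit and driver, avoids the pin entirely.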

Execution Provider Changes

ArmNN EP has been removed. Users should remove any --use_armnn build flags and migrate to the MLAS/KleidiAI-backed CPU EP or QNN EP for Qualcomm hardware. (#27447)
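
For build scripts that assemble the build arguments programmatically, the ArmNN removal can be handled by filtering the argument list before invoking the build. This is a minimal sketch: `strip_removed_flags` and `REMOVED_FLAGS` are hypothetical names, and only `--use_armnn` is listed because it is the only flag these notes mention.

```python
# Flags that no longer exist in 1.25.0. Only --use_armnn is named in the
# release notes; extend this set if your scripts pass other ArmNN options.
REMOVED_FLAGS = frozenset({"--use_armnn"})


def strip_removed_flags(args, removed=REMOVED_FLAGS):
    """Split a build argument list into (kept, dropped) lists."""
    kept = [arg for arg in args if arg not in removed]
    dropped = [arg for arg in args if arg in removed]
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = strip_removed_flags(
        ["--config", "Release", "--use_armnn", "--parallel"]
    )
    print("build args:", kept)
    print("dropped:", dropped)
```

Reporting the dropped flags, rather than silently discarding them, makes it easier to spot configurations that still need a migration to the CPU EP or QNN EP.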

API Version

ORT_API_VERSION updated to 25. (#27280)

🔒 Security Fixes

Fixed potential integer truncation leading to heap out-of-bounds read/write (#27544)
Addressed Pad Reflect vulnerability (#27652)
Security fix for transpose optimizer (#27555)
Upgraded minimatch 3.1.2 → 3.1.4 for CVE-2026-27904 (#27667)
Hardened shell command handling for constant strings (#27840)
Added validation of onnx::TensorProto data size before allocation (#27547)
Cleaned up external data path validation (#27539)
Fixed misaligned address reads for tensor attributes from raw data buffers (#27312)
Fixed CPU Attention overflow issue (#27822)
Fixed CPU LRN integer overflow issues (#27886)
Additional input validation hardening:
- Tile kernel dim overflow (#27566)
- Out-of-bounds read in cross entropy (#27568)
- TreeEnsembleClassifier attributes (#27571)
- AffineGrid (#27572)
- EmbedLayerNorm position_ids (#27573)
- RotaryEmbedding position_ids (#27597)
- RoiAlign batch_indices (#27603)
- MaxUnpool indices (#27432)
- QMoECPU swiglu OOB (#27748)
- SVMClassifier initializer (#27699)
- Col2Im SafeInt (#27625)
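
Several of the fixes above belong to one class: a size computed from untrusted shape data must be checked before it is used for allocation or indexing. The sketch below illustrates that class of check in Python. It is not ONNX Runtime's implementation; the function names and the 64-bit limit are assumptions chosen for illustration.

```python
INT64_MAX = 2**63 - 1  # assumed limit for a signed 64-bit size


def checked_element_count(dims):
    """Multiply dimensions, rejecting negatives and 64-bit overflow."""
    count = 1
    for dim in dims:
        if dim < 0:
            raise ValueError(f"negative dimension: {dim}")
        count *= dim
        if count > INT64_MAX:
            raise OverflowError("element count exceeds the 64-bit range")
    return count


def validate_raw_data(dims, element_size, raw_len):
    """Check that a raw buffer length matches the declared shape.

    Returns the expected byte size so the caller can allocate only after
    validation has passed.
    """
    expected = checked_element_count(dims) * element_size
    if expected > INT64_MAX:
        raise OverflowError("byte size exceeds the 64-bit range")
    if expected != raw_len:
        raise ValueError(f"expected {expected} bytes, got {raw_len}")
    return expected


if __name__ == "__main__":
    print(validate_raw_data([2, 3], 4, 24))
```

In a fixed-width language the multiplication itself has to be overflow-checked, which is the role a checked-integer type such as SafeInt plays in C++.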

✨ New Features

🔌 Execution Provider Plugin API & CUDA Plugin EP

ORT 1.25.0 introduces the CUDA Plugin EP — the first core implementation that enables third-party CUDA-backed EPs to be delivered as dynamically loaded plugins without rebuilding ORT.

CUDA Plugin EP: Core implementation (#27816)
CUDA Plugin EP: BFC-style arena and CUDA mempool allocators for stream-aware memory management (#27931)
Plugin EP Sync API for synchronous execution (#27538)
Plugin EP event profiling APIs (#27649)
Plugin EP APIs to retrieve ONNX operator schemas (#27713)
Annotation-based graph partitioning with resource accounting (#27595, #27972)
EP API adapter improvements: header-only adapter, OpKernelInfo::GetConfigOptions, LoggingManager::HasDefaultLogger() (#26879, #26919, #27540, #27541, #27587)
WebGPU EP made compatible with EP API (#26907)

🔧 Core APIs

Per-session thread pool work callbacks API (#27253)
enable_profiling in RunOptions (#26846)
KernelInfo string-array attribute APIs for C and C++ (#27599)
OrtModel input support for Compile API (#27332)
Session config to create weightless EPContext models during compilation (#27197)
Compiled model compatibility APIs in example plugin EP (#27088)
Model Package support (preview): Initial infrastructure for automatically selecting compiled EPContext model variants from a packaged collection based on EP, device, and hardware constraints. The directory structure is not yet finalized. (#27786)

📊 New ONNX Ops & Opset Coverage

Attention opset 23 on CUDA with GQA, boolean masks, softcap, and softmax precision (#26466, #27030, #27082, #27428, #27714)
Attention opset 24 on CUDA, disjoint from contrib op (#27542); nonpad KV seqlen on CPU (#27384)
TensorScatter-24 for CPU and CUDA (#27389, #27446)
DeformConv for CPU/CUDA (#27393)
LpNormalization-22 (#27164)
CUDA opset gap fills:
- Control flow & misc: Flatten, Identity, If, Loop, Scan, ConstantOfShape, Size (opset 21/23) (#27728)
- Pooling: GlobalAveragePool/GlobalMaxPool (→22) (#27733)
- Shape ops: Shape (→25), Squeeze/Unsqueeze (→25) (#27734, #27739)
- TopK (→24, BF16) (#27735), GRU (→22) (#27738)
- Pad (→25, wrap mode) (#27774), Resize v19 (#27415), RoiAlign v16/v22 (#27646)

🖥️ Execution Provider Updates

NVIDIA CUDA EP

GQA with XQA and quantized KV cache, including FP8 (E4M3) KV cache support (#27246, #27321)
CUDA graph capture compatibility for LLM ops and pre-compiled paths (#27484, #27477)
Volumetric (3-D) GridSample support (#27201)
Optimized 3D nearest resize kernel for 5D tensors (#27578)
Optional router_weights input to QMoE (#27687)
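
To make the FP8 (E4M3) KV cache entry concrete, the sketch below decodes a single E4M3 byte into a Python float. It assumes the E4M3FN variant (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, one NaN pattern per sign), which is the layout of ONNX's FLOAT8E4M3FN type. Whether the CUDA kernels use exactly this variant, and how they apply scaling, is not stated in these notes.

```python
import math


def decode_e4m3fn(byte: int) -> float:
    """Decode one FP8 E4M3FN byte (assumed variant) to a float."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        # The only NaN encoding; E4M3FN has no infinities.
        return math.nan
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


if __name__ == "__main__":
    for raw in (0x38, 0x7E, 0x01):
        print(hex(raw), decode_e4m3fn(raw))
```

The largest finite value in this layout is 448, so a quantized cache typically stores a separate scale factor alongside the bytes.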

NVIDIA TensorRT RTX EP

D3D12 external resource import support (#26948)

Qualcomm QNN EP

Disabled file mapping for embedded cache (#27627)
Fixed use-after-free of logger object (#27804)
Fixed wheel build issues on WSL and Linux SDK version propagation (#27730, #27800)

Other EPs

VitisAI EP: Added PE version info to provider DLL (#27626)
DML EP: Fixed overflow in DmlGraphFusionHelper::ProcessInputData (#27815), fixed new-delete mismatch in QuantizeLinear (#27823)

🌐 Web & JavaScript

WebGPU EP — Performance

Gemm/MatMul opt...

17 Mar 23:08

tianleiwu

v1.24.4

2d92497

ONNX Runtime v1.24.4

This is a patch release for ONNX Runtime 1.24, containing bug fixes and execution provider updates.

Bug Fixes

Core: Added PCI bus fallback for Linux GPU device discovery in containerized environments (e.g., AKS/Kubernetes) where nvidia-drm is not loaded but GPU PCI devices are still exposed via sysfs. (#27591)
Plugin EP: Fixed null pointer dereference when iterating output spans in GetOutputIndex. (#27644)
Plugin EP: Fixed bug that incorrectly assigned duplicate MetaDef IDs to fused nodes in different GraphViews (e.g., then/else branches of an If node), causing session creation to fail with a conflicting kernel error. (#27666)
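
The PCI bus fallback in the first fix above can be pictured as a scan of sysfs for display-class devices from a given vendor. The sketch below shows the idea only; it is not ONNX Runtime's discovery code. The function name and the default path are assumptions, while the NVIDIA vendor ID (0x10de) and the PCI display base class (0x03) are standard values.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = 0x10DE       # PCI vendor ID assigned to NVIDIA
PCI_BASE_CLASS_DISPLAY = 0x03   # PCI base class for display controllers


def find_nvidia_gpus(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses of NVIDIA display devices found under sysfs_root."""
    root = Path(sysfs_root)
    found = []
    if not root.is_dir():
        return found
    for device in sorted(root.iterdir()):
        try:
            # Both files hold hex strings such as "0x10de" or "0x030200".
            vendor = int((device / "vendor").read_text().strip(), 16)
            pci_class = int((device / "class").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
        # The top byte of the 24-bit class code is the base class.
        if vendor == NVIDIA_VENDOR_ID and (pci_class >> 16) == PCI_BASE_CLASS_DISPLAY:
            found.append(device.name)
    return found


if __name__ == "__main__":
    print(find_nvidia_gpus())
```

Because the scan reads only `vendor` and `class`, it works in containers where the kernel DRM module is not loaded but the PCI devices are still visible.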

Execution Provider Updates

QNN EP: Enabled offline x64 compilation with memhandle IO type by deferring rpcmem library loading to inference time. (#27479)
QNN EP: Reverted QNN SDK logging verbosity changes that caused segmentation faults on backend destruction. (#27650)

Build and Infrastructure

Python: Updated python_requires from >=3.10 to >=3.11 to reflect dropped Python 3.10 support. (#27354)
Build: Replaced __builtin_ia32_tpause with the compiler-portable _tpause intrinsic to fix cross-compiler portability issues between GCC and LLVM. (#27607)

Full Changelog: v1.24.3...v1.24.4

Contributors

@derdeljan-msft, @adrianlizarraga, @apwojcik, @baijumeswani, @edgchen1, @mocknen, @tianleiwu, @XXXXRT666

05 Mar 19:00

tianleiwu

v1.24.3

3a728b7

ONNX Runtime v1.24.3

This is a patch release for ONNX Runtime 1.24, containing bug fixes, security improvements, performance enhancements, and execution provider updates.

Security Fixes

Core: Fixed GatherCopyData integer truncation leading to heap out-of-bounds read/write. (#27444)
Core: Fixed RoiAlign heap out-of-bounds read via unchecked batch_indices. (#27543)
Core: Prevented heap out-of-bounds access from maliciously crafted LoRA adapters. (#27518)
Core: Fixed out-of-bounds access for Resize operation. (#27419)

Bug Fixes

Core: Fixed GatherND division by zero when batch dimensions mismatch. (#27090)
Core: Fixed validation for external data paths for models loaded from bytes. (#27430)
Core: Fixed SkipLayerNorm fusion incorrectly applied when gamma/beta are not 1D. (#27459)
Core: Fixed double-free in TRT EP custom op domain Release functions. (#27471)
Core: Fixed QMoE CPU Operator. (#27360)
Core: Fixed MatmulNBits prepacking scales. (#27412)
Python: Fixed refcount bug in map input conversion that caused shutdown segfault. (#27413)
NuGet: Fixed DllImportResolver. (#27397)
NuGet: Added OrtEnv.DisableDllImportResolver to prevent fatal error on resolver conflict. (#27535)

Performance Improvements

Core: QMoE CPU performance update (up to 4x on 4-bit). (#27364)
Core: Fixed O(n²) model load time for TreeEnsemble with categorical feature chains. (#27391)

Execution Provider Updates

NvTensorRtRtx EP:
- Avoided repeated creation of fp4/fp8 native-custom-op domains. (#27192)
- Added missing override specifiers to suppress warnings. (#27288)
- DQ→MatMulNBits fusion transformer. (#27466)
WebGPU:
- Used embedded WASM module in Blob URL workers when wasmBinary is provided. (#27318)
- Fixed usage of wasmBinary together with a blob URL for .mjs. (#27411)
- Removed the unhelpful "Unknown CPU vendor" warning. (#27399)
- Allowed a new memory info name for WebGPU. (#27475)
MLAS:
- Added DynamicQGemm function pointers and ukernel interface. (#27403)
- Fixed error where bytes is not assigned for dynamic qgemm pack b size. (#27421)
VitisAI EP: Removed s_kernel_registry_vitisaiep.reset() in deinitialize_vitisai_ep(). (#27295)
Plugin EPs: Added "library_path" metadata entry to OrtEpDevice instances for plugin and provider bridge EPs. (#27522)

Build and Infrastructure

Pipelines:
- Build Windows ARM64X binaries as part of packaging pipeline. (#27316)
- Moved JAR testing pipelines to canonical pipeline template. (#27480)
Python: Enabled Python 3.14 CI and upgraded dependencies. (#27401)
Build: Suppressed spurious Array Out of Bounds warnings produced by GCC 14.2 compiler on Linux builds. (#27454)
Build: Fixed -Warray-bounds build error in MLAS on clang 17+. (#27499)
Telemetry: Added/Updated telemetry events. (#27356)
Config: Increased kMaxValueLength to 8192. (#27521)

Full Changelog: v1.24.2...v1.24.3

Contributors

@tianleiwu, @fs-eire, @adrianlizarraga, @yuslepukhin, @0-don, @anujj, @chaya2350, @chilo-ms, @dabhattimsft, @edgchen1, @eserscor, @hariharans29, @JonathanC-ARM, @lukas-folle-snkeos, @patryk-kaiser-ARM, @praneshgo, @skottmckay, @theHamsta, @vektah, @vishalpandya1990, @vthaniel, @xieofxie, @zz002

19 Feb 21:28

tianleiwu

v1.24.2

058787c

ONNX Runtime v1.24.2

This is a patch release for ONNX Runtime 1.24, containing several bug fixes, security improvements, and execution provider updates.

Bug Fixes

NuGet: Fixed native library loading issues in the ONNX Runtime NuGet package on Linux and macOS. (#27266)
macOS: Fixed Java support and Jar testing on macOS ARM64. (#27271)
Core: Enabled robust symlink support for external data in the Hugging Face Hub cache. (#27374)
Core: Added boundary checks for SparseTensorProtoToDenseTensorProto to improve robustness. (#27323)
Security: Fixed an out-of-bounds read vulnerability in ArrayFeatureExtractor. (#27275)

Execution Provider Updates

MLAS: Fixed flakiness and accuracy issues in Lut GEMM (MatMulNBitsLutGemm). (#27216)
QNN: Enabled 64-bit UDMA mode for HTP target v81 or above. (#26677)
WebGPU:
- Used LazyRelease for prepack allocator. (#27077)
- Fixed ConvTranspose bias validation in both TypeScript and C++ implementations. (#27213)
OpenVINO (OVEP): Patch to reduce resident memory by reusing weight files across shared contexts. (#27238)
DNNL: Fixed DNNL build error by including missing files. (#27334)

Build and Infrastructure

CUDA:
- Added support for CUDA architecture family codes (suffix 'f') introduced in CUDA 12.9. (#27278)
- Fixed build errors and warnings for various CUDA versions (12.8, 13.0, 13.1.1). (#27276)
- Applied patches for Abseil CUDA warnings. (#27096, #27126)
Pipelines:
- Fixed Python packaging pipeline for Windows ARM64 and release. (#27339, #27350, #27299)
- Fixed DirectML NuGet pipeline to correctly bundle x64 and ARM64 binaries for release. (#27349)
- Updated Microsoft.ML.OnnxRuntime.Foundry package for Windows ARM64 support and NuGet signing. (#27294)
Testing: Updated BaseTester to support plugin EPs with both compiled nodes and registered kernels. (#27176)
Telemetry: Added service name and framework name to telemetry events for better usage understanding on Windows. (#27252, #27256)
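
The architecture family codes mentioned in the CUDA items above add an 'f' suffix alongside the existing 'a' suffix. The sketch below parses such architecture strings so a build script can inspect them. The grammar is an assumption modeled on CMake-style CUDA architecture values (for example `90`, `90a`, `100f`, `90-virtual`); the exact syntax accepted by the ONNX Runtime build is not specified in these notes, and `parse_cuda_arch` is a hypothetical helper.

```python
import re

# Digits, an optional 'a' or 'f' suffix, and an optional -real/-virtual kind.
_ARCH_RE = re.compile(r"^(\d+)([af]?)(?:-(real|virtual))?$")


def parse_cuda_arch(spec: str) -> dict:
    """Parse an architecture string such as '100f' or '90a-virtual'."""
    match = _ARCH_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized CUDA architecture: {spec!r}")
    number, suffix, kind = match.groups()
    return {
        "compute_capability": int(number),
        "suffix": suffix or None,  # 'a', 'f', or None
        "kind": kind,              # 'real', 'virtual', or None
    }


if __name__ == "__main__":
    for spec in ("90", "90a-virtual", "100f"):
        print(spec, "->", parse_cuda_arch(spec))
```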

Full Changelog: v1.24.1...v1.24.2

Contributors

@tianleiwu, @hariharans29, @edgchen1, @xiaofeihan1, @adrianlizarraga, @angelser, @angelserMS, @ankitm3k, @baijumeswani, @bmehta001, @ericcraw, @eserscor, @fs-eire, @guschmue, @mc-nv, @qjia7, @qti-monumeen, @titaiwangms, @yuslepukhin

06 Feb 00:00

tianleiwu

v1.24.1

470ae16

ONNX Runtime v1.24.1

📢 Announcements & Breaking Changes

Platform Support Changes

Python 3.10 wheels are no longer published — Please upgrade to Python 3.11+
Python 3.14 support added
Free-threaded Python (PEP 703) — Added support for Python 3.13t and 3.14t on Linux (#26786)
x86_64 binaries for macOS/iOS are no longer provided and minimum macOS is raised to 14.0
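
An installer or startup script can check the Python-related changes above before attempting to import the package. The sketch below is illustrative: `wheel_support` is a hypothetical helper, and the free-threaded detection relies on the `Py_GIL_DISABLED` build variable that CPython exposes through `sysconfig`.

```python
import sys
import sysconfig

MIN_PYTHON = (3, 11)  # Python 3.10 wheels are no longer published


def wheel_support(version_info=None, gil_disabled=None):
    """Report whether an interpreter meets the 1.24.1 Python requirements.

    Both arguments default to the running interpreter; pass explicit
    values to check a different configuration.
    """
    if version_info is None:
        version_info = sys.version_info
    if gil_disabled is None:
        # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
        gil_disabled = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    return {
        "supported": tuple(version_info[:2]) >= MIN_PYTHON,
        "free_threaded": bool(gil_disabled),
    }


if __name__ == "__main__":
    print(wheel_support())
```

Note that, per the list above, free-threaded support was added for 3.13t and 3.14t on Linux only, so a `free_threaded` result on other platforms should be treated as unsupported.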

API Version

ORT_API_VERSION updated to 24 (#26418)

✨ New Features

🤖 Execution Provider (EP) Plugin API

A major infrastructure enhancement enabling plugin-based EPs with dynamic loading:

Initial kernel-based EP support (#26206)
Weight pre-packing support for plugin EPs (#26754)
EP Context model support (#25124)
Control flow kernel APIs (#26927)
OrtKernelInfo APIs for kernel-based plugin EPs (#26803)

🔧 Core APIs

OrtApi::CreateEnvWithOptions() and OrtEpApi::GetEnvConfigEntries() (#26971)
EP Device Compatibility APIs (#26922)
External Resource Importer API for D3D12 shared resources (#26828)
Session config access from KernelInfo (#26589)

📊 Dependencies & Integration

ONNX upgraded to 1.20.1 (#26579)
Protobuf updated from 3.20.3 → 4.25.8 (#26910)
CUDA Graph enabled by default (#26929)

🖥️ Execution Provider Updates

NVIDIA

CUDA EP: Flash Attention updates, GQA kernel fusion, BF16 support for MoE/qMoE/MatMulNBits, CUDA 13.0 support
TensorRT EP: Upgraded to TensorRT 10.14, automatic plugin loading, NVFP4 custom ops
TensorRT RTX EP: RTX runtime caching, CUDA graph support, BFloat16, memory-mapped engines

Qualcomm QNN EP

QNN SDK upgraded to 2.42.0 with new ops (RMSNorm, ScatterElements, GatherND, STFT, RandomUniformLike)
Gelu pattern fusion, LPBQ quantization support, ARM64 wheel builds, v81 device support

Intel & AMD

OpenVINO EP: Upgraded to 2025.4.1
VitisAI EP: External EP loader, compiled model compatibility API
MIGraphX EP: QuickGelu, multihead attention, QLinear pooling ops

ArmNN EP

Arm is formally deprecating the Arm NN Execution Provider (EP) in ONNX Runtime. The Arm NN EP is still experimental and depends on technology that is no longer actively maintained. Keeping it available now only adds complexity and potential confusion for users.

What to expect:

Effective immediately, the Arm NN EP is deprecated and will no longer be maintained
All build options, documentation, and examples referencing ArmNN will be removed once the upstream change merges; the removal will appear in the first ONNX Runtime release that includes that change. We will confirm the release number as soon as it is known
Builds that still rely on Arm NN-specific options (for example --use_armnn) will fail after the change lands, so please adjust configurations in advance

🌐 Web & JavaScript

WebGPU EP: Flash Attention optimizations, graph capture, Split-K MatMul, qMoE support, WGSL templates
WebNN EP: GQA local attention, GatherBlockQuantized, ConvInteger/MatMulInteger
Node.js/React Native: Node.js v22, JSI for React Native, JSPI build support

🧠 CPU Improvements

KleidiAI: SME1/SME2 Convolution and SGemm kernels, FP32 Gemv, Windows/Arm support
New ops: MoE/qMoE kernels, RotaryEmbeddings opset 23, LayerNorm/RMSNorm broadcasting
Platform support: S390x SIMD, LoongArch64 4-bit quantization, FP16 inference improvements
ARM NCHWc layout support: Adds an NCHWc layout path that may improve the performance of Conv models. It is enabled only when building from source with --enable_arm_neon_nchwc (#25580, #26838, #26691, #26171). This feature may be turned on by default in a future release based on community feedback.
ARM perf improvements: Dedicated depthwise conv kernel (#26688) and SiLU activation perf improvement (#26753)

🔌 Language Bindings

C#

.NET 9.0 MAUI targets (#26463)

Python

add_external_initializers_from_files (#26012)

Java

Auto EP and compile model support (#25131)
OrtCompiledModelCompatibility (#26028)

🐛 Bug Fixes

Critical Fixes

DoS vulnerability in FuseReluClip (#26878)
Security issue loading arbitrary files as external data (#26776)
Memory leak fix for KernelContext_GetAllocator (#26883)
Local Attention off-by-1 bug (#25927)

EP-Specific Fixes

[QNN] Clip op with min/max from QDQ (#26601)
[CoreML] Gather fp16 support (#26442)

🙏 Contributors

Thanks to our 170 contributors for this release!

@fs-eire, @tianleiwu, @edgchen1, @qjia7, @yuslepukhin, @hariharans29, @Honry, @qti-yuduo, @adrianlizarraga, @snnn, @eserscor, @vraspar, @xiaofeihan1, @guschmue, @daijh, @quic-muchhsu, @qti-jkilpatrick, @tirupath-qti, @Jiawei-Shao, @qti-hungjuiw, @quic-ashwshan, @titaiwangms, @qti-mattsinc, @chilo-ms, @jchen10, @xhcao, @skottmckay, @quic-calvnguy, @JonathanC-ARM, @Rohanjames1997, @sushraja-msft, @jambayk, @adrastogi, @xenova, @quic-tirupath, @justinchuby, @HectorSVC, @kunal-vaishnavi, @wenqinI, @prathikr, @baijumeswani, @preetha-intel, @jatinwadhwa921, @umangb-09, @qti-ashwshan, @carzh, @bachelor-dou, @ranjitshs, @gedoensmax, @xadupre, @nenad1002, @TedThemistokleous, @keshavv27, @zpye, @jnagi-intel, @jiafatom, @mingyueliuh, @Colm-in-Arm, @borg323, @chunghow-qti, @Craigacp, @BODAPATIMAHESH, @AlekseiNikiforovIBM, @hans00, @thevishalagarwal, @MaanavD, @qti-kromero, @damdoo01-arm, @BoarQing, @naomiOvad, @yuhuchua-qti, @hadiFute, @vishalpandya1990, @rivkastroh, @minfhong-qti, @kuanyul-qti, @xieofxie, @ankitm3k, @RyanMetcalfeInt8, [@MayureshV1](https://github...

25 Oct 04:15

snnn

v1.23.2

a83fc4d

ONNX Runtime v1.23.2

ORT 1.23.2 cherrypick 1 (#26368)

Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: https://github.com/microsoft/onnxruntime/pull/26272
  - commit id: 00e85dd3c84f511fee373d152d461f6e81d7f514
- Fix shape inference failure with in-memory external data
  - PR: https://github.com/microsoft/onnxruntime/pull/26263
  - commit id: d955476911997842cb058174c18f30f8dc3693b4
- [CUDA] replace 90a-virtual by 90-virtual for forward compatibility
  - PR: https://github.com/microsoft/onnxruntime/pull/26230
  - commit id: b58911f7445be56e45cb0f7993c0d43e6839c09e
- [QNN-EP] Fix logic flow bug
  - PR: https://github.com/microsoft/onnxruntime/pull/26148
  - commit id: b282379ac6066e8de9a5a68f1ce5ef1cf566dd04
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: https://github.com/microsoft/onnxruntime/pull/26103
  - commit id: 736251899137449311819bab36ff1c47ea09a62c
- Update qMoE spec to support block quantization
  - PR: https://github.com/microsoft/onnxruntime/pull/25641
  - commit id: 7a8ffa80b78c1e363a04eb7b8ebae22c4e45d140
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: https://github.com/microsoft/onnxruntime/pull/25602
  - commit id: 3361d723a526d6bcd9ac473ce6d3f0a1a89244da
- [Build] Lock torch, onnxscript and onnx-ir versions to latest
  - PR: https://github.com/microsoft/onnxruntime/pull/26315
  - commit id: ea69c4df0bd032ec1ca3790455f123de99187cee

---------

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <31260809+yifei410@users.noreply.github.com>
Co-authored-by: yifei <y.zhou@xilinx.com>

08 Oct 04:12

snnn

v1.23.1

d9b2048

ONNX Runtime v1.23.1

What's Changed

Fix Attention GQA implementation on CPU (#25966)
Address GetMemInfo edge cases (#26021)
Implement new Python APIs (#25999)
MemcpyFromHost and MemcpyToHost support for plugin EPs (#26088)
[TRT RTX EP] Fix bug for generating the correct subgraph in GetCapability (#26132)
add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart (#25590)
[build] fix WebAssembly build on macOS/arm64 (#25653)
[CPU] MoE Kernel (#25958)
[CPU] Block-wise QMoE kernel for CPU (#26009)
[C#] Implement missing APIs (#26101)
Regenerate test model with ONNX IR < 12 (#26149)
[CPU] Fix compilation errors because of unused variables (#26147)
[EP ABI] Check if nodes specified in GetCapability() have already been assigned (#26156)
[QNN EP] Add dynamic option to set HTP performance mode (#26135)

Full Changelog: v1.23.0...v1.23.1

26 Sep 04:33

snnn

v1.23.0

be835ef

ONNX Runtime v1.23.0

Announcements

This release introduces the Execution Provider (EP) Plugin API, a new infrastructure for building plugin-based EPs. (#24887, #25137, #25124, #25147, #25127, #25159, #25191, #2524)
This release introduces the ability to dynamically download and install execution providers. This feature is available only in the WinML build and requires Windows 11 version 25H2 or later. To use it, C/C++/C# users should use the builds distributed through the Windows App SDK, and Python users should install the onnxruntime-winml package (to be published soon). We encourage users who can upgrade to the latest Windows 11 to use the WinML build to take advantage of this enhancement.

Upcoming Changes

The next release will stop providing x86_64 binaries for macOS and iOS operating systems.
The next release will increase the minimum supported macOS version from 13.4 to 14.0.
The next release will stop providing Python 3.10 wheels.

Execution & Core Optimizations

Shutdown logic on Windows is simplified

On Windows, some global objects are no longer destroyed when ONNX Runtime detects that the process is shutting down (#24891). This does not cause a memory leak, because all of a process's memory is returned to the operating system when the process ends. The change reduces the chance of crashes on process exit.

AutoEP/Device Management

ONNX Runtime can now automatically discover computing devices and select the best EPs to download and register. The EP downloading feature currently works only on Windows 11 version 25H2 or later.

Execution Provider (EP) Updates

The ROCm EP was removed from the source tree. AMD users are advised to use the MIGraphX or Vitis AI EPs instead.
A new EP, Nvidia TensorRT RTX, was added.
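
Applications that hard-code a provider list are affected when an EP is removed, as with the ROCm EP above. A defensive pattern is to filter the preferred list against what the installed build actually offers. The sketch below is pure Python and makes no ONNX Runtime calls; `select_providers` is a hypothetical helper, and the provider name strings are assumed to follow the usual `...ExecutionProvider` naming.

```python
CPU_PROVIDER = "CPUExecutionProvider"


def select_providers(preferred, available):
    """Keep preferred providers that are available, in order, then add CPU.

    `available` would typically come from the runtime, for example
    onnxruntime.get_available_providers(); here it is a plain list so the
    sketch runs without the package installed.
    """
    available_set = set(available)
    chosen = []
    for name in preferred:
        if name in available_set and name != CPU_PROVIDER and name not in chosen:
            chosen.append(name)
    if CPU_PROVIDER in available_set:
        chosen.append(CPU_PROVIDER)  # CPU last, as the final fallback
    return chosen


if __name__ == "__main__":
    preferred = ["ROCMExecutionProvider", "MIGraphXExecutionProvider"]
    available = ["MIGraphXExecutionProvider", "CPUExecutionProvider"]
    print(select_providers(preferred, available))
```

With this pattern, a configuration that still lists the removed ROCm provider degrades to MIGraphX or CPU instead of failing at session creation.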

Web

EMSDK is upgraded from 4.0.4 to 4.0.8

WebGPU EP

Added WGSL template support.

QNN EP

SDK Update: Added support for QNN SDK 2.37.

KleidiAI

Enhanced performance for SGEMM, IGEMM, and Dynamic Quantized MatMul operations, especially for Conv2D operators on hardware that supports SME2 (Scalable Matrix Extension v2).

Known Problems

A KleidiAI-related change in build.py may cause build failures when cross-compiling (#26175).

Contributions

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:

@1duo, @Akupadhye, @amarin16, @AndreyOrb, @ankan-ban, @ankitm3k, @anujj, @aparmp-quic, @arnej27959, @bachelor-dou, @benjamin-hodgson, @Bonoy0328, @chenweng-quic, @chuteng-quic, @clementperon, @co63oc, @daijh, @damdoo01-arm, @danyue333, @fanchenkong1, @gedoensmax, @genarks, @gnedanur, @Honry, @huaychou, @ianfhunter, @ishwar-raut1, @jing-bao, @joeyearsley, @johnpaultaken, @jordanozang, @JulienMaille, @keshavv27, @kevinch-nv, @khoover, @krahenbuhl, @kuanyul-quic, @mauriciocm9, @mc-nv, @minfhong-quic, @mingyueliuh, @MQ-mengqing, @NingW101, @notken12, @omarhass47, @peishenyan, @pkubaj, @qc-tbhardwa, @qti-jkilpatrick, @qti-yuduo, @quic-ankus, @quic-ashigarg, @quic-ashwshan, @quic-calvnguy, @quic-hungjuiw, @quic-tirupath, @qwu16, @ranjitshs, @saurabhkale17, @schuermans-slx, @sfatimar, @stefantalpalaru, @sunnyshu-intel, @TedThemistokleous, @thevishalagarwal, @toothache, @umangb-09, @vatlark, @VishalX, @wcy123, @xhcao, @xuke537, @zhaoxul-qti

Contributors

JulienMaille, wcy123, and 71 other contributors

13 Aug 16:53

vraspar

v1.22.2

5630b08

ONNX Runtime v1.22.2

What's new?

This release adds an optimized CPU/MLAS implementation of DequantizeLinear (8 bit) and introduces the build option client_package_build, which enables default options that are more appropriate for client/on-device workloads (e.g., disable thread spinning by default).

Build System & Packages

Add --client_package_build option (#25351) - @jywu-msft
Remove the python installation steps from win-qnn-arm64-ci-pipeline.yml (#25552) - @snnn

CPU EP

Add multithreaded/vectorized implementation of DequantizeLinear for int8 and uint8 inputs (SSE2, NEON) (#24818) - @adrianlizarraga
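
For reference, the operation being optimized here is the per-element mapping defined by the ONNX DequantizeLinear operator: y = (x - zero_point) * scale. The scalar sketch below shows the per-tensor case only. It is a reference for the arithmetic, not the vectorized MLAS kernel, and the function name is an assumption.

```python
def dequantize_linear(values, scale, zero_point=0, signed=False):
    """Per-tensor DequantizeLinear for 8-bit inputs: (x - zero_point) * scale."""
    low, high = (-128, 127) if signed else (0, 255)
    if not low <= zero_point <= high:
        raise ValueError("zero_point is outside the 8-bit range")
    result = []
    for x in values:
        if not low <= x <= high:
            raise ValueError(f"value {x} is outside the 8-bit range")
        # Subtract in integer arithmetic first, then scale to float.
        result.append((x - zero_point) * scale)
    return result


if __name__ == "__main__":
    print(dequantize_linear([0, 128, 255], scale=0.5, zero_point=128))
    print(dequantize_linear([-128, 0, 127], scale=0.25, signed=True))
```

Each output element depends on only one input element, which is why the operation parallelizes and vectorizes well.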

QNN EP

Add support for the Upsample, Einsum, LSTM, and CumSum operators (#24265, #24616, #24646, #24820) - @quic-zhaoxul, @1duo, @chenweng-quic, @Akupadhye
Fuse scale into Softmax (#24809) - @qti-yuduo
Enable DSP queue polling when performance is set to “burst” mode (#25361) - @quic-calvnguy
Update QNN SDK to version 2.36.1 (#25388) - @qti-jkilpatrick
Include the license file from QNN SDK in the Microsoft.ML.OnnxRuntime.QNN NuGet package (#25158) - @HectorSVC

Contributors

snnn, 1duo, and 9 other contributors

08 Jul 22:08

vraspar

v1.22.1

89746dc

ONNX Runtime v1.22.1

What's new?

This release replaces static linking of dxcore.lib with optional runtime loading, lowering the minimum supported version from Windows 10 22H2 (10.0.22621) to 20H1 (10.0.19041). This enables compatibility with Windows Server 2019 (10.0.17763), where dxcore.dll may be absent.

Change dependency from GitLab Eigen to GitHub eigen-mirror #24884 - @prathikr
Weaken dxcore dependency #24845 - @skottmckay
[DML] Restore compatibility with Windows Sdk 10.0.17134.0 #24950 - @JulienMaille
Disable VCPKG's binary cache #24889 - @snnn
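
The dxcore change described at the top of this release swaps a link-time dependency for a load attempt at runtime. The general pattern can be sketched with `ctypes`, as below. This only illustrates optional loading; it is not how ONNX Runtime's C++ code does it, and both function names are hypothetical.

```python
import ctypes


def try_load_library(name):
    """Return a handle to a shared library, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        # Library is absent or cannot be loaded: the caller picks a fallback.
        return None


def describe_dxcore_support(library_name="dxcore.dll"):
    """Report which code path would be taken on this machine."""
    if try_load_library(library_name) is None:
        return "dxcore unavailable: continue without it"
    return "dxcore available: use it"


if __name__ == "__main__":
    print(describe_dxcore_support())
```

With static linking, a missing dxcore.dll prevents the whole binary from loading. With runtime loading, only the feature that needs it is skipped.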

Contributors

JulienMaille, snnn, and 2 other contributors
