
Upcoming Release Roadmap

Maanav Dalal edited this page Jul 23, 2024 · 7 revisions

ONNX Runtime 1.19

Target Release: Mid August 2024

Build System & Packages

  • Stopping publication of packages for Python 3.8.
  • Discontinuing support for Xamarin (Xamarin reached EOL on May 1, 2024).
  • Discontinuing support for macOS 11 and increasing the minimum supported macOS version to 12 (macOS 11 reached EOL in September 2023).
  • Discontinuing support for iOS 12 and increasing the minimum supported iOS version to 13.
  • Introducing Java CUDA 12 packages on Maven.
  • Adding support for CPython 3.13.0b1.
  • Implementing friendlier error messages on Windows for missing DLLs when loading dynamically loadable EPs (e.g., CUDA), to reduce CUDA version mismatch issues.

Core

  • Completing end-to-end (E2E) MultiLora support, including work in the GenAI layer.
  • Implementing the DeformConv operator.
  • Removing the OrtMutex class and the dependency on nsync. The standard C++ std::mutex will be used instead.

Performance

  • Adding QDQ support for int4 quantization in the CPU and CUDA EPs.
  • Implementing FlashAttention on CPU to improve performance of GenAI prompt processing.
  • Improving int4 performance on CPU (x64, arm64) and NVIDIA GPUs.
  • Enabling fp16 GEMM to run using FP8 capabilities on NVIDIA GPUs.

Execution Providers

TensorRT

  • No specific updates planned.

QNN

  • No specific updates planned.

OpenVINO

  • Adding support for OpenVINO 2024.3.

DirectML

  • Updating DirectML from 1.14.1 to 1.15.
  • Updating ONNX opset support from 17 to 19.

Mobile

  • Implementing the CoreML ML Program operators required by the Autodesk model.
  • Developing a GPU EP proof-of-concept for phi-3.
  • Updating mobile documentation.
  • Removing references to deprecated 'mobile' packages.
  • Updating recommendations for building and deployment.

Web

  • Updating JavaScript packaging to align with current best practices. This may introduce minor incompatibilities for apps that bundle onnxruntime-web.
  • Adding support for grouped-query attention (GQA).
  • Adding support for phi3-vision.
  • Improving CPU operator coverage for WebNN, which is now supported in Chrome.

Training

  • No specific updates planned.

GenAI

  • Adding support for the Whisper model.
  • Adding Java bindings.
  • Introducing Android packages.
  • Introducing Windows ARM packages.

Extensions

  • Adding Audio FeatureExtractor APIs.
  • Enhancing tokenization support for models with a more efficient tiktoken algorithm.
  • Supporting state-of-the-art (SOTA) models for multimodal applications.
  • Enhancing the Custom Op Lite API on GPU and the fused kernels for DORT.

*Note: All features listed here are subject to change.