Releases: intel/llvm
oneAPI DPC++ Compiler dependencies
This release contains the OpenCL RT for Intel CPU and the FPGA emulator, used for oneAPI DPC++ Compiler and runtime validation.
Please see the runtime installation guide here.
oneAPI DPC++ Compiler 2022-12
New features
SYCL Compiler
- Added support for per-object device code compilation under the option `-fno-sycl-rdc`. This improves compiler performance and reduces memory usage, but can only be used if there are no cross-object dependencies. [f884993]
- Added support for per-aspect device code split mode. [9a2c4fe]
- Extended support for the large GRF mode to non-ESIMD kernels. [9994934] [ab2a42c]
- Implemented the `sycl_ext_intel_device_architecture` extension. [0e32a28] [b59d93c] [5bd5c87] [e5de913]
- Implemented the `sycl_ext_oneapi_kernel_properties` experimental extension. [332e4ee] [27454de] [70ee3d5] [430c722]
- Added support for generic address space atomic built-ins to CUDA libclc. [d6a8fd1]
SYCL Library
- Implemented accessor member functions `swap`, `byte_size`, `max_size` and `empty`. [f1f907a]
- Implemented SYCL 2020 default accessor constructor. [04928f9]
- Implemented SYCL 2020 accessor iterators. [5b9fd3c] [c7b1a00]
- Changed `value_type` of read-only accessors to `const` in accordance with SYCL 2020. [227614c]
- Implemented SYCL 2020 `multi_ptr` and `address_space_cast`. [8700b76] [483984a] [4a9e9a0]
- Implemented SYCL 2020 `has_extension` free functions. [7f1a6ef]
- Implemented SYCL 2020 `aspect_selector`. [c0a4a56]
- Implemented new SYCL 2020 style FPGA selectors. [0417651]
- Implemented SYCL 2020 default `async_handler` behavior. [cd93d8f]
- Implemented SYCL 2020 `is_compatible` free function. [67f6bba]
- Implemented queue shortcut functions with placeholder accessors. [5ee066e]
- Added support for creating a kernel bundle with descendent devices of the passed context's members. [a782779]
- Implemented non-blocking destruction and deferred release of memory objects without attached host memory. [894ce25]
- Implemented the `sycl_ext_oneapi_queue_priority` extension. [cdb09dc]
- Implemented the `sycl_ext_oneapi_user_defined_reductions` extension. [8311d79]
- Implemented the `sycl_ext_oneapi_queue_empty` extension proposal. [c493295]
- Implemented the `sycl_ext_oneapi_weak_object` extension. [d948427] [9297f63]
- Implemented the `sycl_ext_intel_cslice` extension. The old behavior that exposed compute slices as sub-sub-devices is now deprecated. For compatibility purposes, it can be brought back via the `SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` environment variable. [5995c618]
- Implemented the `sycl_ext_intel_queue_index` extension. [d2ec964] [7179e83]
- Implemented the `sycl_ext_oneapi_memcpy2d` extension. [516d411]
- Implemented device ID, memory clock rate and bus width information queries from the `sycl_ext_intel_device_info` extension. [1d99344] [4f7787c]
- Implemented `ext::oneapi::experimental::radix_sorter` from the `sycl_ext_oneapi_group_sort` extension proposal. [86ba180]
- Implemented a new unified interface for the `sycl_ext_oneapi_matrix` extension for CUDA. [166bbc3]
- Added support for sorting over sub-groups. [168767c]
- Added C++ API wrappers for the Intel math functions `ceil`, `floor`, `rint`, `sqrt`, `rsqrt` and `trunc`. [1b7582b]
- Implemented a SYCL device library for `bfloat16` Intel math function utilities. [fc136d6]
- Added support for range reductions with any number of reduction variables. [572bc50]
- Added support for reductions with kernels accepting `item`. [5d5e9f4]
- Enabled sub-group masks for 64-bit subgroups. [10d50ed]
- Implemented the new non-experimental API for DPAS. [55bf1a0] [1e7a8ea]
- Added 8/16-bit type support to the `lsc_block_load` and `lsc_block_store` ESIMD API. [f9d8059]
- Implemented atomic operation support in the ESIMD emulator. [a6a0dea]
- Added various trivial utility functions for the `half` type. [b4ce7c0]
- Added type cast functions between `half` and `float`/integer types to libdevice. [599b1b9]
- Implemented the `ONEAPI_DEVICE_SELECTOR` environment variable that, in addition to supporting `SYCL_DEVICE_FILTER` syntax, allows exposing GPU sub-devices as SYCL root devices and supports negative filters. `SYCL_DEVICE_FILTER` is now deprecated. [28d0cd3] [b21e74e] [77b6f34] [6bd5f9c] [6aefd63]
- Added the `SYCL_PI_LEVEL_ZERO_SINGLE_ROOT_DEVICE_BUFFER_MIGRATION` environment variable. [bd03e0d]
Documentation
- Added the `sycl_ext_oneapi_device_architecture` extension specification. [7f2b17e]
- Added the `sycl_ext_oneapi_memcpy2d` extension specification. [296e9c3]
- Added the `sycl_ext_oneapi_user_defined_reductions` extension specification. [cd4fd8c]
- Added the `sycl_ext_oneapi_weak_object` extension specification. [d948427]
- Added the `sycl_ext_oneapi_prod` extension proposal. [ed7cb4b]
- Added the `sycl_ext_codeplay_kernel_fusion` extension proposal. [be3dfbd]
- Added the `sycl_ext_intel_queue_index` extension proposal. [f5fb759]
- Added the `sycl_ext_intel_cslice` extension proposal. [5777e1f]
- Added the `sycl_ext_oneapi_group_sort` extension update proposal that introduced sorting functions with fixed-size arrays. [c6d1caf]
- Added device ID, memory clock rate and bus width device information queries to the `sycl_ext_intel_device_info` extension. [1d99344] [4f7787c]
Improvements
SYCL Compiler
- Added the `InferAddressSpaces` pass to the SPIR/SPIR-V compilation pipeline, reducing the size of the generated device code. [a3ae0dd]
- Redesigned pointer handling so that it no longer decomposes kernel argument types containing pointers. [3916d3b] [d55e9c2] [9b02506]
- Kernel lambda operator is now always inlined in the device code entry point unless `-O0` is used. [b91b732] [2359d94]
- Improved entry point handling in the `sycl-post-link` tool. [53d9c7b]
- The `reqd_work_group_size` attribute now works with 1, 2 or 3 operands. [4ff42c3]
- Enabled using the `-fcf-protection` option with `-fsycl`, which results in it being applied only to host code compilation and producing a warning. [b6f61f6]
- The Linux-based compiler driver on Windows now pulls in the `sycld` debug library when `msvcrtd` is specified as a dependent library. [ebf6c59]
- Added `/Zc:__cplusplus` as a default option during host compilation with MSVC. [e7ed860]
- Improved the `ESIMDOptimizeVecArgCallConv` optimization pass to cover more IR patterns. [4926454]
- Added support for more types in ESIMD lsc functions. [d9e40ec]
- Added error diagnostics for using `sycl::ext::oneapi::experimental::annotated_arg/ptr` as a nested type. [321c733]
- The status of `bfloat16` support was changed from experimental to supported. [7b47ebb]
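The `/Zc:__cplusplus` item above matters because MSVC, without that option, reports the `__cplusplus` macro as `199711L` regardless of the selected language standard, so headers that branch on the macro can pick the wrong code path. The following is a minimal sketch of such a check, not code taken from the DPC++ sources; `standard_name` and `current_standard` are hypothetical helpers introduced only for illustration.

```cpp
#include <string>

// Hypothetical helper (not part of DPC++ or SYCL): maps a value of the
// standard __cplusplus macro to a human-readable language standard name.
inline std::string standard_name(long value) {
  if (value >= 202002L) return "C++20 or later";
  if (value >= 201703L) return "C++17";
  if (value >= 201402L) return "C++14";
  if (value >= 201103L) return "C++11";
  return "C++98/03";
}

// Name of the standard this translation unit is compiled as, according to
// the macro. With MSVC this is only accurate when /Zc:__cplusplus is in
// effect; otherwise the macro stays at 199711L.
inline std::string current_standard() { return standard_name(__cplusplus); }
```

Calling `current_standard()` from a host translation unit built with MSVC and without `/Zc:__cplusplus` would be expected to yield the "C++98/03" branch even when a newer standard is selected, which is the mismatch the new default avoids.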
SYCL Library
- Updated `online_compiler` with Gen12 GPU support. [adfb1c1]
- `get_kernel_bundle` and `has_kernel_bundle` now check that the kernels are compatible with the devices. [91b1515]
- Waiting for an event associated with a kernel that uses a stream now also waits for the stream to be flushed. [1db0e81]
- Added the requested device type to the message of the exception thrown when no such devices are found. [6b83ad7]
- Optimized `operator[]` of `host_accessor`. [01e60f7]
- Improved reduction performance on discrete GPUs. [99bdc82]
- Added `invoke_simd` support for functions with `void` return type. [3fd0850]
- The Level Zero plugin now creates every event as host-visible by default. [f3d245d]
- Added Level Zero plugin support for global work sizes greater than `UINT32_MAX` as long as they are divisible by some legal work-group size and the resulting quotient does not exce...
oneAPI DPC++ Compiler 2022-09
New features
SYCL Compiler
- Added ability to enforce stateless memory accesses for ESIMD. [1811162]
- Added support for the `-fsycl-force-target` compiler option. [1d95f2e]
- Added support for the `[[intel::max_reinvocation_delay]]` loop attribute. [90fa5bb]
- Added support for the `-fsycl-huge-device-code` compiler option, which allows linking object files larger than 2GB. [f963062]
- Added support for compiling `.cu` files with SYCL compiler. [e76ad72]
- Added support for `assert` on HIP backend. [ade1870]
- Enabled CXX standard library functions for CUDA backend. [1fe92c5]
- Implemented group collective built-in functions for more integral types. [d4933b6]
SYCL Library
- Implemented SYCL 2020 callable device selectors. [64f0db7]
- Implemented SYCL 2020 standalone device selectors. [bfc7e98]
- Added SYCL 2020 property interfaces for `local_accessor`, `usm_allocator`, `accessor` and `host_accessor` classes. [1136b40] [da7dcf8]
- Added support for `fpga_simulator_selector`. [9bef890]
- Added support for `local_accessor`. Deprecated `target::local`. [e4423ef]
. [e4423ef] - Added support for querying free device memory on Level Zero backend. [0eeef2b]
- Added support for querying free device memory on CUDA and HIP backends. [436f0d8]
- Implemented `bfloat16` conversions from/to `float` for host. [2a383f1]
- Added support for `ext::oneapi::property::queue::discard_events` to Level Zero PI plugin. [1372120]
- Added `lsc_atomic` support on ESIMD emulator. [0c051a8]
- Added `dpas` support on ESIMD emulator. [3d506a3]
- Added C++ API for `imf` libdevice built-ins. [830916a]
- Implemented `make_queue` for CUDA backend. [89460e8]
- Implemented `has_native_event` and `make_event` for CUDA backend. [74369c8]
- Added support for CUDA XPTI tracing. [0cd0414]
- Introduced predicates for ESIMD `lsc_block_store/load`. [f44edce]
- Added experimental `set_kernel_properties` API and `use_double_grf` property for ESIMD. [9a55da5]
- Added "eager initialization" mode to Level Zero PI plugin. It might result in unnecessary work done by the plugin, but ensures the fastest possible execution on hot and reportable paths. [c145959]
- Added full support of element wise operations on `joint_matrix` on CUDA backend, including `bfloat16` support. [0a1d751]
- Implemented the `group::get_linear_id(int)` method. [6e83c12]
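To make the `bfloat16` conversion item above more concrete: a `bfloat16` value keeps the sign, the 8 exponent bits and the top 7 mantissa bits of an IEEE 754 `float`, so conversion amounts to keeping the upper 16 bits with some rounding. The sketch below is an illustration of that format only and is not the DPC++ host implementation; the helper names `float_to_bf16` and `bf16_to_float`, and the choice of round-to-nearest-even, are assumptions made for this example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: converts a float to bfloat16 bits, assuming
// round-to-nearest-even. Not the DPC++ implementation.
inline std::uint16_t float_to_bf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // well-defined type punning

  // NaN: keep the upper bits and force a quiet-NaN mantissa bit, so that
  // rounding cannot turn the value into infinity.
  if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0u)
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

  // Round to nearest, ties to even: add 0x7FFF plus the lowest kept bit.
  const std::uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + bias) >> 16);
}

// Hypothetical helper: widens bfloat16 bits back to float. This direction
// is exact because the lower 16 mantissa bits are simply zero.
inline float bf16_to_float(std::uint16_t h) {
  const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

Values that fit in 7 mantissa bits, such as `1.0f` or `-2.5f`, survive a round trip unchanged; other values lose precision in the low mantissa bits.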
Documentation
- Added stateful to stateless memory access conversion design document. [3e03f30]
- Added `sycl_ext_oneapi_complex` extension proposal. [01589da]
- Updated `sycl_ext_intel_fpga_device_selector` extension to add `fpga_simulator_selector`. [9bef890]
- Added `sycl_ext_intel_fpga_kernel_interface_properties` extension proposal. [4b6bd14]
- Updated `sycl_ext_oneapi_complex_algorithms` extension to include `sycl::complex` as a supported type for algorithms. [07c5b48]
- Clarified sub-group size calculation in `sycl_ext_oneapi_invoke_simd` extension spec. [9b33ad0]
- Updated `sycl_ext_oneapi_accessor_properties` to mark `has_property` API as `noexcept`. [7805aa3]
- Updated `sycl_ext_intel_device_info` to support querying free device memory. [0eeef2b]
- Updated `sycl_ext_oneapi_matrix` with description of new matrix features. [770f540]
- Moved `sycl_ext_oneapi_invoke_simd` extension specification from `proposed` to `experimental` because an implementation is available. [6bee344]
Improvements
SYCL Library
- Ensured that a correct `errc` is thrown for an unassociated placeholder accessor. [4f9935a]
- Removed dependency on OpenCL ICD Loader from the runtime. [90e8b5e]
- Added support for `ZEBIN` format to persistent caching mechanism. [34dcf83]
- Added identification mechanism for binaries in newer `ZEBIN` format. [f4dee54]
- Switched to use `struct` information descriptors in accordance with SYCL 2020. Removed some deprecated information queries. [b3cbda5]
- Updated `kernel_device_specific::max_sub_group_size` query to match SYCL 2020 spec. Deprecated the old variant. [7842d05]
- Deprecated SYCL 1.2.1 device selectors. [c058380]
- Improved error messages reported for unsupported device partitioning. [1c9ddba]
- Made `device` and `platform` default to `default_selector_v`. [b32dd41]
- Deprecated `address_space::constant_space`. [351b123]
- Marked `sycl::exception::has_context` as `noexcept`. [ad923c9]
- Improved range reductions performance on CPU. [3323da6]
- Made `sycl::exception` `nothrow` copy constructible. [289e33d]
- Marked `has_property` methods as `noexcept`. [417b5a2]
- Improved `sycl::event::get_profiling_info` exception message when `event` is default constructed. [2e86cd4]
- Added a diagnostic (in form of `static_assert`) about kernel lambda size mismatch between host and device. [d278c67] [ec179b7] [f417a88]
- Updated `pipes` class to throw exceptions if used on host. [eab2969]
- Updated ESIMD Emulator PI plugin to report support for `cl_khr_fp64` extension. [398571a]
- Updated Level Zero plugin to prefer copy engine for memory read/write operations. [65c3ea2]
- Optimized some memory transfers. [92d35cd]
- Enabled event caching in Level Zero PI plugin. [a41b33c]
- Optimized some reductions with `parallel_for` accepting `sycl::range` for discrete GPUs. [c22a5d3]
- Improved performance of event synchronization on CUDA backend. [c4f326a]
- Added ability to use descendent devices of context members within that context. Not supported with OpenCL backend yet. [a0c8c50] [78a483c]
- Added support for querying `atomic64` device capability with HIP backend. [cb190fc]
- Enabled FTZ operations for CUDA/PTX backend via `-fcuda-flush-denormals-to-zero`. [e8e7ae8]
- Improved error message about incorrect kernel argument types with CUDA backend. [2542e6a]
- Limited allowed argument types for `rol/ror` ESIMD functions to better represent HW capabilities. [b05f256]
- Implemented `mem_advise` reset and managed memory checks for CUDA backend. [fe18839]
- Added concurrent memory check to `mem_advise` on CUDA backend. [33746d8]
- Enabled multiple HIP streams per SYCL queue. [e0c40a9]
- Implemented lazy mechanism of setting context for default-constructed events. [ed92c4c]
- Improved performance for multi-dimensional accessors with multiple accesses in a kernel. [7c58b9a]
SYCL Compiler
- Increased max `_BitInt` size to 4096 for FPGA target. [db5f72a] [3f06cad]
- Removed deprecation message for `[[intel::disable_loop_pipelining]]` attribute. [07201f5]
- Allowed `__builtin_assume_aligned` to be called from device code. [24937ea]
- Improved link step performance when `per_kernel` device code split is used. [84de9d6]
- Added support for `SYCL_EXTERNAL` on `device_global` variables. [8b958f6]
- Updated `__builtin_intel_fpga_mem` to accept more parameters. [231338d]
- Updated `ivdep` attribute to allow `safelen = 0`. [558b3ba]
- Improved linking with `sycl.lib` on Windows. [404d281]
- Implemented more diagnostics about incorrect `device_global` usages. [1265721]
- Improved library resolution for `libsycl.so`. [4ce19d6]
- Improved diagnostics when linking with mismatched objects. [0e0202e]
- Added a warning for floating-point size changes after implicit conversions. [e4f5d55]
- Made `invoke_simd` convert its argument to appropriate types. [038764f]
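For readers unfamiliar with `__builtin_assume_aligned`, mentioned in the list above: it is a GCC/Clang builtin that returns its pointer argument and lets the optimizer assume the stated alignment. The sketch below shows the builtin in ordinary host code only; using the same call pattern inside a SYCL kernel is an assumption based on the release note, and `sum_aligned64` is a hypothetical function introduced for illustration.

```cpp
#include <cstddef>

// Hypothetical example function: sums n floats from a buffer that the
// caller promises is 64-byte aligned. Passing a pointer that is not
// actually 64-byte aligned is undefined behavior.
inline float sum_aligned64(const float* data, std::size_t n) {
  // The builtin returns the same address; it only attaches an alignment
  // assumption that the compiler may use, for example when vectorizing.
  const float* p =
      static_cast<const float*>(__builtin_assume_aligned(data, 64));
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    total += p[i];
  return total;
}
```

A caller would typically declare the buffer with `alignas(64)` so that the promise made to the compiler actually holds. The builtin is not available in MSVC.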
Documentation
- Removed explicit `cl` namespace references. [433ea5c]
- Added a short guideline on using CMake with SYCL compiler. [fa603c3]
Bug fixes
SYCL Library
- Fixed a compilation issue where it wasn't possible to pass an initializer list for the dependency events vector in `queue` shortcuts with an `offset` parameter. [f4f83d9]
- Fixed `sycl::get_pointer_device` throwing an exception when it is passed a descendent device (sub-device) instead of a root device. [26d5d98]
- Fixed memory leak happening when kernel bundles are linked. [980677d]
- Fixed USM free throwing an exception when it is passed a context created for a descendent device. [c49d494]
- Fixed `accessor`'s CTAD for `g++` host compiler. [57aabe7]
- Fixed a compilation issue when using multi-dimensional `accessor`'s subscript operator. [22e3fc5]
- Fixed "definition with the same mangled name" error happening when using multiple buffer reductions in a kernel. [a0a4d72]
- Fixed a compilation issue with SYCL math built-ins when GCC < 11.1 is used as a host compiler. [c786894]
- Fixed a compilation issue with SYCL math built-ins (such as `sycl::modf`, for example) not accepting pointers to `half`. [e286166]
- Fixed an issue with `reduction`s when MSVC is used as host compiler. [94c4b80]
- Fixed a compilation issue when fully specialized `sycl::span` is initialized from an array. [2b50820]
- Fixed a crash in...