Releases: intel/DML
Intel DML v1.2.0
Functionality
- Introduced a new internal submission mechanism for platforms based on Linux* OS kernel versions where MMAP is no longer permitted. For more details, refer to the Intel Security Advisory. When MMAP is unavailable, the write system call is used instead. This may introduce additional overhead for small data sizes (below 16 KB), resulting in slightly higher latency and lower throughput.
- Updated the DML device search mechanism to a new default behavior. Now, on platforms with Sub-NUMA Clustering configured such that not all NUMA nodes have an accelerator instance, any DSA instance on the same socket can be used for execution. If more fine-grained control is needed, the Low-Level API of the library provides the ability to select devices from a specific NUMA node using the `numa_id` field in the job structure.
- Introduced a new Low-Level API function, `dml_batch_get_crc()`, which retrieves the resulting CRC from a CRC operation.
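The same-socket fallback described above can be pictured as a two-tier search: prefer a device on the caller's NUMA node, otherwise take any device on the same socket. The sketch below is a hypothetical illustration, not DML's internal code; `device_info`, `select_device`, and the flat device list are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical description of one accelerator instance (not a DML type). */
typedef struct {
    int numa_node; /* NUMA node the device is attached to */
    int socket;    /* CPU socket the device belongs to */
} device_info;

/* Returns the index of the device to use, or -1 if none qualifies.
 * Prefers a device on the caller's NUMA node; otherwise falls back to
 * the first device found on the same socket. */
static int select_device(const device_info *devs, size_t count,
                         int thread_node, int thread_socket)
{
    int same_socket = -1;
    for (size_t i = 0; i < count; ++i) {
        if (devs[i].numa_node == thread_node)
            return (int)i;               /* exact NUMA match wins */
        if (same_socket < 0 && devs[i].socket == thread_socket)
            same_socket = (int)i;        /* remember first same-socket device */
    }
    return same_socket;
}
```

For example, with devices on nodes 0 (socket 0), 2, and 3 (both socket 1), a thread on node 1 of socket 0, which has no local accelerator, would be given device 0, the only device on its socket.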
Usability and Documentation
- Extended examples to use the new `dml_batch_get_crc()` operation and to clarify the use of the CRC seed for CRC operations.
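A CRC seed lets a checksum be computed in pieces: the result for the first chunk becomes the seed for the next. The sketch below illustrates this with a generic bitwise CRC-32C; the polynomial, the reflected bit ordering, and the absence of implicit pre/post inversion are assumptions of the example, and DML's own CRC conventions (selected through job flags) may differ, so refer to the library examples for actual usage.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32C (Castagnoli, reversed polynomial 0x82F63B78).
 * The seed is used as-is: no implicit pre- or post-inversion, so the
 * result of one call can be passed as the seed of the next call. */
static uint32_t crc32c_update(uint32_t seed, const uint8_t *data, size_t len)
{
    uint32_t crc = seed;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc;
}
```

Computing the first 4 bytes of a buffer and then passing that result as the seed for the remaining bytes yields the same value as one call over the whole buffer. Seeding with `0xFFFFFFFF` and inverting the final value gives the standard CRC-32C check value `0xE3069283` for the ASCII string "123456789".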
Known Limitations
- Intel(R) DML can be built from directly downloadable files (`.tar`, `.tgz`) only without tests and benchmark frameworks, using the `-DDML_BUILD_TESTS=OFF` build option, because they require submodules that are not included in the archives by GitHub* during release creation.
- Delta Record operations are not currently supported on the `hardware_path`.
- Batch operation is currently not supported on platforms based on Linux* OS kernel versions where MMAP is not permitted.
- Known test failures are listed below:
- block_on_fault/apply_delta_page_fault.read/1
Intel DML v1.1.2
This is a patch release containing the following change to v1.1.1:
Bug Fixes
- Fixed a possible "_FORTIFY_SOURCE redefined" build warning/error. Some GCC* builds could internally set `_FORTIFY_SOURCE`, which could have resulted in a DML build error.
Known Issues / Limitations
- Intel DML can be built from directly downloadable files (`.tar`, `.tgz`) only without tests and benchmark frameworks, using the `-DDML_BUILD_TESTS=OFF` build option, because they require submodules that are not included in the archives by GitHub during release creation.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
- (`hardware_path`, `auto_path`) block_on_fault/apply_delta_page_fault.read/1
- (`hardware_path` on DSA 2.0) The dml_drain.ta_default_parameters test could hang.
- There is an issue on the `auto_path` with continuation after a page fault if the page fault occurred on a pattern boundary (for fill or compare_pattern operations): part of the pattern is used before the page fault, and the pattern is restarted from the beginning after the page fault.
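The expected behavior for the pattern-boundary case can be stated concretely: the pattern phase should follow the byte offset within the whole buffer, so work resumed at offset k starts with pattern byte k mod 8 rather than byte 0. The helper below is a hypothetical software sketch of that rule (the 8-byte pattern size and the `fill_pattern_range` name are assumptions), not DML code.

```c
#include <stddef.h>
#include <stdint.h>

#define PATTERN_SIZE 8 /* assumed 8-byte pattern for a fill-style operation */

/* Fills dst[start, end) with a repeating pattern whose phase is anchored
 * at offset 0 of the whole buffer. Resuming at an arbitrary `start`
 * (e.g. after a page fault) therefore continues the pattern mid-way
 * instead of restarting it from pattern[0]. */
static void fill_pattern_range(uint8_t *dst, size_t start, size_t end,
                               const uint8_t pattern[PATTERN_SIZE])
{
    for (size_t i = start; i < end; ++i)
        dst[i] = pattern[i % PATTERN_SIZE];
}
```

Filling bytes 0 to 10 in one call and bytes 11 to 19 in a second call produces the same buffer as a single uninterrupted fill; the resumed part starts with the fourth pattern byte (11 mod 8 = 3).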
Intel DML v1.1.1
This is a patch release containing the following changes to v1.1.0:
Usability and Documentation
- Created a Contributing Guide and Pull Request template.
Bug Fixes
- Fixed incorrect Page Fault handling on automatic path.
- Fixed a warning when building with Clang and `-Wstrict-prototypes`.
- Added the missing `<stdexcept>` header that caused a build failure with the Clang compiler.
- Fixed incorrect job finalization in the Low-Level API multi-socket example.
- Fixed various issues flagged by the static code analysis tool.
- Fixed outdated link in README file.
Known Issues / Limitations
- Intel DML can be built from directly downloadable files (`.tar`, `.tgz`) only without tests and benchmark frameworks, using the `-DDML_BUILD_TESTS=OFF` build option, because they require submodules that are not included in the archives by GitHub during release creation.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
- (hw/auto) block_on_fault/apply_delta_page_fault.read/1
- There is an issue on the auto path with continuation after a page fault if the page fault occurred on a pattern boundary (for fill or compare_pattern operations): part of the pattern is used before the page fault, and the pattern is restarted from the beginning after the page fault.
Thanks to the Contributors
Release includes contributions from the project team as well as @haiyuewa.
Intel DML v1.1.0
Functionality
- Introduced Block on Fault support for High-Level and Low-Level APIs.
- Added initial support for Intel(R) Data Streaming Accelerator 2.0.
- Added Clang* compiler support for building and testing.
Usability and Documentation
- Clarified the NUMA* support in the Quick Start section of Documentation.
- Updated the Installed package structure to comply with the Linux* OS file-system hierarchy.
- Extended the status codes returned on queue submission errors to make issues easier to report.
- Updated the GoogleTest* and Google* Benchmark submodules to the latest released versions.
- Reworked Low-Level API examples and added an option to select the execution path.
- Added a warning to the documentation about handler lifetime and usage.
Deprecations
- Deprecated the `.dont_invalidate_cache()` method (High-Level API) and `DML_FLAG_DONT_INVALIDATE_CACHE` (Low-Level API) for the Cache Flush operation.
Issues Fixed
- Fixed various build issues with GCC* 12.
- Fixed Create Delta Record on the automatic path when page faults occur. Previously, on partial completion, Create Delta Record was not updated before being re-submitted to the software path.
- Fixed asynchronous execution on the automatic path to correctly handle partial completion caused by a page fault.
Known Limitations
- Intel(R) DML can be built from directly downloadable files (`.tar`, `.tgz`) only without tests and benchmark frameworks, using the `-DDML_BUILD_TESTS=OFF` build option, because they require submodules that are not included in the archives by GitHub* during release creation.
- Known test failures are listed below:
- block_on_fault/apply_delta_page_fault.read/1
Intel DML v1.0.0
Functionality
- Added Benchmark Framework with limited support. Refer to the Benchmark Framework Guide in the documentation for details regarding what is supported and how it can be used.
- Added no-operation (no-op) support to the High-Level API; a no-op can be used as a fence in a Batch operation.
- Added support for umonitor/umwait to the Low-Level Job API (refer to the `dml_wait_mode_t` enum).
- Added the `DML_MIN_BATCH_SIZE` macro to expose the minimum required batch size.
- Added more status codes to the Low-Level Job API to allow reporting of all Intel DSA statuses.
Usability and Documentation
- Removed the limitation that `libaccel-config.so.1` must be placed in `/usr/lib64/` to execute an application that uses Intel(R) DML. Now the user can specify its location using the `LD_LIBRARY_PATH` environment variable.
- Introduced the `-DDML_BUILD_{TESTS, EXAMPLES}` options (`ON` by default). `-DDML_BUILD_TESTS=OFF` enables you to build the library without tests from directly downloadable files (`.tar`, `.tgz`).
- Improved High-Level API examples by setting the execution path based on a command-line argument instead of hardcoding to use the Software Path.
- Restructured documentation and introduced general improvements and updates.
Deprecated Functionality
- Removed the `dml_get_limits(...)` service function.
- Removed the `EFFICIENT_WAIT` build option. Now the user should set `DML_WAIT_MODE_UMWAIT` (refer to the `dml_wait_mode_t` enum) when using the Low-Level Job API to enable umonitor/umwait.
Breaking Changes
- Changed the Low-Level API functions `dml_execute_job(...)` and `dml_wait_job(...)` to include a `dml_wait_mode_t wait_mode` parameter.
Bug Fixes
- Fixed GCC* 11 build failures caused by missing headers.
- Fixed incorrect queue submission mechanism that might have led to segmentation fault with previous DML releases.
Known issues/limitations
- Intel DML can be built from directly downloadable files (`.tar`, `.tgz`) only without tests and benchmark frameworks, using the `-DDML_BUILD_TESTS=OFF` build option, because they require submodules that are not included in the archives on GitHub* during release creation.
- Known test failures for the Hardware Path are listed below.
- dml_cache_flush.ta_do_not_invalidate
- dmlhl_cache_flush/{2, 3}.dont_invalidate
- transfer_size/cache_flush.success/{1, 3, 5, 7, 9, 11, 13}
- alignment/cache_flush.success/{1, 3, 5, 7}
- create_delta.page_fault_{read_first, read_second, write}
v0.1.9-beta
Intel® DML v0.1.9-beta
Date: March 2022
Note: Release introduces a test system for the library.
Features:
- Added tests for the library under the `tests/` folder.
- Added an example of multi-socket utilization of the library in the Code Samples and Examples section.
v0.1.8-beta
Intel® DML v0.1.8-beta
Date: February 2022
Note: Release introduces the auto execution path and manual NUMA selection for C++ API as well as several page fault handling bugfixes.
Features:
- Implemented the auto execution path (software fallback) for the C++ API. The library tries to use hardware and falls back to software if hardware is unavailable.
- Added a `numa_id` parameter to the `dml::execute` and `dml::submit` functions to specify a custom NUMA node id for submission. Setting a number allows the library to do cross-socket submissions.
- Removed the DML_HW CMake option. The library is built with HW support by default.
- Added a dynamic optimization dispatcher. The library checks at runtime whether a necessary instruction set is supported on the system.
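A runtime dispatcher of this kind usually selects a function pointer once, based on CPU feature detection. The sketch below uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` on x86 and falls back to a portable loop elsewhere; the byte-sum kernel and all function names are hypothetical placeholders, and DML's actual dispatcher may work differently.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn)(const uint8_t *, size_t);

/* Portable baseline implementation. */
static uint64_t sum_bytes_generic(const uint8_t *p, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += p[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop compiled for AVX2, so the compiler may vectorize it. */
__attribute__((target("avx2")))
static uint64_t sum_bytes_avx2(const uint8_t *p, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += p[i];
    return s;
}
#endif

/* Picks an implementation at runtime based on what the CPU supports. */
static sum_fn select_sum_impl(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_bytes_avx2;
#endif
    return sum_bytes_generic;
}
```

The selected pointer can be cached and reused, so the feature check costs nothing on later calls.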
Bug fix:
- Fixed erroneous results for Compare operations when a page fault occurred during processing.
- Fixed incorrect detection of on-write page faults.
Optimizations:
- Optimized reflected CRC operation.
v0.1.7-beta
Intel® DML v0.1.7-beta
Date: January 2022
Note: Release introduces initial implementation for the auto execution path, page fault handling, and manual NUMA node selection API
Features:
- Implemented the auto execution path (software fallback) for C API. The library will try to use hardware, but in case it is unavailable there is a software fallback.
- Added page fault handling:
- Removed usage of BlockOnFault flag
- If page fault occurred during descriptor processing:
- For the hardware execution path an erroneous status is returned
- For the auto execution path there is a software fallback, so the remainder of the workload is processed on CPU.
- Added a `numa_id` field to the `dml_job_t` structure to specify a custom NUMA node id for submission. Setting a number allows the library to do cross-socket submissions.
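The auto-path page fault handling above amounts to finishing a partially completed job on the CPU, whereas the hardware path would report an error status instead. The sketch below simulates this for a plain memory copy; `hw_result`, `fake_hw_move`, and `auto_move` are hypothetical stand-ins (the "hardware" step is simulated with a `memcpy` that stops at a chosen offset), not DML functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of a hardware attempt (not a DML type). */
typedef struct {
    int    page_fault;      /* nonzero if processing stopped on a page fault */
    size_t bytes_completed; /* bytes processed before stopping */
} hw_result;

/* Simulated hardware copy that "faults" after fault_at bytes. */
static hw_result fake_hw_move(void *dst, const void *src, size_t n,
                              size_t fault_at)
{
    size_t done = fault_at < n ? fault_at : n;
    memcpy(dst, src, done);
    hw_result r = { fault_at < n, done };
    return r;
}

/* Auto-path style copy: a partial completion is not reported as an error;
 * the remaining bytes are copied on the CPU. Returns the number of bytes
 * processed in software (0 if the hardware step finished the job). */
static size_t auto_move(void *dst, const void *src, size_t n, size_t fault_at)
{
    hw_result r = fake_hw_move(dst, src, n, fault_at);
    if (!r.page_fault)
        return 0;
    size_t rest = n - r.bytes_completed;
    memcpy((char *)dst + r.bytes_completed,
           (const char *)src + r.bytes_completed, rest);
    return rest;
}
```

The key detail is that the software step resumes at `bytes_completed` on both the source and destination, so no byte is processed twice or skipped.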
Optimizations:
- Optimized CRC operation for short lengths
v0.1.6-beta
Intel® DML v0.1.6-beta
Date: December 2021
Note: Release introduces bug fixes and several minor improvements
Features:
- Improved checking for incorrect input
- Added a check for adjacent buffers for the DIF Strip operation. Status: `DML_STATUS_DIF_STRIP_ADJACENT_ERROR`
- Reworked hardware-related statuses for the C API
- Added a new status to indicate submission failure: `DML_STATUS_WORK_QUEUES_NOT_AVAILABLE` for the C API, `dml::status_code::queue_busy` for the C++ API
- Removed LIBACCEL_3_2 cmake option. The supported version of accel-config is now 3.2 and higher
- The NUMA node id is now detected before each submission, so threads can safely change nodes at any time
Bug fix:
- Fixed an issue where the batch operation did not work for buffers not aligned on a 64-byte boundary
- Fixed an issue where the NUMA node id of the current thread was deduced incorrectly
- Fixed crashes when there are no available devices for the current thread NUMA node id
- Removed dependencies on C++ runtime from C API
Warnings:
- As the NUMA node id of the current thread is now deduced correctly, ensure that the accelerator configuration is compatible. The library does no cross-socket submissions. If there is no available device for the current NUMA node id, an error status code is reported.
v0.1.5-beta
Intel® DML v0.1.5-beta
Date: November 2021
Note: Release introduces unification of underlying implementation for both C and C++ APIs
Features:
- Added internal device selection logic to C API (the same as for C++ API)
- The selector considers the submitting thread's NUMA node id
- The selector switches devices and work queues with each submission
- Improved range checking for C and C++ APIs
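The selector behavior above (filter by the submitting thread's NUMA node, then rotate across queues on each submission) can be sketched as a round-robin search. `work_queue`, `wq_selector`, and `select_wq` are hypothetical names, not DML internals; the cursor is a plain field, so a multi-threaded version would need an atomic counter or a lock.

```c
#include <stddef.h>

/* Hypothetical work-queue descriptor (not a DML type). */
typedef struct {
    int numa_node; /* node of the device that owns this queue */
} work_queue;

/* Hypothetical selector state: a rotating cursor kept across submissions. */
typedef struct {
    size_t next; /* index where the next search starts */
} wq_selector;

/* Returns the index of the next queue on `node`, rotating so consecutive
 * submissions land on different queues; returns -1 if the node has none. */
static int select_wq(wq_selector *sel, const work_queue *wqs, size_t count,
                     int node)
{
    for (size_t step = 0; step < count; ++step) {
        size_t i = (sel->next + step) % count;
        if (wqs[i].numa_node == node) {
            sel->next = i + 1; /* start after this queue next time */
            return (int)i;
        }
    }
    return -1;
}
```

With queues on nodes {0, 1, 0, 0}, repeated submissions from node 0 cycle through queues 0, 2, 3, and back to 0, while queue 1 is never chosen for that node.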
Bug fix:
- Lowered memory size requirements for job structure by ~100x.