Releases: intel/DML

Intel DML v1.2.0

26 Sep 23:04
f59ed47

Functionality

  • Introduced a new internal submission mechanism for platforms running Linux* OS kernel versions where MMAP is no longer permitted. For more details, refer to the Intel Security Advisory. When MMAP is unavailable, the write system call is used instead. This may add overhead for small data sizes (smaller than 16 KB), resulting in slightly higher latency and lower throughput.
  • Changed the default behavior of the DML device search mechanism. On platforms where Sub-NUMA clustering is configured such that not all NUMA nodes have an accelerator instance, the library can now use any DSA instance from the same socket for execution. If more fine-grained control is needed, the Low-Level API provides the ability to select devices from a specific NUMA node using the numa_id field in the job structure.
  • Introduced a new Low-Level API function dml_batch_get_crc() which retrieves the resulting CRC from a CRC operation.
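
The fallback described in the first item above (use a mapping when it is permitted, otherwise the write system call) can be sketched with plain POSIX calls. This is not DML's internal code: the names submit_descriptor and submit_kind are hypothetical, and a plain memcpy stands in for whatever instruction the real submission path uses.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical names, for illustration only (not part of the DML API).
enum class submit_kind { mapped, write_syscall, failed };

// Tries to map the target and copy the descriptor into the mapping; if the
// mapping is refused, falls back to the write system call on the same fd.
inline submit_kind submit_descriptor(int fd, const void* desc, std::size_t size) {
    assert(desc != nullptr && size > 0);

    void* portal = mmap(nullptr, size, PROT_WRITE, MAP_SHARED, fd, 0);
    if (portal != MAP_FAILED) {
        // Assumption: a plain copy stands in for the real submission step.
        std::memcpy(portal, desc, size);
        munmap(portal, size);
        return submit_kind::mapped;
    }

    // Fallback path: one system call per submission. The notes above mention
    // that this path may add overhead for small data sizes.
    const ssize_t written = write(fd, desc, size);
    return written == static_cast<ssize_t>(size) ? submit_kind::write_syscall
                                                 : submit_kind::failed;
}
```

A caller would open the target once, then call submit_descriptor for every descriptor and treat submit_kind::failed as a submission error.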

Usability and Documentation

  • Extended the examples to use the new dml_batch_get_crc() function and to clarify the use of the CRC seed for the CRC operation.
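
To illustrate what a CRC seed is for, here is a minimal sketch in plain C++ that does not use the DML API. It assumes the common reflected CRC-32 polynomial 0xEDB88320 with zlib-style inversion before and after, which may differ from what the library or the hardware computes; the function name crc32_update is hypothetical. The point is only that passing the CRC of one chunk as the seed of the next gives the same result as a single pass over the whole buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bitwise reflected CRC-32 (polynomial 0xEDB88320).
// `seed` is 0 for the first chunk, or the CRC returned for the previous chunk.
inline std::uint32_t crc32_update(std::uint32_t seed, const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    const auto* bytes = static_cast<const unsigned char*>(data);
    std::uint32_t crc = ~seed;  // undo the final inversion of the previous chunk
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= bytes[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right by one bit; XOR in the polynomial if the bit shifted out was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

Computing the CRC of a large buffer in pieces then amounts to calling crc32_update once per piece, feeding each result in as the next seed.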

Known Limitations

  • Intel(R) DML can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DDML_BUILD_TESTS=OFF build option, because they require submodules that GitHub* does not include in the archives during release creation.
  • Delta Record operations are not currently supported on the hardware_path.
  • The Batch operation is currently not supported on platforms running Linux* OS kernel versions where MMAP is not permitted.
  • Known test failures are listed below:
    • block_on_fault/apply_delta_page_fault.read/1

Intel DML v1.1.2

04 Apr 21:19
8224bea

This is a patch release containing the following change to v1.1.1:

Bug Fixes

  • Fixed a possible "_FORTIFY_SOURCE redefined" build warning/error. Some GCC* builds set _FORTIFY_SOURCE internally, which could result in a DML build error.

Known Issues / Limitations

  • Intel DML can be built from directly downloadable files (.tar, .tgz) only without tests and benchmark frameworks, using the -DDML_BUILD_TESTS=OFF build option, because they require submodules that are not included in the archives by GitHub during release creation.
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
    • (hardware_path, auto_path) block_on_fault/apply_delta_page_fault.read/1.
    • (hardware_path on DSA 2.0) dml_drain.ta_default_parameters test could hang when DSA 2.0 is used.
  • There is an issue on the auto_path with continuation after a page fault if the page fault occurred on a pattern boundary (for fill or compare_pattern operations): part of the pattern is used before the page fault, and the pattern is then restarted from the beginning after the page fault.
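
The pattern continuation issue above can be made concrete with a small software sketch, written without the DML API. It assumes an 8-byte fill pattern and hypothetical helpers; bytes_done stands for the number of bytes completed before the page fault. A correct continuation resumes the pattern at offset bytes_done modulo the pattern size, whereas restarting at offset 0 produces the mismatch described.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t pattern_size = sizeof(std::uint64_t);

// Hypothetical helper: fills dst[begin, end) with the repeating pattern,
// starting `phase` bytes into the pattern.
inline void fill_from(unsigned char* dst, std::size_t begin, std::size_t end,
                      std::uint64_t pattern, std::size_t phase) {
    assert(phase < pattern_size);
    unsigned char bytes[pattern_size];
    std::memcpy(bytes, &pattern, pattern_size);
    for (std::size_t i = begin; i < end; ++i) {
        dst[i] = bytes[(phase + (i - begin)) % pattern_size];
    }
}

// Correct continuation: keep the pattern phase reached before the fault.
inline void resume_fill(unsigned char* dst, std::size_t size, std::uint64_t pattern,
                        std::size_t bytes_done) {
    fill_from(dst, bytes_done, size, pattern, bytes_done % pattern_size);
}

// Behavior described in the known issue: the pattern restarts at byte 0.
inline void resume_fill_restarting(unsigned char* dst, std::size_t size,
                                   std::uint64_t pattern, std::size_t bytes_done) {
    fill_from(dst, bytes_done, size, pattern, 0);
}
```

The two continuations agree only when bytes_done is a multiple of the pattern size, which is why the issue shows up only for faults that split a pattern.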

Intel DML v1.1.1

23 Oct 17:35
e0c2d3d

This is a patch release containing the following changes to v1.1.0:

Usability and Documentation

Bug Fixes

  • Fixed incorrect page fault handling on the automatic path.
  • Fixed warning when building with Clang and -Wstrict-prototypes.
  • Added a missing <stdexcept> header whose absence caused a build failure with the Clang compiler.
  • Fixed incorrect job finalization in the Low-Level API multi-socket example.
  • Fixed various issues flagged by the static code analysis tool.
  • Fixed outdated link in README file.

Known Issues / Limitations

  • Intel DML can be built from directly downloadable files (.tar, .tgz) only without tests and benchmark frameworks, using the -DDML_BUILD_TESTS=OFF build option, because they require submodules that are not included in the archives by GitHub during release creation.
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
    • (hw/auto) block_on_fault/apply_delta_page_fault.read/1
  • There is an issue on the auto path with continuation after a page fault if the page fault occurred on a pattern boundary (for fill and compare_pattern operations): part of the pattern is used before the page fault, and the pattern is then restarted from the beginning after the page fault.

Thanks to the Contributors

Release includes contributions from the project team as well as @haiyuewa.

Intel DML v1.1.0

19 Jul 17:34
a7c183e

Functionality

  • Introduced Block on Fault support for High-Level and Low-Level APIs.
  • Added initial support for Intel(R) Data Streaming Accelerator 2.0.
  • Added Clang* compiler support for build and testing.

Usability and Documentation

  • Clarified the NUMA* support in the Quick Start section of the documentation.
  • Updated the installed package structure to comply with the Linux* OS file-system hierarchy.
  • Extended the status codes returned on queue submission errors to make issues easier to report.
  • Updated GoogleTest* and Google* Benchmarks submodules to the latest released version.
  • Reworked Low-Level API examples and added an option to select the execution path.
  • Added a warning to the documentation about the handler lifetime and usage.

Deprecations

  • Deprecated the .dont_invalidate_cache() method (High-Level API) and the DML_FLAG_DONT_INVALIDATE_CACHE flag (Low-Level API) for the Cache Flush operation.

Issues Fixed

  • Fixed various build issues with GCC* 12.
  • Fixed the Create Delta Record operation on the automatic path for the case when page faults occur. Previously, on partial completion, the Create Delta Record operation was not updated before being resubmitted to the software path.
  • Fixed asynchronous execution on the automatic path so that partial completion due to a page fault is handled correctly.

Known Limitations

  • Intel(R) DML can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DDML_BUILD_TESTS=OFF build option, because they require submodules that GitHub* does not include in the archives during release creation.
  • Known test failures are listed below:
    • block_on_fault/apply_delta_page_fault.read/1

Intel DML v1.0.0

30 Mar 16:24
6086665

Functionality

  • Added Benchmark Framework with limited support. Refer to the Benchmark Framework Guide in the documentation for details regarding what is supported and how it can be used.
  • Added no-operation (no-op) support to the High-Level API; it can be used as a fence in a Batch operation.
  • Added support for umonitor/umwait to the Low-Level Job API (refer to the dml_wait_mode_t enum).
  • Added the DML_MIN_BATCH_SIZE macro to expose the minimum required batch size.
  • Added more status codes to the Low-Level Job API so that all Intel DSA statuses can be reported.

Usability and Documentation

  • Removed the limitation that libaccel-config.so.1 must be placed in /usr/lib64/ to execute an application that uses Intel(R) DML. Users can now specify its location using the LD_LIBRARY_PATH environment variable.
  • Introduced the -DDML_BUILD_{TESTS, EXAMPLES} options (ON by default). -DDML_BUILD_TESTS=OFF enables you to build the library without tests from directly downloadable files (.tar, .tgz).
  • Improved High-Level API examples by setting the execution path based on a command-line argument instead of hardcoding to use the Software Path.
  • Restructured documentation and introduced general improvements and updates.

Deprecated Functionality

  • Removed the dml_get_limits(...) service function.
  • Removed the EFFICIENT_WAIT build option. Users should now set DML_WAIT_MODE_UMWAIT (refer to the dml_wait_mode_t enum) when using the Low-Level Job API to enable umonitor/umwait.

Breaking Changes

  • Changed the signatures of the Low-Level API functions dml_execute_job(...) and dml_wait_job(...) to include a dml_wait_mode_t wait_mode parameter.

Bug Fixes

  • Fixed GCC* 11 build failures caused by missing headers.
  • Fixed an incorrect queue submission mechanism that could lead to a segmentation fault in previous DML releases.

Known issues/limitations

  • Intel DML can be built from directly downloadable files (.tar, .tgz) only without tests and benchmark frameworks, using the -DDML_BUILD_TESTS=OFF build option, because they require submodules that are not included in the archives on GitHub* during release creation.
  • Known test failures for Hardware Path are listed below.
    • dml_cache_flush.ta_do_not_invalidate
    • dmlhl_cache_flush/{2, 3}.dont_invalidate
    • transfer_size/cache_flush.success/{1, 3, 5, 7, 9, 11, 13}
    • alignment/cache_flush.success/{1, 3, 5, 7}
    • create_delta.page_fault_{read_first, read_second, write}

v0.1.9-beta

21 Mar 17:29

Intel® DML v0.1.9-beta

Date: March 2022

Note: Release introduces a test system for the library.

Features:

  • Added tests for the library under the tests/ folder
  • Added example for multi-socket utilization of the library in the Code Samples and Examples section

v0.1.8-beta

22 Feb 17:57

Intel® DML v0.1.8-beta

Date: February 2022

Note: Release introduces the auto execution path and manual NUMA selection for C++ API as well as several page fault handling bugfixes.

Features:

  • Implemented the auto execution path (software fallback) for C++ API. The library tries to use hardware, but in case it is unavailable, there is a software fallback.
  • Added numa_id parameter for dml::execute and dml::submit functions to specify custom NUMA node id for submission. Setting a number allows the library to do cross-socket submissions.
  • Removed DML_HW cmake option. The library is built with HW support by default.
  • Added dynamic optimization dispatcher. The library checks if a necessary instruction set is supported on the system at runtime.
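
A runtime dispatcher of the kind described in the last item can be sketched as follows. This is not the library's implementation: the function names are hypothetical, and __builtin_cpu_supports is a GCC/Clang builtin for x86 targets, so the sketch falls back to the baseline elsewhere. Both variants here are plain loops; in a real dispatcher the second would be compiled for the wider instruction set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using sum_fn = std::uint64_t (*)(const std::uint8_t*, std::size_t);

// Baseline variant, valid on any CPU.
inline std::uint64_t sum_baseline(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < size; ++i) total += data[i];
    return total;
}

// Stand-in for a variant built with AVX2 (here just a differently unrolled loop).
inline std::uint64_t sum_wide(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::uint64_t total = 0;
    std::size_t i = 0;
    for (; i + 4 <= size; i += 4) {
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
    for (; i < size; ++i) total += data[i];
    return total;
}

// Checks the CPU at runtime and returns the variant to use.
inline sum_fn select_sum() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_wide;
#endif
    return sum_baseline;
}
```

Callers would typically call select_sum() once, cache the returned pointer, and invoke it for every buffer.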

Bug fix:

  • Fixed erroneous results for Compare operations when a page fault occurred during processing.
  • Fixed wrong detection for the on-write page faults.

Optimizations:

  • Optimized reflected CRC operation.

v0.1.7-beta

25 Jan 20:47

Intel® DML v0.1.7-beta

Date: January 2022

Note: Release introduces initial implementation for the auto execution path, page fault handling, and manual NUMA node selection API

Features:

  • Implemented the auto execution path (software fallback) for C API. The library will try to use hardware, but in case it is unavailable there is a software fallback.
  • Added page fault handling:
    • Removed usage of BlockOnFault flag
    • If page fault occurred during descriptor processing:
      • For the hardware execution path an erroneous status is returned
      • For the auto execution path there is a software fallback, so the remainder of the workload is processed on CPU.
  • Added numa_id field for dml_job_t structure to specify custom NUMA node id for submission. Setting a number allows the library to do cross-socket submissions.
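
The page fault behavior of the auto execution path listed above can be sketched in plain C++ without the DML API. Everything here is hypothetical: fake_hw_copy simulates an accelerator that stops at a given fault offset and reports how many bytes it completed, and auto_copy finishes the remainder on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Simulated hardware copy: stops at `fault_at` and returns the bytes completed.
inline std::size_t fake_hw_copy(unsigned char* dst, const unsigned char* src,
                                std::size_t size, std::size_t fault_at) {
    const std::size_t done = std::min(size, fault_at);
    std::memcpy(dst, src, done);
    return done;
}

enum class path_used { hardware_only, hardware_then_software };

// Auto path: try the (simulated) hardware first, then process the rest on the CPU.
inline path_used auto_copy(unsigned char* dst, const unsigned char* src,
                           std::size_t size, std::size_t fault_at) {
    assert(dst != nullptr && src != nullptr);
    const std::size_t done = fake_hw_copy(dst, src, size, fault_at);
    if (done == size) return path_used::hardware_only;
    std::memcpy(dst + done, src + done, size - done);  // software fallback
    return path_used::hardware_then_software;
}
```

On the hardware-only path described in the same list, the partial completion would instead be surfaced to the caller as an error status rather than finished in software.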

Optimizations:

  • Optimized CRC operation for short lengths

v0.1.6-beta

10 Dec 13:47

Intel® DML v0.1.6-beta

Date: December 2021

Note: Release introduces bug fixes and several minor improvements

Features:

  • Improved checking for incorrect input
  • Added check for adjacent buffers for the DIF Strip operation. Status: DML_STATUS_DIF_STRIP_ADJACENT_ERROR
  • Reworked hardware related statuses for C API
  • Added new status to indicate submission failure:
    • DML_STATUS_WORK_QUEUES_NOT_AVAILABLE for C API
    • dml::status_code::queue_busy for C++ API
  • Removed LIBACCEL_3_2 cmake option. The supported version of accel-config is now 3.2 and higher
  • The NUMA node id is now detected before each submission, so threads can safely change nodes at any time

Bug fix:

  • Fixed an issue where the batch operation did not work for buffers not aligned on a 64-byte boundary
  • Fixed an issue where the current thread's NUMA node id was deduced incorrectly
  • Fixed crashes when no devices are available for the current thread's NUMA node id
  • Removed dependencies on C++ runtime from C API

Warnings:

  • Because the NUMA node id of the current thread is now deduced correctly, ensure that the accelerator configuration is compatible. The library does not perform cross-socket submissions. If no device is available for the current NUMA node id, an error status code is reported.

v0.1.5-beta

23 Nov 10:20

Intel® DML v0.1.5-beta

Date: November 2021

Note: Release introduces unification of underlying implementation for both C and C++ APIs

Features:

  • Added internal device selection logic to C API (the same as for C++ API)
    • Selector considers submitting thread's NUMA node id
    • Selector switches devices and work queues with each submission
  • Improved range checking for C and C++ APIs
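
The selection logic in the first item above can be sketched as a round-robin over the work queues that belong to the submitting thread's NUMA node. The types and names below are hypothetical and the node id is passed in explicitly; how the library enumerates devices and queries the current node is not shown in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical description of one work queue and the NUMA node of its device.
struct work_queue {
    int device_id;
    int numa_node;
};

class queue_selector {
public:
    explicit queue_selector(std::vector<work_queue> queues) : queues_(std::move(queues)) {}

    // Returns the next queue on `numa_node`, rotating on every call, or
    // nullptr when the node has no queue (the caller would report an error).
    const work_queue* next(int numa_node) {
        const std::size_t count = queues_.size();
        assert(cursor_ <= count);
        for (std::size_t step = 0; step < count; ++step) {
            const std::size_t index = (cursor_ + step) % count;
            if (queues_[index].numa_node == numa_node) {
                cursor_ = index + 1;  // the next search starts after this queue
                return &queues_[index];
            }
        }
        return nullptr;
    }

private:
    std::vector<work_queue> queues_;
    std::size_t cursor_ = 0;
};
```

Because the cursor advances on every successful call, consecutive submissions from the same node are spread across that node's queues instead of always landing on the first one.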

Bug fix:

  • Lowered memory size requirements for job structure by ~100x.