MegEngine v1.13.3

Wanwan1996 released this 21 Dec 06:58

MegEngine

Highlights

Added support for training and inference on Cambricon MLU series AI chips.

Known issues

When dumping with CD4 + FP16 enabled, graph optimization in the clip stage misbehaves: a bug related to the MIN op causes the dump to fail. A fix is expected in v1.13.4.

Bug fixes

Third-party hardware

Fixed a ROCm compilation failure.
Fixed an issue where the checksum_kernel_union4 kernel could not be found on Cambricon 590.

Common components

Fixed the reshape operator not accepting an int64 shape input in trace mode.
Fixed an incorrect workspace size calculation in the tile operator.
Fixed an issue where the seg transformer model could not be dumped because the NHWCD4 optimization pass handled it incorrectly.
Fixed the megfile dependency being pinned to a fixed version.
Fixed an error raised when the module_stats function computes the parameter count and computation cost of a traced_module model.
Improved the error messages reported when asynchronous execution fails, giving users a way to locate the problem further.
Added more error information before an exception is thrown when graph execution fails.
Fixed a compilation error caused by the missing header file "limits".

Release process

Fixed the megbrain CUDA backend failing to compile when built without the MGE_WITH_CUSTOM_OP build option.

XLA

Fixed unstable GPU memory usage with XLA.
Fixed indexing errors in XLA.
Fixed XLA being unable to trace GradManager callbacks.
Fixed XLA being unable to trace modules that use the property decorator.

CUDA

Temporarily disabled two algorithms that call cudnn-v8 (AlgoCUDNNConvV8, AlgoCUDNNConvBiasActivationV8) to fix mismatched computation results.
Fixed known issues; CUDA 11.8 is now officially supported.

Documentation

Fixed the missing _mgb.so in megengine.

New Features

Python API

Added the einsum operator.
Added support for the exponential opr.
Added support for multinomial distribution sampling.
Added support for the Remap operator.
Added support for the GaussianBlur operator.
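
For readers new to einsum notation, the following is a minimal pure-Python sketch of what one einsum expression, "ij,jk->ik", means. It does not call MegEngine: these notes do not show the new operator's signature, so the helper name einsum_ij_jk_ik is hypothetical and used for illustration only.

```python
def einsum_ij_jk_ik(a, b):
    """Pure-Python meaning of the einsum expression "ij,jk->ik" (hypothetical helper)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(cols):
            # The index j appears in both inputs but not in the output,
            # so it is summed over.
            out[i][k] = sum(a[i][j] * b[j][k] for j in range(inner))
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result = einsum_ij_jk_ik(a, b)
```

Here the expression reduces to an ordinary matrix product; other expressions change which indices are kept and which are summed.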

Third-party hardware

The Cambricon platform now supports neuware 1.13.0.
Added support for training and inference on the Cambricon platform.

Common components

Added support for the dilate operator.
Fixed a memory leak in OHOS thread-local storage.

XLA

Added the fake_quant and tqt operators to the XLA backend.
Added XLA support for the linspace, stack, resize, and resize backward operators.
Added the lsq operator to the XLA backend.
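
As background on the quantization operators listed above, here is a minimal sketch of the general fake-quantization technique on a single scalar: quantize to an integer grid, clamp, then dequantize. It illustrates the idea only and is not MegEngine's XLA implementation; the function name, parameters, and default int8 range are assumptions.

```python
def fake_quant(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Quantize-dequantize one float (illustrative; names and defaults are assumptions)."""
    # Map the float onto the integer grid.
    q = round(x / scale) + zero_point
    # Clamp to the representable integer range.
    q = max(qmin, min(qmax, q))
    # Map back to float, so the value carries quantization error
    # while staying in floating point.
    return (q - zero_point) * scale


inside_range = fake_quant(0.26, scale=0.1)    # snaps to the nearest grid point
outside_range = fake_quant(100.0, scale=0.1)  # saturates at qmax * scale
```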

Improvements

Dataloader

Changed the dataset and transform times recorded by datamonitor to the total time for one batch, so that they are measured on the same basis as collator time and ipc time.
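
The effect of this change can be sketched as follows, assuming per-sample timings are summed into one figure per batch. This is not the datamonitor code; the function name, the dictionary layout, and the timing values are hypothetical.

```python
def batch_totals(per_sample_times):
    """Sum per-sample stage times into one total per stage (hypothetical helper)."""
    totals = {}
    for sample in per_sample_times:
        for stage, seconds in sample.items():
            totals[stage] = totals.get(stage, 0.0) + seconds
    return totals


# Made-up timings, in seconds, for a batch of three samples.
samples = [
    {"dataset": 0.5, "transform": 0.25},
    {"dataset": 0.5, "transform": 0.25},
    {"dataset": 1.0, "transform": 0.5},
]
report = batch_totals(samples)
# Collation happens once per batch, so its time is already a per-batch figure.
report["collator"] = 0.125
```

With every stage expressed per batch, the figures in report can be compared directly to see which stage dominates.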

MegEngine Lite

Bug Fixes

Documentation

Fixed the inconsistency between the documentation and the implementation of the get_elem_size method in lite.
