[DIPU] The _amp_update_scale_ operator does not check for or handle dim=0 tensors #535
Labels: DIPU (DIPU related)
NeosZhang pushed a commit to DeepLink-org/deeplink.framework.dev that referenced this issue on Jan 18, 2024.
Background
export DIPU_MOCK_CUDA=True
Running llama_finetune fails with an error when it reaches the _amp_update_scale_ operator.
Problem description
With export DIPU_MOCK_CUDA=True set, run the following code:
`import torch
import torch_dipu
from torch import tensor

# 0-dim (scalar) tensors on the mocked CUDA device
_scale = tensor(65536., device='cuda')
found_inf_combined = tensor(0., device='cuda')
_growth_tracker = tensor(0, device='cuda', dtype=torch.int32)

_growth_factor = 2.0
_backoff_factor = 0.5
_growth_interval = 2000

# The operator is exposed as torch._amp_update_scale_ (leading and trailing underscores)
torch._amp_update_scale_(_scale, _growth_tracker, found_inf_combined, _growth_factor, _backoff_factor, _growth_interval)`
The following error occurs:
![企业微信截图_17026103414188](https://private-user-images.githubusercontent.com/87467364/290756746-4f5bd7e5-9b38-4008-834e-9142e213d588.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA2MTM4MDAsIm5iZiI6MTcyMDYxMzUwMCwicGF0aCI6Ii84NzQ2NzM2NC8yOTA3NTY3NDYtNGY1YmQ3ZTUtOWIzOC00MDA4LTgzNGUtOTE0MmUyMTNkNTg4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDEyMTE0MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJhNzVlNGM0YzljNmRlZDgxYmM5ZjYyODMyNTU2MTYyZWQ4NjE0OTkzZjYyODYyOTdlYmRhOGQ2M2MyZTljMWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.bcrtNeKTz5FIyC3ZR2uo8A-KIbDL_VmNWztHesA1gDY)
My initial assessment is that the logic here does not handle 0-dimensional (dim=0) input tensors:
https://github.com/DeepLink-org/deeplink.framework/blob/16e155d65f2a5e56d703b3e6acf3d9036b5acb1b/dipu/torch_dipu/csrc_dipu/aten/ops/CustomFallbackFunctionsForAmpGradScaler.cpp#L74C1-L103C2
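For checking a fix, it may help to have the expected result for scalar inputs written down. The following is a minimal pure-Python sketch, assuming the operator follows the usual gradient-scaler update rule (back off when an inf/NaN was found, grow after `growth_interval` consecutive clean steps). The function name `amp_update_scale_reference` is hypothetical and is not part of torch or DIPU. It uses Python floats, so it ignores the float32 rounding of the real operator.

```python
import math


def amp_update_scale_reference(scale, growth_tracker, found_inf,
                               growth_factor, backoff_factor, growth_interval):
    """Hypothetical scalar reference; returns (new_scale, new_growth_tracker)."""
    if found_inf != 0.0:
        # An inf/NaN gradient was seen: shrink the scale and reset the counter.
        return scale * backoff_factor, 0
    successful = growth_tracker + 1
    if successful == growth_interval:
        # Enough consecutive clean steps: grow the scale if it stays finite.
        new_scale = scale * growth_factor
        if math.isfinite(new_scale):
            scale = new_scale
        return scale, 0
    # Otherwise only the counter advances.
    return scale, successful


# Same values as in the reproduction above.
print(amp_update_scale_reference(65536.0, 0, 0.0, 2.0, 0.5, 2000))  # (65536.0, 1)
```

With the reproduction inputs, `_scale` should stay at 65536 and `_growth_tracker` should become 1 after the call, so a corrected fallback can be compared against these values.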