Qwen1.5 1.8B and Qwen2 7B produce repetitive answers toward the end of inference #35
Comments
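You can sample with penalty_sample, or raise the penalty coefficient a bit; either of these should help. If neither approach fixes it, quantization may have degraded the model's quality, in which case the only option is to use int8 or fp16/bf16. Also, your generation speed looks very fast, so I assume this is a fairly small model. This kind of repetition is generally more common with small models, so you could try a 7B-class model.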
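To illustrate what raising the penalty coefficient does, here is a minimal sketch of a common repetition-penalty rule: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative, before sampling. This is a generic illustration, not the LLM-TPU implementation; the function names `apply_repetition_penalty` and `sample` and the toy logits are assumptions made for this example, and the actual `penalty_sample` may work differently.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of logits with already-generated tokens made less likely.

    Hypothetical helper: positive logits are divided by the penalty and
    negative logits are multiplied by it, so both move toward "less likely".
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


def sample(logits, temperature=1.0, rng=random):
    """Draw one token id from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy vocabulary of 4 tokens; tokens 0 and 2 were already generated.
logits = [2.0, 1.9, -1.0, 0.5]
history = [0, 0, 2]

penalized = apply_repetition_penalty(logits, history, penalty=1.5)
best_before = max(range(len(logits)), key=lambda i: logits[i])
best_after = max(range(len(penalized)), key=lambda i: penalized[i])
next_token = sample(penalized, temperature=0.8)
```

In this toy case the most likely token shifts from the repeated token 0 to token 1 once the penalty is applied. A larger penalty suppresses repeats more strongly, but it can also hurt fluency, so it is usually raised in small steps.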
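Thanks for the reply! Switching the sampling method did help. With the 7B model, however, the problem appears every time I start a second round of conversation. First round: Question: 介绍一下九江 (Tell me about Jiujiang) Answer: *** bmruntime trace: *** At this point a check starts. Once the check finishes, running the pipeline command again fails with a kernel-related error. Rebooting clears it, but after the reboot the model can again answer only once. What is causing this?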
This one is really hard to deal with; the error is quite troublesome :(
You can also refer to https://github.com/sophgo/LLM-TPU/blob/main/docs/FAQ.md
SoC environment
transformers:4.42.4
torch:2.3.1
LLM-TPU:9a744f0/latest 2024.07.23
driver version: 0.5.1
linaro@bm1684:/usr/lib/cmake/libsophon$ bm_version
SophonSDK version: v24.04.01
sophon-soc-libsophon : 0.5.1
sophon-mw-soc-sophon-ffmpeg : 0.10.0
sophon-mw-soc-sophon-opencv : 0.10.0
BL2 v2.7(release):7b2c33d Built : 16:02:07, Jun 24 2024
BL31 v2.7(release):7b2c33d Built : 16:02:07, Jun 24 2024
U-Boot 2022.10 7b2c33d (Jun 24 2024 - 16:01:43 +0800) Sophon BM1684X
KernelVersion : Linux bm1684 5.4.217-bm1684-g27254622663c #1 SMP Mon Jun 24 16:02:21 CST 2024 aarch64 aarch64 aarch64 GNU/Linux
HWVersion: 0x00
MCUVersion: 0x01
Occasionally the answers are normal, but most of the time this happens.