Difference between Qwen-Chat-RLHF and Qwen-Chat #1310

Open
Tramac opened this issue Aug 8, 2024 · 0 comments

Tramac commented Aug 8, 2024

> We then use SFT and RLHF to align QWEN to human preference and thus we have QWEN-CHAT and specifically its improved version QWEN-CHAT-RLHF.

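The technical report mentions QWEN-CHAT-RLHF, but I could not find any RLHF-related model on Hugging Face or ModelScope. My understanding is that the QWEN-CHAT model should already include the RLHF training stage, so what is the difference between the Qwen-Chat-RLHF mentioned in the technical report and Qwen-Chat?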
