
Difference between Qwen-Chat-RLHF and Qwen-Chat #1310

Open

Tramac opened this issue Aug 8, 2024 · 0 comments

Tramac commented Aug 8, 2024

We then use SFT and RLHF to align QWEN to human preference and thus we have QWEN-CHAT and specifically its improved version QWEN-CHAT-RLHF.

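The technical report mentions QWEN-CHAT-RLHF, but I could not find any RLHF-related models on Hugging Face or ModelScope. My understanding is that the QWEN-CHAT models should already include the RLHF training stage, so what is the difference between the Qwen-Chat-RLHF mentioned in the technical report and Qwen-Chat?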
