
What if you use only D_{exp} #84

Open
yyrkoon27 opened this issue Nov 14, 2023 · 3 comments

Dear authors,

I wonder what would happen if you used only D_{exp}. It's not clear whether the ability to differentiate high-quality data from low-quality data is more critical than simply fine-tuning toward GPT-4 outputs. Could you shed some light on this? Thank you!

--yyrkoon

imoneoi (Owner) commented Nov 15, 2023

You can check out the AlpacaEval leaderboard for preliminary results. OpenChat-V2-W-13B is a C-RLFT model with both sets, while OpenChat-13B is an SFT model with only D_exp. We will update our paper with results for D_exp only soon.

yyrkoon27 (Author) commented Nov 16, 2023

Thank you for the update!

  1. Since OpenChat V3.1 13B now surpasses OpenChat-V2-W-13B, is OpenChat V3.1 13B a C-RLFT model or an SFT model trained with only D_{exp}? If it is the latter, perhaps there will be an OpenChat V3.1-W 13B soon?

  2. Is there a C-RLFT model trained with only D_{exp}? Based on Sec. 5.5 and Fig. 7, I would guess that performance keeps degrading as the amount of GPT-3.5 data is reduced. However, I am not sure whether that still holds when GPT-3.5 data is excluded entirely from C-RLFT.

Thank you very much!

imoneoi (Owner) commented Nov 17, 2023

Hi, thanks for your question! Here are the full results for SFT using only D_exp (GPT-4) and only D_sub (GPT-3.5), along with the C-RLFT and SFT results using both sets.

| Type | Average | AlpacaEval | MT-bench | Vicuna-bench |
| --- | --- | --- | --- | --- |
| C-RLFT (GPT-4 + GPT-3.5) | 77.3 | 89.5 | 57.5 | 85.0 |
| SFT (GPT-4) | 64.5 | 85.8 | 33.4 | 84.4 |
| SFT (GPT-4 + GPT-3.5) | 52.7 | 78.6 | 33.1 | 46.3 |
| SFT (GPT-3.5) | 42.8 | 76.5 | 16.9 | 35.0 |

  1. OpenChat V3.1 13B is a C-RLFT model based on Llama 2, with the same configuration as OpenChat-V2-W-13B (which is based on Llama 1).
  2. See the table above. Yes, performance degrades a lot when the GPT-3.5 data is completely excluded. C-RLFT learns and improves from both sets, which is a distinctive feature compared to SFT (see the sketch below for a rough picture of how the two sources are weighted).
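
To make the second point more concrete: C-RLFT can roughly be viewed as class-conditioned SFT with coarse, source-level reward weights, so the GPT-3.5 data still contributes signal instead of just diluting the GPT-4 data. Below is a minimal PyTorch sketch of that weighted objective, not the actual training code; the class weights (1.0 / 0.1), the condition-tag handling, and the HuggingFace-style `.logits` interface are assumptions for illustration only.

```python
# Simplified sketch of the C-RLFT idea: class-conditioned, reward-weighted SFT.
# Assumptions (not taken from the repo): a HuggingFace-style causal LM whose
# forward returns .logits, prompts that already carry a per-source condition
# tag, and illustrative coarse class weights.

import torch
import torch.nn.functional as F

CLASS_WEIGHT = {"gpt4": 1.0, "gpt35": 0.1}  # coarse per-source rewards (example values)


def c_rlft_loss(model, input_ids, labels, source_class):
    """Reward-weighted conditional SFT loss for one batch.

    input_ids:    (B, T) token ids; the prompt includes a condition tag for the source.
    labels:       (B, T) target ids, with prompt positions set to -100.
    source_class: list of "gpt4" / "gpt35" strings, one per example.
    """
    logits = model(input_ids).logits            # (B, T, V)

    # Standard next-token shift for causal LM training.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]

    # Per-token cross-entropy; ignored (prompt) positions contribute zero.
    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
        reduction="none",
    ).view(shift_labels.shape)                  # (B, T-1)

    mask = (shift_labels != -100).float()
    per_example = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

    # Expert (GPT-4) examples get a larger coarse reward than sub-optimal (GPT-3.5)
    # ones, so the model learns from both sources but is pulled more strongly
    # toward the expert data.
    weights = torch.tensor(
        [CLASS_WEIGHT[c] for c in source_class], device=per_example.device
    )
    return (weights * per_example).mean()
```

The conditioning itself lives in the prompt template (a different chat prefix per source), so at inference time the model is typically queried with the expert condition.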
