What if you use only D_{exp} #84

yyrkoon27 · 2023-11-14T16:57:20Z

Dear authors,

I wonder what would happen if you used only D_{exp}. It is not clear to me whether the ability to differentiate high-quality data from low-quality data matters more than simply fine-tuning toward GPT-4. Could you shed some light on this? Thank you!

--yyrkoon

imoneoi · 2023-11-15T04:29:17Z

You can check out the AlpacaEval leaderboard for preliminary results. OpenChat-V2-W-13B is a C-RLFT model with both sets, while OpenChat-13B is an SFT model with only D_exp. We will update our paper with results for D_exp only soon.

yyrkoon27 · 2023-11-16T01:14:55Z

Thank you for the update!

Since OpenChat V3.1 13B now surpasses OpenChat-V2-W-13B, is OpenChat V3.1 13B a C-RLFT model or an SFT model trained only on D_{exp}? If it is the latter, will there be an OpenChat V3.1-W 13B soon?
Is there a C-RLFT model trained only on D_{exp}? Based on Sec. 5.5 and Fig. 7, I would guess that performance keeps degrading as the amount of GPT-3.5 data decreases. However, I am not sure this still holds when GPT-3.5 data is excluded from C-RLFT entirely.

Thank you very much!

imoneoi · 2023-11-17T15:38:33Z

Hi, thanks for your question! Here are the full results for SFT using only D_exp (GPT-4) and only D_sub (GPT-3.5), along with the C-RLFT and SFT results with both sets.

| Type | Average | AlpacaEval | MT-bench | Vicuna-bench |
| --- | --- | --- | --- | --- |
| C-RLFT (GPT-4 + GPT-3.5) | 77.3 | 89.5 | 57.5 | 85.0 |
| SFT (GPT-4) | 64.5 | 85.8 | 33.4 | 84.4 |
| SFT (GPT-4 + GPT-3.5) | 52.7 | 78.6 | 33.1 | 46.3 |
| SFT (GPT-3.5) | 42.8 | 76.5 | 16.9 | 35.0 |

OpenChat V3.1 13B is a C-RLFT model based on Llama 2, with the same configuration as OpenChat-V2-W-13B (which is based on Llama 1).
As for excluding GPT-3.5: see the table above. Yes, performance degrades considerably when the GPT-3.5 data is completely excluded. C-RLFT learns and improves from both sets, which distinguishes it from SFT.
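
For readers who want to compare the rows in the results table numerically, here is a minimal Python sketch. The scores are copied from the table; the names `RESULTS`, `BENCHMARKS`, `benchmark_mean`, and `gap_to_crlft` are hypothetical helpers introduced only for illustration. It assumes the Average column is the unweighted mean of the three benchmark columns, which the thread does not state.

```python
# Scores copied from the results table: (Average, AlpacaEval, MT-bench, Vicuna-bench).
RESULTS = {
    "C-RLFT (GPT-4 + GPT-3.5)": (77.3, 89.5, 57.5, 85.0),
    "SFT (GPT-4)": (64.5, 85.8, 33.4, 84.4),
    "SFT (GPT-4 + GPT-3.5)": (52.7, 78.6, 33.1, 46.3),
    "SFT (GPT-3.5)": (42.8, 76.5, 16.9, 35.0),
}
BENCHMARKS = ("AlpacaEval", "MT-bench", "Vicuna-bench")
REFERENCE = "C-RLFT (GPT-4 + GPT-3.5)"


def benchmark_mean(name):
    """Unweighted mean of the three benchmark scores (assumed definition of Average)."""
    scores = RESULTS[name][1:]
    return sum(scores) / len(scores)


def gap_to_crlft(name):
    """Per-benchmark difference: C-RLFT score minus the named row's score."""
    ref = RESULTS[REFERENCE][1:]
    row = RESULTS[name][1:]
    return {b: round(r - s, 1) for b, r, s in zip(BENCHMARKS, ref, row)}


if __name__ == "__main__":
    for name in RESULTS:
        listed = RESULTS[name][0]
        print(f"{name}: listed avg {listed}, recomputed {benchmark_mean(name):.1f}, "
              f"gap to C-RLFT {gap_to_crlft(name)}")
```

Under that assumption, the recomputed mean matches the listed Average to rounding for three of the rows. For the SFT (GPT-4) row it comes out near 67.9 rather than the listed 64.5, so either the assumption does not hold for that row or one of its figures differs from the source.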
