
Gibberish text generation after converting to Huggingface. #712

Closed
kanwatchara-k opened this issue Oct 31, 2022 · 8 comments

Comments

@kanwatchara-k

Hi,
I am having trouble converting my checkpoints to Huggingface format. The model works fine when run with DeepSpeed + Megatron (example).

generate_samples_from_prompt(neox_args, model, ['สวัสดีครับ' for i in range(1)], temperature=0.9, top_k=40)
>>  {'context': 'สวัสดีครับ', ## means 'hello'
  'text': 'เพื่อนๆครับ มีเพื่อนๆ คนไหนที่ทำงานแล้ว หรือกำลังทำงานแล้ว แล้วได้ลาออกจากงานไปแล้วแต่ยังหางานอยู่บ้างครับ พอดีอยากทราบวิธีหางานหรือแนะนำบริษัท ที่ให้เงินเดือนดี และน่าเชื่อถือหน่อยครับ'}
  ## coherent Thai, roughly: "Friends, has anyone here quit a job and is still
  ## looking for work? I'd like tips on job hunting, or recommendations for
  ## trustworthy, well-paying companies."

However, the output becomes gibberish once the model is converted to Huggingface format (example).

pipe = TextGenerationPipeline(model, tok, device=0)
pipe("สวัสดีครับ", max_new_tokens=50, top_k=40, do_sample=True, temperature=0.9)
>>  [{'generated_text': 'สวัสดีครับ ค.. ดี. แรง 1-". และทำ<|endoftext|>'}]
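
(One way to confirm that the conversion itself is at fault, rather than the sampling settings, is to compare logits from the two models on identical input IDs. The sketch below assumes the converted checkpoint loads as a GPTNeoXForCausalLM; the paths are placeholders.)

# Hypothetical sanity check for a NeoX -> HF conversion: run the same token IDs
# through the converted model and compare its logits against those produced by
# the original DeepSpeed/Megatron model. A large mismatch points at the weight
# mapping or config, not at generation settings.
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

hf_model = GPTNeoXForCausalLM.from_pretrained("path/to/converted_model").eval()
tok = AutoTokenizer.from_pretrained("path/to/converted_model")
ids = tok("สวัสดีครับ", return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = hf_model(input_ids=ids).logits  # shape: (1, seq_len, vocab_size)
# Collect logits for the same ids from the original NeoX model, then compare;
# torch.allclose(hf_logits, neox_logits, atol=1e-3) failing => bad conversion.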

I have tried multiple conversion scripts so far (e.g., this and this) without success.

All the related files (weights, config, and tokenizer) are in my Google Drive.

Any help is greatly appreciated!

@StellaAthena
Member

@haileyschoelkopf

@haileyschoelkopf
Contributor

Hey! Looking into this to see if I can reproduce it on my end!

@haileyschoelkopf
Contributor

Oh, @kanwatchara-k would you be willing to share the exact command you ran for https://github.com/EleutherAI/gpt-neox/pull/701/files#diff-fff7e2d700e82c3e6027c575c1cd96830ba839ff44fa6b82abf2cb21b029d55c? There's a chance the discrepancy is due to my current script not accepting multiple config files like the ones you used for training.

@kanwatchara-k
Author

@haileyschoelkopf Of course. I did make some changes to the code, though (the modified version is here). Specifically, I hard-coded the vocab file path and the tokenizer type, and combined the two config files in the code (with their paths also hard-coded).

With the modified code, I just ran the command
python tools/convert_to_hf.py --input_dir checkpoints/global_step300000/ --output_dir ./
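
(Roughly, the config-combining step amounts to the sketch below; the file names are placeholders, not the actual hard-coded paths.)

# Sketch of merging two NeoX YAML config files into the single dict the
# conversion script expects; keys from later files override earlier ones.
import yaml

merged = {}
for path in ["configs/my_model.yml", "configs/local_setup.yml"]:
    with open(path) as f:
        merged.update(yaml.safe_load(f))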

Thanks

@haileyschoelkopf
Contributor

Thank you!! I’ll try this to convert your checkpoint as soon as I can, hopefully later today or early tomorrow!

@haileyschoelkopf
Contributor

Still working on tracking down the issue here; I'll keep you posted!

@haileyschoelkopf
Contributor

haileyschoelkopf commented Nov 15, 2022

@kanwatchara-k so sorry for the delay. What fixed the issue for a model of mine that had the same problem was:

  1. pip install --upgrade transformers to get the latest version, which includes this PR: "Add a use_parallel_residual argument to control the residual computing way" (huggingface/transformers#18695)
  2. editing config.json in the HF model to set "use_parallel_residual": false

The issue was that the models which wouldn't convert properly (both yours and mine) use a layer setup different from GPT-NeoX-20B's, which is controlled by gpt_j_residual: true in the 20B config file. Setting use_parallel_residual: false in the HF config makes the converted model run the same way it was trained. Hope this helps and works for you! I'll update my conversion script to take this into account.
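
(If hand-editing config.json is error-prone, the same change can be applied programmatically; this sketch assumes the converted checkpoint loads with the stock transformers classes, and the path is a placeholder.)

# Flip the residual setting on an already-converted checkpoint.
from transformers import GPTNeoXConfig

config = GPTNeoXConfig.from_pretrained("path/to/converted_model")
config.use_parallel_residual = False  # model was trained with gpt_j_residual: false, unlike NeoX-20B
config.save_pretrained("path/to/converted_model")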

@kanwatchara-k
Author

@haileyschoelkopf Thank you so much! It works properly now!
