
About XLNet Parameters #6

Open · woshierniu opened this issue Dec 12, 2021 · 8 comments

woshierniu commented Dec 12, 2021

Hello, thank you very much for sharing your code and ideas.
① I encountered a problem when running only XLNet (python train.py) for NER, as follows. Can you tell me why?

Traceback (most recent call last):
File "C:/Users/94312/Desktop/ner-combining-contextual-and-global-features-master/xlnet-ner/train.py", line 201, in
torch.save(model, f"{fname}.pt")
File "C:\Users\94312\Desktop\NER-pytorch-master\venv\lib\site-packages\torch\serialization.py", line 379, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "C:\Users\94312\Desktop\NER-pytorch-master\venv\lib\site-packages\torch\serialization.py", line 484, in _save
pickler.dump(obj)
TypeError: can't pickle torch._C.ScriptFunction objects
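
A common workaround for this error (a general PyTorch pattern, not necessarily the repo's intended fix) is to save the model's state_dict instead of pickling the whole model object, since objects holding TorchScript functions cannot be pickled:

    # Sketch: save only the learned weights instead of the full model object,
    # which avoids pickling torch._C.ScriptFunction members.
    torch.save(model.state_dict(), f"{fname}.pt")

    # To restore, rebuild the model first, then load the weights:
    # model = build_model(...)  # hypothetical constructor from train.py
    # model.load_state_dict(torch.load(f"{fname}.pt"))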

② And can you tell me the optimal parameters when running only XLNet? Are they the same as the defaults in xlnet-ner/train.py? Is ''finetuning == true''?

parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--lr", type=float, default=0.0001)
parser.add_argument("--n_epochs", type=int, default=30)

Looking forward to your reply, thank you.

woshierniu changed the title from "About XLNet Run Error" to "About XLNet Parameters" on Dec 12, 2021
honghanhh (Owner) commented Dec 12, 2021

Hi @woshierniu, thank you for your interest in NER using XLNet in combination with GCNs.

① Does the issue occur with the combined version as well, or only with XLNet? It works fine on my side, so I will double-check.

② The optimal parameters I applied when implementing XLNet:

  • Batch size: 16
  • Learning rate: 1e-5
  • Dropout: 0.2

I hope that helps. Thanks!
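
For example, using the batch-size and learning-rate flags from the argparse snippet quoted above (the dropout value has no flag in that snippet, so setting it may require editing train.py; this invocation is a sketch, not verified against the repo):

    python train.py --batch_size 16 --lr 1e-5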

woshierniu (Author) commented

Hi @honghanhh, thank you very much for your reply. I have solved the first run error, and I will try the parameters you provided.

① Can you tell me whether ''finetuning == true'' when training only XLNet? And when I only want to use XLNet for NER, do I just need to run ''train.py''?
② The NER results mentioned in your paper use fuzzy (relaxed) matching, but the latest research generally reports exact matching. Did you record exact-matching NER results for XLNet combined with the GCN?

Looking forward to your reply, thank you!

honghanhh (Owner) commented

Hi @woshierniu, sorry for my late reply.

① Yes, you can use train.py to experiment on XLNet alone with my suggested optimal parameters.
② The final result comes from combining the optimal hyperparameters below:

XLNet:
  • Batch size: 16
  • Learning rate: 1e-5
  • Dropout: 0.2

GCN:
  • Batch size: 8
  • Learning rate: 3e-4

I hope that helps!

woshierniu (Author) commented

Thank you very much for your patient reply. I will try your optimal parameters; they will be very helpful to me!

woshierniu (Author) commented

Sorry, I have another question. Is XLNet used for fine-tuning, or does it just extract features as embeddings?

honghanhh (Owner) commented Dec 23, 2021

Hi @woshierniu,

  • For the proposed model, I used XLNet as contextual embeddings in combination with global embeddings from the GCN.
  • For the XLNet standalone, I used XLNet for fine-tuning with an additional Linear layer at the end.

Please check out my paper for details on the architecture and how each model works: https://arxiv.org/abs/2112.08033

Please feel free to contact me if you have any concerns. Happy Xmas!
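
A minimal sketch of that standalone setup, assuming the Hugging Face transformers XLNetModel (the repo's actual class and variable names may differ):

    import torch.nn as nn
    from transformers import XLNetModel

    class XLNetForNER(nn.Module):
        # Hypothetical illustration: fine-tuned XLNet with a linear tagging head.
        def __init__(self, num_labels, model_name="xlnet-base-cased"):
            super().__init__()
            # All XLNet weights remain trainable, i.e. fine-tuning rather than
            # frozen feature extraction.
            self.xlnet = XLNetModel.from_pretrained(model_name)
            self.dropout = nn.Dropout(0.2)  # dropout value suggested in this thread
            self.classifier = nn.Linear(self.xlnet.config.d_model, num_labels)

        def forward(self, input_ids, attention_mask=None):
            hidden = self.xlnet(input_ids, attention_mask=attention_mask).last_hidden_state
            return self.classifier(self.dropout(hidden))  # per-token label logits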

woshierniu (Author) commented Dec 23, 2021

Happy Xmas! @honghanhh
① Thank you for your timely reply. I read your paper, and you mentioned that you used relaxed matching. I still have some questions about this: does the latest NER research (BERT, Flair, etc.) evaluate with relaxed matching or with exact matching using evaluation tools?

② For the XLNet/BERT language models in the code, does the initial input not need pretrained word vectors such as GloVe?

honghanhh (Owner) commented

① Sorry for my late reply. We used relaxed matching to better compare with other SOTA papers that shared the same evaluation approach at the time we published our paper.
② XLNet uses SentencePiece to build its tokenizer. I believe you can customize it with GloVe, but the results may not be as good as the default.
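
For reference, a minimal sketch of that SentencePiece-based tokenization, assuming the Hugging Face transformers XLNetTokenizer rather than the repo's own code:

    # Sketch: XLNet tokenizes text into SentencePiece subword pieces and maps
    # them to integer ids, so no external word vectors such as GloVe are required.
    from transformers import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
    pieces = tokenizer.tokenize("Named entity recognition")
    ids = tokenizer.convert_tokens_to_ids(pieces)
    print(pieces)  # subword pieces, e.g. ['▁Named', '▁entity', '▁recognition']
    print(ids)     # integer ids that are fed to the model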
