I've been looking through your project and was wondering: how do you handle adding new special tokens downstream, after pre-training? I see some support for HF tokenizers, but newly added special tokens would need to be accounted for by calling the `resize_token_embeddings()` method in HF. Is there an equivalent way to accomplish that here?
Hey! We currently don't have any way of handling that.
If it's a feature you'd like, feel free to start a PR.
It may be slightly more complicated than the HF method, as the embedding weights are distributed across machines in the model-parallel case, but it would involve resizing the weights here: