de-en model needs a lot more memory than de-cs #69

erickrf · 2022-10-18T11:57:40Z

I have been using the de-en and de-cs models on the same dataset (a few hundred thousand texts) and noticed that the English model needs a lot more memory than the Czech one. I'm running on an A100 GPU (40 GB memory).

In practice, I ended up with an English batch size less than half the Czech one, even though the model configs say the two models are roughly the same size; the only difference is that the de-cs vocabulary is slightly larger.

On top of that, the English model runs into the repeating nonsense subsequence issue a lot more often. I approximated that with a type-to-token ratio below 0.15, which flags 20 texts for Czech and around 70k for English. I don't see how this might relate to memory consumption, but maybe there's something.
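
For reference, the type-to-token ratio check described above could look roughly like the sketch below. Whitespace tokenization, the handling of empty strings, and the function names `type_token_ratio` and `looks_repetitive` are assumptions made for illustration; only the 0.15 threshold comes from the description above.

```python
def type_token_ratio(text: str) -> float:
    """Return the number of distinct tokens divided by the total number of tokens.

    Tokens are obtained by splitting on whitespace (an assumption; the
    tokenization actually used is not stated in the thread).
    """
    tokens = text.split()
    if not tokens:
        # Assumption: an empty text is treated as non-repetitive.
        return 1.0
    return len(set(tokens)) / len(tokens)


def looks_repetitive(text: str, threshold: float = 0.15) -> bool:
    """Flag a text whose type-to-token ratio falls below the threshold."""
    return type_token_ratio(text) < threshold


if __name__ == "__main__":
    samples = [
        "the same the same the same the same the same the same the same",
        "a perfectly ordinary sentence with mostly distinct words",
    ]
    for sample in samples:
        print(round(type_token_ratio(sample), 3), looks_repetitive(sample))
```

A text that repeats one short phrase many times has few distinct tokens relative to its length, so its ratio drops toward zero and it gets flagged.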

jorgtied · 2023-01-12T07:52:46Z

What specific models were you using? Could it be that they have different parameter sizes, vocab sizes or something like that?

erickrf · 2023-01-30T09:53:29Z

I loaded them with the transformers library. For Czech it was Helsinki-NLP/opus-mt-de-cs and for English Helsinki-NLP/opus-mt-de-en.
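
One way to check the question about differing parameter or vocabulary sizes is to compare the two model configurations key by key. The sketch below is a minimal, library-free helper; `diff_configs` is a hypothetical name, and the two dictionaries in the example contain made-up placeholder values, not the real settings of either model.

```python
from typing import Any, Dict, Tuple


def diff_configs(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return the keys whose values differ, mapped to (value_in_a, value_in_b).

    A key missing from one side is reported with None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT the real model configs.
    config_a = {"d_model": 512, "encoder_layers": 6, "vocab_size": 1000}
    config_b = {"d_model": 512, "encoder_layers": 6, "vocab_size": 1200}
    for key, (left, right) in diff_configs(config_a, config_b).items():
        print(f"{key}: {left} != {right}")
```

With transformers installed, the real dictionaries could presumably be obtained with `AutoConfig.from_pretrained("Helsinki-NLP/opus-mt-de-cs").to_dict()` and the same call for `Helsinki-NLP/opus-mt-de-en`.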
