Completely refactor and test YaRN finetuning #78

Merged · 30 commits merged into EleutherAI:main on Dec 13, 2023

Conversation

@honglu2875 (Contributor) commented Dec 9, 2023

  • Unified dynamic YaRN and finetuning YaRN (a sketch of the frequency blending follows this list).
    • Refactored the code; the code and the config can still be improved (still WIP).
    • One successful finetuning run.
    • YaRN hyperparameter sweep.
  • Improved RoPE using Katherine's in-place RoPE update.
  • Changed to an in-place KV-cache and overhauled the use_cache parameter.
  • Removed the constraint imposed by model.max_seq_len, because dynamic YaRN can in principle go a little longer.
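
For reference, here is a minimal, self-contained sketch of the NTK-by-parts frequency blending that YaRN performs. It is an illustration under assumed defaults, not this repository's YaRNScaledRotaryEmbedding or YaRNConfig API; the names yarn_inv_freq, yarn_mscale, original_max_len, beta_fast, and beta_slow are hypothetical:

    import math
    import torch

    def yarn_inv_freq(
        dim: int,
        base: float = 10000.0,
        scale: float = 1.0,
        original_max_len: int = 2048,
        beta_fast: float = 32.0,
        beta_slow: float = 1.0,
    ) -> torch.Tensor:
        """Blend position-interpolated and extrapolated RoPE frequencies per dimension."""
        pos_freqs = base ** (torch.arange(0, dim, 2).float() / dim)
        inv_freq_extrapolation = 1.0 / pos_freqs             # plain RoPE
        inv_freq_interpolation = 1.0 / (scale * pos_freqs)   # position interpolation

        # Index of the rotary dimension whose wavelength completes `num_rotations`
        # full rotations within the original context length.
        def correction_dim(num_rotations: float) -> float:
            return (dim * math.log(original_max_len / (num_rotations * 2 * math.pi))) / (
                2 * math.log(base)
            )

        low = max(math.floor(correction_dim(beta_fast)), 0)
        high = min(math.ceil(correction_dim(beta_slow)), dim // 2 - 1)

        # Ramp is 0 below `low` and 1 above `high`; high-frequency dims are
        # extrapolated (left unchanged), low-frequency dims are interpolated.
        ramp = torch.clamp(
            (torch.arange(dim // 2, dtype=torch.float32) - low) / max(high - low, 1e-3),
            0.0, 1.0,
        )
        extrapolation_weight = 1.0 - ramp
        return (
            inv_freq_interpolation * (1.0 - extrapolation_weight)
            + inv_freq_extrapolation * extrapolation_weight
        )

    def yarn_mscale(scale: float = 1.0) -> float:
        """Attention temperature; 1.0 when scale <= 1 (i.e. no length scaling)."""
        return 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0

Dynamic YaRN typically recomputes these frequencies on the fly once the observed sequence length exceeds the original context, while finetuning YaRN fixes the scale up front; presumably the unification means the same blending code serves both paths.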

@honglu2875 marked this pull request as ready for review December 13, 2023 18:38

Review comment on the diff hunk:

    else:
        self.rotary_emb = RotaryEmbedding(self.d_head)
    cfg = model_config.yarn_config or YaRNConfig()
    self.rotary_emb = YaRNScaledRotaryEmbedding(
Reviewer (Contributor):

Are the YaRN rotary embeddings always used now? What about when we are pre-training? Does the default YaRNConfig() cover the case where YaRNScaledRotaryEmbedding behaves the same as the normal rotary embeddings?

@honglu2875 (Contributor, Author):

The YaRN embedding is always used. The default config is equivalent to the usual RoPE (scale=1.0).
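
A quick way to see this, reusing the hypothetical yarn_inv_freq / yarn_mscale sketch from the description above (not the repository code): with scale=1.0 the interpolated and extrapolated frequencies coincide, so the blended frequencies are exactly the standard RoPE frequencies and the attention temperature stays 1.0.

    import torch

    dim, base = 64, 10000.0
    inv_freq_rope = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    inv_freq_default = yarn_inv_freq(dim=dim, base=base, scale=1.0)

    assert torch.allclose(inv_freq_rope, inv_freq_default)
    assert yarn_mscale(1.0) == 1.0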

@loubbrad merged commit 8205d85 into EleutherAI:main on Dec 13, 2023
2 checks passed