
Implement kv-caching, add more variants of extrapolation (CFG) and interpolation methods #55

Merged: 10 commits merged into EleutherAI:dev on Nov 3, 2023

Conversation

@honglu2875 (Contributor) commented on Oct 31, 2023:

  • Implement kv-caching
  • Add a few CFG variants (varying CFG strength, negative prompt, ...)
  • Optimizations (wrapped decoding in with torch.inference_mode():; a combined sketch of these pieces follows below)
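
For readers skimming the PR, here is a minimal sketch of how these three pieces typically fit together in a decoding loop. It is illustrative only: the model(ids, past_kv=...) -> (logits, past_kv) interface, the generate_cfg name, and the cache layout are assumptions, not this repo's actual API.

```python
import torch

@torch.inference_mode()  # no autograd bookkeeping anywhere inside decoding
def generate_cfg(model, prompt_ids, neg_prompt_ids, max_new_tokens, cfg_scale=1.5):
    """Greedy decoding with a kv-cache and classifier-free guidance (CFG).

    kv-caching: each layer's keys/values are stored in past_kv, so after the
    first step only the newest token is fed through the model (O(n) work per
    generated token instead of re-encoding the whole prefix every step).
    CFG with a negative prompt: extrapolate away from the negative branch,
    logits = neg + cfg_scale * (pos - neg).
    """
    pos_ids, neg_ids = prompt_ids, neg_prompt_ids
    pos_kv = neg_kv = None
    for step in range(max_new_tokens):
        pos_in = pos_ids if step == 0 else pos_ids[:, -1:]
        neg_in = neg_ids if step == 0 else neg_ids[:, -1:]
        pos_logits, pos_kv = model(pos_in, past_kv=pos_kv)
        neg_logits, neg_kv = model(neg_in, past_kv=neg_kv)
        logits = neg_logits[:, -1] + cfg_scale * (pos_logits[:, -1] - neg_logits[:, -1])
        next_id = logits.argmax(dim=-1, keepdim=True)
        pos_ids = torch.cat([pos_ids, next_id], dim=1)
        neg_ids = torch.cat([neg_ids, next_id], dim=1)
    return pos_ids
```

Varying CFG strength is then just a schedule over cfg_scale per step; cfg_scale=1.0 recovers unguided decoding.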

super().__init__()
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
if device is None: # todo: maybe we don't need this...
@loubbrad (Contributor) commented:

Probably best not to do this explicitly. Could we instead perhaps do device = self.device? I'm worried that this will break the distributed code (with accelerate), but I'm unable to test it at the moment.

@honglu2875 (Contributor, Author) replied:

The problem with device=self.device is that device=None somehow defaults to CPU, and I get a device mismatch error.

@honglu2875 (Contributor, Author) added:

@loubbrad I didn't realize these two lines can now be safely removed thanks to the Hugging Face code. In fact, the device param itself is redundant!
In the original implementation of RotaryEmbedding (from the neox repo), the device needed to be known at initialization time because the cached cos_cached and sin_cached tensors had to be set explicitly on it. But now they are registered as buffers on the module, so moving them across devices is automatic.
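
For context, this is the pattern the comment refers to: a hedged sketch (not the repo's exact class) of caching cos/sin as buffers via register_buffer, so the tensors follow the module across devices and no device argument is needed at init.

```python
import torch
import torch.nn as nn

class RotaryEmbeddingSketch(nn.Module):
    """Illustrative only; the real RotaryEmbedding differs in details."""

    def __init__(self, dim: int, max_seq_len: int = 2048, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(max_seq_len).float()
        freqs = torch.outer(t, inv_freq)            # (max_seq_len, dim // 2)
        emb = torch.cat((freqs, freqs), dim=-1)     # (max_seq_len, dim)
        # Buffers, not parameters: untrained tensors that are still moved by
        # module.to(device) / accelerate, so no device is needed at init time.
        self.register_buffer("cos_cached", emb.cos(), persistent=False)
        self.register_buffer("sin_cached", emb.sin(), persistent=False)

    def forward(self, seq_len: int):
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]
```

After model.to("cuda") (or accelerate's device placement) the caches land on the right device automatically, which is why the explicit device branch can simply be deleted.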

self.rotary_emb = RotaryEmbedding(self.d_head)
if use_yarn:
    # todo: need more testing on this
    self.rotary_emb = DynamicYaRNScaledRotaryEmbedding(self.d_head,
@loubbrad (Contributor) commented on Nov 1, 2023:

I think this import is missing?

@honglu2875 (Contributor, Author) replied:

Oh yes, you are right. Will fix it in a moment.
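
For readers unfamiliar with the class being imported: the idea behind dynamic YaRN/NTK-style scaling is to rescale the rotary base once the sequence outgrows the trained context, so the low frequencies stretch instead of wrapping around. Below is a deliberately simplified sketch of that rescaling; the actual DynamicYaRNScaledRotaryEmbedding does more (e.g. per-frequency interpolation ramps), and the function name and signature here are hypothetical.

```python
import torch

def dynamic_scaled_inv_freq(dim: int, seq_len: int, max_trained_len: int,
                            base: float = 10000.0) -> torch.Tensor:
    """Simplified dynamic NTK-style base rescaling (a rough relative of YaRN).

    Within the trained context this is a no-op; past it, the base grows so
    the rotary frequencies cover the longer sequence gracefully.
    """
    if seq_len > max_trained_len:
        scale = seq_len / max_trained_len
        base = base * scale ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
```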

@loubbrad merged commit 358bd22 into EleutherAI:dev on Nov 3, 2023.
@honglu2875 deleted the honglu/dev branch on Nov 6, 2023.