
Problem in position embedding #4

Closed · jmercat opened this issue Aug 31, 2023 · 8 comments · Fixed by #5

Comments

@jmercat (Collaborator) commented Aug 31, 2023

queries, keys, vals = self.pos_embed(queries, keys, vals)

It seems to me that the rotary position embedding is applied to the head dimension (dim -2) of the vectors q and k instead of the sequence dimension (dim 1). I think the head and sequence dimensions should be swapped before calling the position embedding (see https://github.com/facebookresearch/xformers/blob/748c159096d4f9fcfe3eaf22801e5aed4777210b/xformers/components/positional_embedding/rotary.py#L85).

What I'm proposing is simply to rewrite RotaryWithCast as follows:

class RotaryWithCast(RotaryEmbedding):
    def forward(self, q, k, v):
        # Permute (batch, seq, heads, head_dim) -> (batch, heads, seq, head_dim) so the
        # rotation is applied along the sequence dimension, then permute back.
        q, k = super().forward(q.permute(0, 2, 1, 3), k.permute(0, 2, 1, 3))
        q = q.permute(0, 2, 1, 3)
        k = k.permute(0, 2, 1, 3)
        return q.to(v.dtype), k.to(v.dtype), v
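
For intuition, here is a minimal, self-contained sketch (plain PyTorch, not the open_lm or xformers code) of why the axis matters: an interleaved-pair rotary rotation applied along the head axis gives different results than the same rotation applied along the sequence axis, so passing tensors in (batch, seq, heads, head_dim) layout to an implementation that rotates dim -2 silently encodes the head index instead of the token position.

import torch

def apply_rotary(x, axis):
    # Rotate interleaved channel pairs of x by an angle proportional to the
    # position index along `axis` (a simplified stand-in for a rotary embedding).
    head_dim = x.shape[-1]
    pos = torch.arange(x.shape[axis], dtype=torch.float32)
    inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(pos, inv_freq)                       # (axis_len, head_dim // 2)
    shape = [1] * x.dim()
    shape[axis], shape[-1] = x.shape[axis], head_dim // 2
    cos, sin = freqs.cos().view(shape), freqs.sin().view(shape)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(2, 16, 8, 64)             # (batch, seq, heads, head_dim), as in the snippet above
rotated_seq = apply_rotary(q, axis=1)     # intended: rotate along the sequence axis
rotated_head = apply_rotary(q, axis=2)    # the bug: dim -2 here is the head axis
print(torch.allclose(rotated_seq, rotated_head))  # False: the two conventions differ
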
@jmercat changed the title from "Problem in position embedding?" to "Problem in position embedding" on Aug 31, 2023
@jmercat (Collaborator, Author) commented Aug 31, 2023

Here are the runs I made with a custom subset of StarCoder data. The original 11m training is in brown. My implementation using a different positional encoding (including the proposed fix) is in orange.
[Screenshot from 2023-08-31 10-22-09: training curves for the two runs]

@sagadre (Collaborator) commented Aug 31, 2023

Good catch! The blow-up curves you are seeing are similar to the ones we were seeing before we introduced qk-norm for the smaller models. I will do some testing with this fix on my end as well. Would you like to open a PR?
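
For readers unfamiliar with qk-norm, here is a minimal sketch of the general idea (not the open_lm implementation): normalize the query and key vectors per head before computing attention scores, which bounds the attention logits and tends to prevent the kind of loss blow-ups shown above.

import torch
import torch.nn as nn
import torch.nn.functional as F

head_dim = 64
q_norm = nn.LayerNorm(head_dim)  # normalizes each query vector over its channels
k_norm = nn.LayerNorm(head_dim)  # normalizes each key vector over its channels

# (batch, heads, seq, head_dim) layout expected by scaled_dot_product_attention
q = torch.randn(2, 8, 16, head_dim)
k = torch.randn(2, 8, 16, head_dim)
v = torch.randn(2, 8, 16, head_dim)

out = F.scaled_dot_product_attention(q_norm(q), k_norm(k), v, is_causal=True)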

@mitchellnw (Contributor)

Wow, amazing catch! We really appreciate this.

@mitchellnw (Contributor)

We've added your name to the README because this is a very substantial bug catch. It's pretty interesting that our first 1B/7B runs do pretty well even without proper posembeds, but we should fix this going forward.

@jmercat (Collaborator, Author) commented Aug 31, 2023

Great code base, by the way. It's a pleasure to read.
Thanks for proposing to include me. I could open a PR, but it's probably simpler for you to just include what I wrote (or a better version; I haven't tested whether calling contiguous would make a difference).

@sagadre (Collaborator) commented Aug 31, 2023

Looking into a way to implement this directly with the xformers API. Thanks so much @jmercat!

@jmercat (Collaborator, Author) commented Aug 31, 2023

Actually, moving that line before the call to view would be enough:

queries, keys, vals = self.pos_embed(queries, keys, vals)

@sagadre (Collaborator) commented Aug 31, 2023

The problem actually seems to be upstream in xformers. Opened an issue here: facebookresearch/xformers#841

@sagadre linked a pull request on Sep 2, 2023 that will close this issue
@sagadre closed this as completed in #5 on Sep 3, 2023
sedrickkeh pushed a commit to sedrick-keh-tri/open_lm_fork that referenced this issue May 23, 2024