add best relative positional encoding for AST
1 parent 598aa34, commit 6b63067
Showing 3 changed files with 80 additions and 31 deletions.
@lucidrains After upgrading to this version, I always get a memory allocation error, even with batch size = 2.
This is my training code:
Having the same issue as @ukemamaster:
Training the MuLaN model works fine for me; it's training the semantic model that breaks.
I've also tried running it with smaller hyperparameters (replacing all 1024 -> 256, 512 -> 128, depth 6 -> depth 1). I get the following error instead:
It's specifically this commit (0.0.20) that breaks things - both sets of hyperparams train without a hitch on 0.0.19.
@Lunariz You said MuLaN training works fine. May I have a look at your code? I want to see where I am messing up.
Also, which data are you training MuLaN with?
I'm still using a mock dataset. My code is adapted from this comment
OK. I am trying with the test dataset (5.5k samples) that they used in the original paper.
Given the model architecture, the GPU runs out of memory. However, reducing the length of the audio input from the dataset solves the problem.
In my case, it runs fine with at most 3 seconds of 24 kHz audio input on a 24 GB RTX.
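For anyone who wants to try the same workaround, here is a minimal sketch of capping clip length before a waveform reaches the model. The helper names `max_num_samples` and `random_crop` are hypothetical and not part of this repository. It assumes a 1-D waveform (a list, NumPy array, or 1-D tensor), so adapt the slicing if your data has channel or batch dimensions.

```python
import random


def max_num_samples(seconds, sample_rate):
    """Number of samples in a clip of the given duration."""
    return int(seconds * sample_rate)


def random_crop(wave, num_samples, rng=random):
    """Return a random contiguous window of at most `num_samples` samples.

    `wave` is assumed to be 1-D and to support len() and slicing.
    Clips that are already short enough are returned unchanged.
    """
    n = len(wave)
    if n <= num_samples:
        return wave
    # Pick a start index so the window fits entirely inside the clip.
    start = rng.randrange(n - num_samples + 1)
    return wave[start:start + num_samples]


# 3 seconds at 24 kHz, the limit reported above for a 24 GB card.
limit = max_num_samples(3, 24000)  # 72000 samples
```

Applying `random_crop(wave, limit)` inside the dataset's `__getitem__` (or a collate function) keeps every training example within that budget. The right limit for other GPUs or hyperparameters would have to be found by experiment.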