GPU version support? #14

Open
leemengxing opened this issue Feb 27, 2020 · 10 comments
Comments

@leemengxing

Thank you for your work. Could you add a GPU support option in torch.hub? Another question: must the embedding size be fixed at 128, or is there a way to get 2048-dimensional embeddings?

@stevenguh
Contributor

I think you should be able to do model.to('cuda') to convert the model to cuda.

The model itself is pretty simple, so you should be able to load the pre-trained weights without the last layer. But that requires some manual work with forking this repo.

@leemengxing
Author

VGGish strictly extracts one feature every 0.96 seconds, but my image features are extracted once every 1 s. Do you have a good way to align the features? I look forward to your suggestions.

@stevenguh
Contributor

You should be able to just crop each 1-second piece of audio to 0.96 seconds.
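A minimal sketch of the cropping idea above, assuming a mono waveform held in a plain Python list and already resampled to 16 kHz. The function name crop_each_second and its parameters are hypothetical and not part of this repo. Depending on how the preprocessing frames the signal, a chunk may need to be slightly longer than 0.96 s to produce one full example, so check the shape of the resulting features.

```python
SAMPLE_RATE = 16_000      # assumption: audio is mono and already at 16 kHz
WINDOW_SAMPLES = 15_360   # 0.96 s at 16 kHz


def crop_each_second(waveform, sample_rate=SAMPLE_RATE, window_samples=WINDOW_SAMPLES):
    """Split a waveform into 1 s chunks and keep the first window_samples of each.

    A trailing chunk shorter than one full second is dropped.
    """
    chunks = []
    for start in range(0, len(waveform) - sample_rate + 1, sample_rate):
        chunks.append(waveform[start:start + window_samples])
    return chunks


# Three seconds of silence stand in for real audio.
waveform = [0.0] * (3 * SAMPLE_RATE)
chunks = crop_each_second(waveform)
print(len(chunks), len(chunks[0]))  # 3 15360
```

Each chunk then lines up with one image frame sampled at the same second, at the cost of discarding the last 0.04 s of audio in every second.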

@leemengxing
Author

I'm sorry, I may have described the problem poorly. For example, my video is half an hour long. I select one frame per second and extract image features with ResNet-18, and the audio features come from VGGish. But I found that the image features have shape [30 * 60, 512], while the audio features have shape [30 * 60 / 0.96, 128]. I want to align the features in the time dimension. What should I do?

@leemengxing
Author

I found that a 4-second video does not have this problem, because [4, 512] and [4 / 0.96, 128] both have 4 rows (4 / 0.96 rounds down to 4). Any suggestions are welcome, thanks very much.
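The mismatch described above can be checked with a little arithmetic. This is only a rough sketch: it assumes non-overlapping 0.96 s windows and one image feature per second, and the helper approx_num_audio_rows is hypothetical. The exact count produced by the real preprocessing may differ slightly because of how the signal is framed.

```python
def approx_num_audio_rows(duration_s, window_cs=96):
    """Approximate number of non-overlapping 0.96 s windows in duration_s seconds.

    Works in centiseconds (integers) to avoid floating-point rounding.
    """
    return (duration_s * 100) // window_cs


for duration_s in (4, 30 * 60):
    image_rows = duration_s  # one image feature per second
    audio_rows = approx_num_audio_rows(duration_s)
    print(duration_s, image_rows, audio_rows)
# 4 4 4
# 1800 1800 1875
```

For short clips the two counts happen to agree, but the 0.04 s difference per window accumulates, so a half-hour video ends up with roughly 75 more audio rows than image rows.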

@harritaylor
Owner

@leemengxing this repo is just for the port of vggish to pytorch. I suggest you ask this question on https://groups.google.com/forum/#!forum/audioset-users - you're more likely to have a useful response from those guys 😄 I'm not really sure how to help with that particular problem other than to crop the audio per second to 0.96 like @stevenguh suggested.

As the GPU support has been resolved upthread, I'm closing this issue now. Thanks.

@botkevin

botkevin commented Aug 8, 2020

I know this is closed, but when I try to send the model to cuda using model.cuda(), PyTorch throws RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same. I solved this by adding the following code to VGGish.forward in vggish.py:

def forward(self, x, fs=None):
    if self.preprocess:
        x = self._preprocess(x, fs)
    # start added code
    if next(self.parameters()).is_cuda:
        x = x.cuda()
    # end added code
    x = VGG.forward(self, x)
    if self.postprocess:
        x = self._postprocess(x)
    return x

It's not the most elegant solution, but I am just checking if the model weights are cuda and if so changing the data to such. From my tests so far it seems to work, but please let me know if there is something wrong with this.

@harritaylor
Owner

@botkevin nothing wrong with that if it works! However, I have realised that the offending line is super().load_state_dict(state_dict). There is a way to serialise weights to cuda automatically afaik. I will try to fix this issue later today. Thanks for raising it!

@dfan
Contributor

dfan commented Sep 14, 2020

#19

Sending the model to GPU works fine but PyTorch will complain RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same unless the audio tensor is also sent to GPU.

That said, the speedup is not dramatic because most of the time is spent in pre-processing. For a 2 second audio clip that I tested on CPU, 70 milliseconds were spent on pre-processing the audio file into an array of spectrogram patches, and 20 milliseconds were spent on inference itself.
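The device mismatch discussed in this thread is general PyTorch behaviour, so it can be illustrated without this repo. The sketch below uses a small nn.Linear as a stand-in model (an assumption for illustration, not the VGGish network) and falls back to the CPU when no GPU is available. The point is only that the input tensor has to be moved to the same device as the model's weights before the forward pass.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model (hypothetical); the same pattern applies to any nn.Module.
model = nn.Linear(128, 10).to(device)
model.eval()

batch = torch.randn(4, 128)  # tensors are created on the CPU by default
batch = batch.to(device)     # move the input to the model's device

with torch.no_grad():
    out = model(batch)

print(tuple(out.shape), out.device.type)
```

When the input tensor is built inside forward by a preprocessing step, as in the patch earlier in this thread, the caller cannot move it beforehand, which is why that patch moves x after preprocessing.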

@nhattruongpham

Hi,

You guys can check my configuration based on v0.1 at https://github.com/nhattruongpham/torchvggish-gpu

That worked for me because I had converted the PCA params tensor to cuda.

Good luck!
