Question about speaker encoder input #145

Open
cameronfr opened this issue Feb 26, 2024 · 3 comments

Comments

@cameronfr

The paper mentions that "The tone color extractor is a simple 2D convolutional neural network that operates on the mel-spectrogram of the input voice and outputs a single feature vector that encodes the tone color information", but in api.py it looks like it's operating on the non-mel (linear) spectrogram.

        for fname in ref_wav_list:
            audio_ref, sr = librosa.load(fname, sr=hps.data.sampling_rate)
            y = torch.FloatTensor(audio_ref)
            y = y.to(device)
            y = y.unsqueeze(0)
            y = spectrogram_torch(y, hps.data.filter_length,
                                        hps.data.sampling_rate, hps.data.hop_length, hps.data.win_length,
                                        center=False).to(device)
            with torch.no_grad():
                g = self.model.ref_enc(y.transpose(1, 2)).unsqueeze(-1)
                gs.append(g.detach())
        gs = torch.stack(gs).mean(0)

I'm wondering if this is true, and if so, if there was a reason for using the non-mel spectrogram (was quality better)?
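
For anyone comparing the two input types, here is a minimal sketch of how a mel frame is derived from a linear-spectrogram frame. It is not code from the repository. The helper names and the values of SR, N_FFT and N_MELS are hypothetical, and it uses the HTK-style mel formula (librosa defaults to a different variant). The point is only that the linear input has filter_length // 2 + 1 channels, while the mel input has however many mel bands are configured.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Return n_mels triangular filters over n_fft // 2 + 1 linear bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sr / 2)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # centre frequency (Hz) of each linear FFT bin
    bin_hz = [k * sr / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def linear_to_mel(frame, bank):
    """Project one linear-magnitude frame onto the mel filters."""
    return [sum(w * x for w, x in zip(row, frame)) for row in bank]

# Hypothetical settings, not read from the repository's config
SR, N_FFT, N_MELS = 22050, 1024, 80
bank = mel_filterbank(SR, N_FFT, N_MELS)
frame = [1.0] * (N_FFT // 2 + 1)  # a flat dummy linear-spectrum frame
mel_frame = linear_to_mel(frame, bank)
print(len(frame), len(mel_frame))  # 513 80
```

With these assumed settings the linear frame has 513 channels and the mel frame has 80, so switching input types changes the channel dimension the encoder sees.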

@Zengyi-Qin
Contributor

Thanks for pointing this out. This is true. There is actually no performance difference between the two.

@cameronfr
Author

Ah, thank you. To clarify, was the mel input in question ~128 channels?

@AbdulbariSoylemez

How can I optimize the audio cloning process? How can I make a change to the extract_se function?
