Question about speaker encoder input #145

cameronfr · 2024-02-26T05:00:55Z

The paper mentions that "The tone color extractor is a simple 2D convolutional neural network that operates on the mel-spectrogram of the input voice and outputs a single feature vector that encodes the tone color information," but in api.py it looks like it's operating on the non-mel (linear) spectrogram:

        for fname in ref_wav_list:
            audio_ref, sr = librosa.load(fname, sr=hps.data.sampling_rate)
            y = torch.FloatTensor(audio_ref)
            y = y.to(device)
            y = y.unsqueeze(0)
            y = spectrogram_torch(y, hps.data.filter_length,
                                  hps.data.sampling_rate, hps.data.hop_length, hps.data.win_length,
                                  center=False).to(device)
            with torch.no_grad():
                g = self.model.ref_enc(y.transpose(1, 2)).unsqueeze(-1)
                gs.append(g.detach())
        gs = torch.stack(gs).mean(0)

I'm wondering whether this is true and, if so, whether there was a reason for using the non-mel spectrogram (was quality better?).


Zengyi-Qin · 2024-02-26T16:56:51Z

Thanks for pointing this out. This is true. There is actually no performance difference between the two.
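
For readers comparing the two inputs, here is a minimal NumPy sketch of how a linear magnitude spectrogram relates to a mel-spectrogram. It is not the repository's spectrogram_torch; it is a simplified stand-in with no padding and a symmetric Hann window. The values sr=22050, n_fft=1024, hop_length=256, and n_mels=128 are assumptions chosen for illustration (128 is only the figure asked about in this thread), and the helper names are hypothetical.

```python
import numpy as np

def linear_spectrogram(y, n_fft=1024, hop_length=256):
    """Magnitude STFT without centering; returns (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)  # assumes win_length == n_fft
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([
        y[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T

def hz_to_mel(f):
    # HTK-style mel formula (one common convention; libraries differ)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    )
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = mel_points[m], mel_points[m + 1], mel_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

# Assumed settings, for illustration only
sr, n_fft, hop_length, n_mels = 22050, 1024, 256, 128

# One second of a 440 Hz sine as a stand-in for a reference recording
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)

spec = linear_spectrogram(y, n_fft, hop_length)   # (n_fft // 2 + 1, n_frames)
mel = mel_filterbank(sr, n_fft, n_mels) @ spec    # (n_mels, n_frames)
print(spec.shape, mel.shape)
```

The mel version is just a fixed linear projection of the same magnitude spectrogram onto fewer channels, so a convolutional encoder fed the linear spectrogram sees a superset of that information. That is consistent with, though not proof of, the observation above that the two inputs perform the same.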

cameronfr · 2024-02-26T17:02:01Z

Ah, thank you. And to clarify, the mel input in question was ~128 channels?

AbdulbariSoylemez · 2024-05-14T14:04:58Z

How can I optimize the audio cloning process? How can I make changes to the extract_se function?
