Question about speaker encoder input #145
Comments
Thanks for pointing this out. This is true. There is actually no performance difference between the two.
Ah, thank you. To clarify, was the mel input in question ~128 channels?
How can I optimize the audio cloning process? How can I make a change to the extract_se function?
The paper mentions that "The tone color extractor is a simple 2D convolutional neural network that operates on the mel-spectrogram of the input voice and outputs a single feature vector that encodes the tone color information", but in api.py I see that it looks like it's operating on the non-mel spectrogram. I'm wondering if this is true, and if so, if there was a reason for using the non-mel spectrogram (was quality better)?
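For anyone unsure what the distinction amounts to, here is a minimal, dependency-free sketch of the two representations. It is not the code from api.py. The sample rate, FFT size, number of mel channels, and every function name below are hypothetical values chosen for illustration, and the mel scale uses the HTK-style formula, which may differ from what a given library uses by default.

```python
import cmath
import math

# Hypothetical settings for illustration only; not taken from the repository.
SAMPLE_RATE = 16000
N_FFT = 256
N_MELS = 20


def hz_to_mel(hz):
    """HTK-style Hz -> mel conversion (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def linear_magnitude_frame(samples):
    """One-sided DFT magnitudes of a single frame (naive O(n^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, one row per channel."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank


def apply_filterbank(bank, frame):
    """Project a linear-frequency frame onto the mel channels."""
    return [sum(w * v for w, v in zip(row, frame)) for row in bank]


# One frame of a 1 kHz test tone.
samples = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N_FFT)]

linear_frame = linear_magnitude_frame(samples)   # non-mel (linear) spectrum
mel_frame = apply_filterbank(
    mel_filterbank(SAMPLE_RATE, N_FFT, N_MELS), linear_frame
)                                                # mel spectrum

print(len(linear_frame), len(mel_frame))  # 129 20
```

The non-mel frame keeps every FFT bin (N_FFT // 2 + 1 of them), while the mel frame is a weighted sum of those bins into fewer, perceptually spaced channels. An encoder that consumes the non-mel spectrogram therefore sees more frequency detail per frame at the cost of a larger input.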