ValueError: perplexity must be less than n_samples #6
Comments
You are right. Thank you! I will fix it in the next release (will try to make it soon).
@caicai555 I have pushed a fixing commit. Could you please test it?
Just upgraded tmplot to 0.0.9 and tested with the same data, which had worked in version 0.0.8 once I set the t-SNE perplexity to 5. Now another error occurs...
Could you please post the code and a data sample that produce this error?
Just like this one, I think. Here is the code, run in a Jupyter notebook with python=3.8.8
And here's part of the [texts] data, which should be innocent in this issue.
Thanks for your great effort and quick replies on this issue. If you need the exact model file, please let me know :) (BTW, I've tried analysing my texts with your BitermPlus package; it is concise, elegant, and quite easy to get the hang of. However, with the same settings the results differ from those of the R package BTM (https://github.com/bnosac/BTM) and BTMpy (https://github.com/jasperyang/BTMpy). The latter two give the same topic distribution and their top words seem to make sense, but the BitermPlus output is a bit confusing. I am a beginner in BTM and not sure what's behind the problem; maybe you could check it if you have the time? PS: maybe I should open an issue there, but I don't want to annoy you xd)
Thank you for your comments! I will try to reproduce this bug. By the way, how many iterations have you used with bitermplus? The authors of the algorithm refer to 2000 in their paper. I have found 600 to be more or less sufficient.
Due to time restrictions, I have only tried 200 iters maximum XD. I'll test with more iterations later when I return to my lab. What confused me the most is that the results differ even if I get all the settings exactly the same. Thanks again for your warm help :)
When the number of topics is 30 or fewer (30 being TSNE's default perplexity setting), the ValueError occurs, because the perplexity must be strictly less than the number of samples.
ValueError: perplexity must be less than n_samples
Maybe we could simply set perplexity to 5, or change it according to the number of topics (e.g. add an n_topic variable to the _report() and _distance() methods).
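A rough sketch of the second idea, in case it helps. This is not tmplot's actual code: `safe_perplexity` and `embed_topics` are hypothetical names, and I am assuming the input is a matrix with one row per topic. It only relies on scikit-learn's rule that perplexity must be strictly less than the number of samples.

```python
def safe_perplexity(n_samples: int, default: float = 30.0) -> float:
    """Return a perplexity strictly below n_samples, capped at `default`."""
    if n_samples < 2:
        raise ValueError("t-SNE needs at least 2 samples")
    # n_samples - 1 is the largest integer value that still passes the check.
    return float(min(default, n_samples - 1))


def embed_topics(topic_vectors):
    """Hypothetical helper: project one row per topic to 2D with t-SNE."""
    # Imported lazily so safe_perplexity() works without scikit-learn installed.
    import numpy as np
    from sklearn.manifold import TSNE

    X = np.asarray(topic_vectors, dtype=float)
    tsne = TSNE(
        n_components=2,
        perplexity=safe_perplexity(X.shape[0]),  # adapts to the topic count
        init="random",
        random_state=0,
    )
    return tsne.fit_transform(X)
```

With 10 topics this picks a perplexity of 9 instead of the default 30, and with more than 30 topics it leaves the default untouched. Whether such a high perplexity relative to the sample count gives a nice layout is a separate question; a fixed small value such as 5 may look better for few topics.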