Can you suggest how many dimensions to use for word vectors? #22

Open
Jessegoodspeed opened this issue Mar 8, 2018 · 4 comments

@Jessegoodspeed

Hi Adit - Great repo! I am trying to use Seq2Seq.py to generate the word vectors, and it asks for the number of dimensions. I read somewhere that 300-500 dimensions is ideal. How many would you suggest, or how many did you use for your own runs?

@Jessegoodspeed changed the title from "Can you suggest how many many dimensions to use for word vectors?" to "Can you suggest how many dimensions to use for word vectors?" on Mar 9, 2018
@adeshpande3
Owner

Yeah, I think the answer depends on the complexity of your NLP task, the computational power you have, and the amount of training data you have. With all of these factors in play, it's difficult to recommend one particular size, and I don't think there are any papers that say one size is better than another. My intuition is that the larger your vectors are, the more information you can pack into them, and so the better they can be. However, that only holds if your training corpus is large enough to learn accurate vector representations at that size. Long story short, I think this is a trial-and-error hyperparameter you need to play around with. That said, I would love to hear if a particular size worked best for you.
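
For anyone who wants to run that trial-and-error loop, here is a minimal sketch in Python. It is not taken from Seq2Seq.py: the function names, the candidate sizes, and the `train_and_score` callback are all hypothetical placeholders, and you would need to supply your own training and evaluation step. The first helper also gives a rough idea of the memory side of the trade-off, assuming a dense table of 32-bit floats.

```python
from typing import Callable, Dict, Iterable


def embedding_memory_mb(vocab_size: int, dim: int, bytes_per_value: int = 4) -> float:
    """Approximate size in MiB of a dense (vocab_size x dim) embedding table.

    Assumes every value takes bytes_per_value bytes (4 for float32).
    """
    return vocab_size * dim * bytes_per_value / (1024 ** 2)


def sweep_dimensions(
    candidate_dims: Iterable[int],
    train_and_score: Callable[[int], float],
) -> Dict[int, float]:
    """Call train_and_score once per candidate size and collect the scores.

    train_and_score is a hypothetical callback: it should train word vectors
    of the given size and return a validation score where higher is better.
    """
    return {dim: train_and_score(dim) for dim in candidate_dims}


def best_dimension(scores: Dict[int, float]) -> int:
    """Return the candidate size with the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Placeholder scorer so the sketch runs end to end; replace it with a
    # function that actually trains and evaluates your model.
    def placeholder_score(dim: int) -> float:
        return -abs(dim - 200)

    scores = sweep_dimensions([50, 100, 200, 300], placeholder_score)
    for dim, score in scores.items():
        size = embedding_memory_mb(vocab_size=50_000, dim=dim)
        print(f"dim={dim:4d}  score={score:8.2f}  table size ~{size:.1f} MiB")
    print("best:", best_dimension(scores))
```

The table size grows linearly with the number of dimensions, so doubling the vector size doubles the memory for the embeddings. Whether the extra capacity helps still depends on the size of your corpus, which is why the score has to come from your own validation data.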

@OrangeAaron

I'm not seeing where the answer to that prompt is actually used in the Python code provided. Can you point me to where in the code it gets used?

@Jessegoodspeed
Author

line 149 of Seq2Seq.py is prompted if you do not use the word2vec.py script.

@OrangeAaron

> line 149 of Seq2Seq.py is prompted if you do not use the word2vec.py script.

Yes, I see the prompt, but I don't see the value being passed into anything after it's read in from the terminal. Where are the word dimensions actually used?
