Pre-train model for Javascript #9
Comments
Unfortunately I do not have pretrained embeddings for JavaScript. You can either use your data directly, use the fetcher module to collect some data to generate the embedding, or use an existing dataset such as the one available here:
Please let me know if you run into any trouble.
@tuvistavie thank you. I am using this dataset: https://www.srl.inf.ethz.ch/js150.php. My approach is probably not intuitive, but what I want is to train, validate and test a vanilla LSTM model for auto-completing JavaScript code (as a pet project). So I first want to convert the input source code into an embedding, which becomes the input to the LSTM model. It is the LSTM model that I need to train, validate and test, and I am wondering how a single embedding would help. In summary, what I want is to preprocess the train, test and validation sets into a format (some vector) that I can pass as input to the LSTM model, i.e. source code -> ASTs -> skipgram data -> embeddings -> LSTM model. Any thoughts on whether that makes sense?
What you are trying to do makes sense, but I think the skipgram usage is a little off.
Note that to be able to use the embeddings with any framework, bigcode-embeddings has a command to export to
You can choose to fine-tune the embeddings when training your LSTM, but you should normally only train the embeddings once. Does this make sense?
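To make the "train once, then reuse" point concrete, here is a minimal sketch in plain Python. It is not bigcode-embeddings code: the `EMBEDDINGS` table, its values, the token names and the helpers `embed` and `next_token_pairs` are all made up for illustration. The idea is that the learned embeddings act as one lookup table shared by every program, and the LSTM consumes the looked-up vectors.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-dimensional embeddings, learned once in a separate step.
# A real table would hold one row per token in the learned vocabulary.
EMBEDDINGS: Dict[str, List[float]] = {
    "<unk>": [0.0, 0.0, 0.0],
    "Program": [0.1, 0.3, -0.2],
    "VariableDeclaration": [0.5, -0.1, 0.4],
    "Identifier": [-0.3, 0.2, 0.1],
    "Literal": [0.2, 0.2, -0.5],
}


def embed(tokens: List[str]) -> List[List[float]]:
    """Look up one vector per token, falling back to <unk> for unknown tokens."""
    return [EMBEDDINGS.get(tok, EMBEDDINGS["<unk>"]) for tok in tokens]


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[List[float]], str]]:
    """Build (embedded prefix, next token) pairs for auto-completion training."""
    pairs = []
    for i in range(1, len(tokens)):
        pairs.append((embed(tokens[:i]), tokens[i]))
    return pairs


# A flattened AST for something like `var x = 1;` (token names are assumptions).
program = ["Program", "VariableDeclaration", "Identifier", "Literal"]
pairs = next_token_pairs(program)
```

Each pair holds the vectors the LSTM would read and the token it should predict next. If you fine-tune, the table becomes a trainable parameter initialised from these values instead of a constant.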
Thank you @tuvistavie. Yes, it makes sense. I was wondering if I should use all my data (train + test + val) when training the embedding? In step 2, should I use the JSON serialization of the AST together with the learned embeddings as input to the LSTM, i.e. code -> (ASTs, learned embeddings) -> LSTM? I am just a bit confused about moving from ASTs -> embeddings learned in step 1 (since we already have the embeddings from step 1).
Sorry for the delay.
I suggest you use only your training data when learning the embeddings,
Yes, you should use the vocabulary and embeddings from step 1.
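A small sketch of what "training data only" can look like in practice, again in plain Python and with made-up names (`UNK`, `build_vocab`, `encode` and the toy token sequences are assumptions, not part of any library): the vocabulary is built from the training split alone, and validation or test tokens that were never seen map to a single unknown id.

```python
from collections import Counter
from typing import Dict, Iterable, List

UNK = "<unk>"  # hypothetical placeholder for tokens unseen during training


def build_vocab(train_sequences: Iterable[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Assign an integer id to every token seen at least min_count times in training."""
    counts = Counter(tok for seq in train_sequences for tok in seq)
    vocab = {UNK: 0}
    for tok, n in sorted(counts.items()):  # sorted for a deterministic id order
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(seq: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids; anything outside the training vocabulary becomes UNK."""
    return [vocab.get(tok, vocab[UNK]) for tok in seq]


train = [["Program", "Identifier"], ["Program", "Literal"]]
test = [["Program", "ArrowFunctionExpression"]]

vocab = build_vocab(train)  # built from the training split only
encoded_test = [encode(seq, vocab) for seq in test]
```

The same `vocab` is then reused unchanged for the validation and test splits, so the ids line up with the rows of the embeddings learned in step 1.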
@tuvistavie is there an existing pre-trained model for JavaScript that one can use out-of-the-box? I have three datasets for training, testing and validation, and I want to generate embeddings for each dataset.