Running TextGCN processing crashes due to high RAM usage #7
Hi. I'm exploring the usage of the TextGCN implementation in the toolkit. I saw the sample using the Bible text, but decided to explore using the toolkit package instead, as it is easier to use on Google Colab. I managed to clone the repo and import the library. Using the data from the Bible sample, the code runs until the `train_and_fit(config)` call.

My setup is as follows: I set `train_data` and `infer_data` to the same CSV file first, just to see if I could get the model to run, but it seems I couldn't get through preprocessing. `train_and_fit(config)` runs up to building document-word edges, but then RAM usage spikes to 12 GB and crashes Colab (running with GPU). The output before crashing is as follows:

I initially didn't set a value for `max_vocab_len`, but then it couldn't get past 3% on building document-word edges. I limited it to 400 and was able to reach 100%, but it crashes after that. I'm afraid setting it lower would essentially remove most of the data.

My actual data has around double the number of documents of the Bible sample, so I was wondering if there's a way to minimize RAM consumption and get it to work without needing more than 12 GB of RAM.
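A rough sketch of the setup described above. The import paths and the `Config` constructor are assumptions and may not match the toolkit's actual layout; only `train_data`, `infer_data`, `max_vocab_len`, and `train_and_fit(config)` are names taken from this report.

```python
# Sketch only, not the toolkit's confirmed API: the import paths and the
# Config constructor below are assumptions. Only train_data, infer_data,
# max_vocab_len and train_and_fit(config) are names quoted in this issue.
from nlptoolkit.utils.config import Config            # assumed module path
from nlptoolkit.classification import train_and_fit   # assumed module path

config = Config()                         # constructor details assumed
config.train_data = "bible_sample.csv"    # placeholder path; the same CSV is used for
config.infer_data = "bible_sample.csv"    # both training and inference, just to test the pipeline
config.max_vocab_len = 400                # unlimited vocab stalls at 3%; 400 reaches 100%

# Crashes on Colab (~12 GB RAM) after "building document-word edges".
train_and_fit(config)
```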
--
Edit: I tried using the suggested dataset (IMDB Sentiment Classification) with a max vocab of 200, but it crashes during the building of the adjacency matrix as well.
Comments

I have updated the script so that building document-word edges is done on the fly, but this only slightly reduced RAM usage. Graph preprocessing is quite intensive since it is quadratic in the number of nodes (in this case 62206 document nodes + 400 word nodes), so I'm afraid there's probably no way to reduce it further without decreasing the number of nodes.
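To make the quadratic cost concrete, here is a back-of-the-envelope estimate (plain Python, independent of the toolkit) of what a dense adjacency matrix over that many nodes would occupy. It assumes dense float storage, which the preprocessing may or may not use.

```python
# Back-of-the-envelope memory estimate for a dense N x N adjacency matrix.
# Node count taken from the numbers above; byte sizes are standard float dtypes.
n_nodes = 62206 + 400      # document nodes + word nodes
entries = n_nodes ** 2     # a dense adjacency matrix grows quadratically with node count

for dtype_bytes, name in [(8, "float64"), (4, "float32")]:
    gib = entries * dtype_bytes / 2**30
    print(f"dense {name}: ~{gib:.1f} GiB")

# Output (approximately):
# dense float64: ~29.2 GiB
# dense float32: ~14.6 GiB
```

Even at float32, the matrix alone would exceed Colab's ~12 GB before anything else is allocated, which is consistent with lowering the node count (fewer documents or a smaller `max_vocab_len`) being the main lever.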