error during execution of LinkProbability and PU #9

Open
andres11f opened this issue Jul 18, 2018 · 1 comment
@andres11f

I have been having problems executing LinkProbability and PU.

When I execute LinkProbability, I get this error:

[screenshot of the error]

It seemed to have something to do with some of the strange characters in the info-measure.txt file, so I deleted everything in the file except the first 400 or so lines, and with that it works. (If I leave Chinese characters in my reduced info-measure.txt file, it still fails, so I think it's safe to say the error is caused by those characters.)

How do I avoid this error? Am I doing something wrong, perhaps missing some configuration for the character encoding? I find it strange that the failure seems to be caused by the provided info-measure.txt file, and that I can only make it work after altering that file.
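For example, would the fix be to force the encoding when the file is read, rather than relying on the platform default? A minimal Scala sketch of what I mean (the path is just a guess at where the file lives):

```scala
import scala.io.Source

// On Windows the JVM's default charset is often windows-1252, which cannot
// decode Chinese characters; an explicit UTF-8 read avoids that.
val src = Source.fromFile("info-measure.txt", "UTF-8") // path is a placeholder
try {
  println(src.getLines().length) // decode the whole file as a sanity check
} finally {
  src.close()
}
```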

Next, when I execute PU, I keep getting this error:

[screenshot of the error]

I thought it had something to do with permissions, but even when I run cmd as an administrator I keep getting the error. I have searched but can't find anything useful, apart from someone saying that this is a Windows problem and that Spark doesn't offer much support for the Windows version. I would rather not install another OS just to run the program.
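The only concrete workaround I came across is the usual winutils.exe setup for Hadoop on Windows, roughly the sketch below (the C:\hadoop path and the app name are just examples, not something this project ships), though I don't know whether it applies to this bug:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Commonly suggested fix for Spark-on-Windows permission errors:
// point hadoop.home.dir at a directory containing bin\winutils.exe
// before the SparkContext is created. "C:\\hadoop" is just an example.
System.setProperty("hadoop.home.dir", "C:\\hadoop")

val conf = new SparkConf().setAppName("example").setMaster("local[*]")
val sc = new SparkContext(conf)
```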

Lastly, are there any minimum requirements for running some of the algorithms? I noticed that they use up to 95% of my memory. I have a laptop with 8 GB of RAM, but I intend to run the program on a lower-end computer, and I am worried that it simply won't have enough resources.

@astrakhantsev
Collaborator

Sorry for the late response.

  1. Fixed. There was an issue with the encoding used when reading info-measure.txt.
  2. Can't fix this; I tried multiple solutions, but it seems there is indeed a bug in Spark for Windows.
  3. It highly depends on the size of the datasets you are going to use and on the particular methods. I didn't measure memory consumption, but with an Xmx of 12 GB it could process datasets of up to 64M words. I tried to use iterators wherever possible, but there is some lower limit - e.g. the word2vec model used for KeyConceptRelatedness has to be kept in memory, which costs about 1 GB.
     In case of low memory and big datasets, I'd suggest first trying the faster methods (see Table 6 in the paper): they tend to occupy less memory, and in some cases their quality is no worse, especially on big datasets, where good statistics on word occurrences are available. E.g. try playing with the parameters of ComboBasic (see the formula after this list): increase alpha if you want more 'representative' or generic terms, and increase beta if you want more specific terms.
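For reference, ComboBasic scores a candidate term t roughly as

$$
\mathrm{ComboBasic}(t) = |t| \cdot \log f(t) + \alpha \cdot e_t + \beta \cdot e'_t
$$

where |t| is the length of t in words, f(t) is its frequency, e_t is the number of candidate terms that contain t, and e'_t is the number of candidate terms contained in t. So increasing alpha rewards candidates that are nested in many longer terms (more generic), while increasing beta rewards candidates that contain other terms (more specific).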
