Discobox evaluation failed with custom dataset #5
Comments
If it helps, here is the config file I'm using:
My environment specs:
Hi @ameyparanjape, thanks for your interest in our work. I noticed the log message "died with <Signals.SIGKILL: 9>.". This means the job you were running was killed by the system (on Linux, typically the kernel's out-of-memory killer). In my experience, this is usually caused by insufficient CPU (host) memory. You may want to check the memory usage during eval. Best, Shiyi
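One way to check host memory during eval is a small watcher that polls `/proc/meminfo` while the job runs. This is only a minimal sketch, assuming a Linux machine whose kernel reports `MemAvailable`. The function names, the polling interval, and the 10% warning threshold are illustrative choices and are not part of DiscoBox.

```python
import threading
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int} (values are in kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_fraction(info):
    """Fraction of total RAM that the kernel reports as still available."""
    return info["MemAvailable"] / info["MemTotal"]


def watch_memory(interval_s=5.0, warn_below=0.10, stop_event=None,
                 path="/proc/meminfo"):
    """Print available host RAM every interval_s seconds until stop_event is set.

    warn_below is an arbitrary illustrative threshold, not a recommended value.
    """
    stop_event = stop_event or threading.Event()
    while not stop_event.is_set():
        with open(path) as f:
            frac = available_fraction(parse_meminfo(f.read()))
        flag = "  <-- low" if frac < warn_below else ""
        print(f"{time.strftime('%H:%M:%S')} available RAM: {frac:.1%}{flag}",
              flush=True)
        # wait() returns early if stop_event is set, so shutdown is prompt
        stop_event.wait(interval_s)


# Example usage (run in a second terminal while the eval is in progress):
#     watch_memory(interval_s=2.0)
```

If the available fraction drops toward zero just before the process dies, that is consistent with an out-of-memory kill. The kernel log (for example via `dmesg`) usually records such kills as well.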
Thanks to the authors for making the DiscoBox code public.
I am trying to fine-tune the COCO checkpoint on my custom COCO-style dataset.
Training seems to run fine, but as soon as the script enters evaluation (during training), it hangs, and after about a minute the process is terminated abruptly.
Here is the train command I use:
The error I get:
To investigate further, I also tried running training on 1 GPU:
This failed too, with the same error.
Another thing I thought was worth trying was to run a separate eval:
Again, this resulted in the same error.
Has anyone come across this error? Please help, thanks!