High memory consumption #3124
I have a fairly large FlowSight .cif file (3.1 GB, 1M objects) and I wanted to convert it to .tif. This turns out to be a problem for the following reasons:

1. `loci.formats.in.FlowSightReader.initFile` contains a loop over all IFDs in which some metadata is `add`ed to the `core` list. `core` is in fact an `ArrayList` that has to be resized over and over again (and internally, the existing values have to be copied over to the enlarged array each time). This is very inefficient and takes up an unnecessarily large amount of memory, eventually resulting in `java.lang.OutOfMemoryError: Java heap space`. I was able to circumvent this by doing `((ArrayList<?>) core).ensureCapacity(ifdOffsets.length);` before the loop (see the first sketch after this list). Should I create a pull request?
2. I then got `java.lang.OutOfMemoryError: GC overhead limit exceeded` while preparing the writing of the output file. Apparently, a very large data structure is put together prior to writing the file. I was able to mitigate this issue by setting `BF_MAX_MEM=32g`, but I don't think that this is a general solution, because many people won't have that much RAM. As a user, I would expect the application to need only a small amount of memory, because all objects could be read, processed and written one after the other (see the second sketch below).
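A minimal, self-contained sketch of the first point and the proposed fix. Everything except the `ensureCapacity` call is a hypothetical stand-in (plain `Object` entries instead of the reader's actual core metadata, and a made-up `ifdOffsets` length):

```java
import java.util.ArrayList;
import java.util.List;

public class EnsureCapacityDemo {
    public static void main(String[] args) {
        // Stand-in for the IFD offsets read by the reader; the real file
        // has on the order of a million entries.
        long[] ifdOffsets = new long[1_000_000];

        // Stand-in for the reader's core metadata list.
        List<Object> core = new ArrayList<Object>();

        // The proposed fix: size the backing array once up front. Without
        // this call, ArrayList grows by roughly 1.5x per resize and copies
        // every existing element into the enlarged array each time.
        ((ArrayList<?>) core).ensureCapacity(ifdOffsets.length);

        for (int i = 0; i < ifdOffsets.length; i++) {
            core.add(new Object()); // one core metadata entry per IFD
        }
        System.out.println("entries: " + core.size());
    }
}
```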
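And a sketch of what "read, processed and written one after the other" could look like, loosely following the standard Bio-Formats plane-by-plane conversion pattern (the file names are placeholders and error handling is omitted). Note that even with this pattern, the OME-XML metadata store covers every plane and is assembled in memory before anything is written, which may be the large pre-write structure discussed in the comments below:

```java
import loci.common.services.ServiceFactory;
import loci.formats.ImageReader;
import loci.formats.ImageWriter;
import loci.formats.meta.IMetadata;
import loci.formats.services.OMEXMLService;

public class PlaneByPlaneConvert {
    public static void main(String[] args) throws Exception {
        // Placeholder paths; the real input is the 3.1 GB FlowSight file.
        String in = "input.cif";
        String out = "output.ome.tiff";

        // Collect metadata into an OME-XML store while reading.
        ServiceFactory factory = new ServiceFactory();
        OMEXMLService service = factory.getInstance(OMEXMLService.class);
        IMetadata meta = service.createOMEXMLMetadata();

        ImageReader reader = new ImageReader();
        reader.setMetadataStore(meta);
        reader.setId(in);

        ImageWriter writer = new ImageWriter();
        writer.setMetadataRetrieve(meta);
        writer.setId(out);

        // One plane at a time: only a single plane's pixel buffer is
        // held in memory at once.
        for (int s = 0; s < reader.getSeriesCount(); s++) {
            reader.setSeries(s);
            writer.setSeries(s);
            for (int p = 0; p < reader.getImageCount(); p++) {
                writer.saveBytes(p, reader.openBytes(p));
            }
        }
        reader.close();
        writer.close();
    }
}
```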
Comments

Hi @moi90, thank you for getting in touch and carrying out some investigation. For the first issue, please do go ahead and open a PR and we will begin testing it. If you require any help or assistance when doing so, just let me know. As for the second point, that does seem like an excessive amount of RAM. Previously, when I have carried out profiling, the largest memory usage tended to occur in cases with very large numbers of metadata entries, which result in a large XML blob of text prior to it being written to the file. That sounds like it might be the large structure you are noticing here. Are you writing to OME-TIFF or to plain TIFF? Is it a single file you are writing to, or is the data split over multiple files?
Thanks for your quick response!

The process is also insanely slow: currently it converts at 1 image per second, which is unbearable for 1M images (at that rate, the full conversion takes more than eleven days). My post-processing in Python (rescaling, writing individual PNG files) runs at 1,000 images per second. It's now done and the last message is: […]

With regard to the handling of core metadata in FormatReader, it is worth noting that there is a new PR open for sub-resolution support which will be changing this behaviour. Looking at the performance of the metadata handling is something we should be considering as part of that ongoing work. The recommendations made here will certainly be taken into consideration when reviewing those upcoming changes. As for the slow conversion times, I have been trying to profile some […]

Thanks for considering my case! I'm using […]