High memory consumption #3124

Open
moi90 opened this issue Apr 11, 2018 · 5 comments


moi90 commented Apr 11, 2018

I have a fairly large FlowSight `.cif` file (3.1 GB, 1M objects) that I wanted to convert to TIFF. This runs into problems for the following reasons:

  1. `loci.formats.in.FlowSightReader.initFile` contains a loop over all IFDs that appends some metadata to the `List core`. `core` is in fact an `ArrayList` that has to be resized over and over again (internally, the existing values are copied into the enlarged backing array each time). This is very inefficient and takes up an unnecessarily large amount of memory (resulting in `java.lang.OutOfMemoryError: Java heap space`). I was able to circumvent this by calling `((ArrayList<?>)core).ensureCapacity(ifdOffsets.length);` before the loop. Should I create a pull request?
  2. The next problem was a `java.lang.OutOfMemoryError: GC overhead limit exceeded` while preparing to write the output file. Apparently, a very large data structure is put together prior to writing the file. I was able to mitigate this by setting `BF_MAX_MEM=32g`, but I don't think that is a general solution, because many people won't have that much RAM. As a user, I would expect the application to need a rather small amount of memory, because all objects could be read, processed and written one after the other.
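To illustrate the first point, here is a minimal, self-contained sketch of the resize cost and the `ensureCapacity` mitigation. The `core` and `ifdOffsets` names follow the reader code quoted above; the element type, list contents, and sizes are placeholders, not the actual Bio-Formats structures:

```java
import java.util.ArrayList;
import java.util.List;

public class EnsureCapacityDemo {
    public static void main(String[] args) {
        // Simplified stand-in for FlowSightReader.initFile: `core` is
        // declared as List but backed by an ArrayList, as in FormatReader.
        List<String> core = new ArrayList<String>();
        long[] ifdOffsets = new long[1_000_000]; // one entry per IFD/object

        // Without this call, the backing array is grown and copied
        // repeatedly as elements are appended; with it, the final
        // capacity is allocated once up front.
        ((ArrayList<String>) core).ensureCapacity(ifdOffsets.length);

        for (int i = 0; i < ifdOffsets.length; i++) {
            core.add("metadata for IFD " + i);
        }
        System.out.println(core.size());
    }
}
```

Note that `ensureCapacity` only avoids the repeated grow-and-copy; the list still ends up holding one entry per IFD, so it reduces peak transient memory and copying work rather than the final footprint.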
dgault (Member) commented Apr 11, 2018

Hi @moi90, thank you for getting in touch and carrying out some investigation. For the first issue, please do go ahead and open a PR and we will begin testing it. If you need any help along the way, just let me know.

For the second point, that does seem like an excessive amount of RAM. When I have profiled previously, the largest memory usage tended to occur in cases with very large numbers of metadata entries, which produce a large XML blob of text before it is written to the file. That sounds like it might be the large structure you are noticing here. Are you writing to OME-TIFF or to plain TIFF? Are you writing a single file, or is the data split over multiple files?

moi90 (Author) commented Apr 12, 2018

Thanks for your quick response!

  1. All subclasses of `FormatReader` could potentially benefit from `ensureCapacity`, so the type of `core` could be changed from the generic `List` to `ArrayList`. What do you think?
  2. Yes, the stack trace contained something related to XML metadata, but I'm currently unable to reproduce it. It should be writing a single plain TIFF (the command issued was `./bfconvert -compression LZW file.cif file.tif`).
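A minimal sketch of the type change suggested in point 1. The class and field names here are illustrative stand-ins, not the actual Bio-Formats declarations:

```java
import java.util.ArrayList;

public class CoreListDemo {
    // Declaring the field as ArrayList rather than the List interface
    // exposes ensureCapacity to all subclasses without a cast.
    protected ArrayList<String> core = new ArrayList<String>();

    void initFile(int ifdCount) {
        core.ensureCapacity(ifdCount); // no cast needed
        for (int i = 0; i < ifdCount; i++) {
            core.add("entry " + i);
        }
    }

    public static void main(String[] args) {
        CoreListDemo reader = new CoreListDemo();
        reader.initFile(5);
        System.out.println(reader.core.size());
    }
}
```

The trade-off is that narrowing the declared type from `List` to `ArrayList` leaks an implementation detail into the API, which is why such a change would need review by the maintainers.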

moi90 (Author) commented Apr 16, 2018

The process is also insanely slow. Currently, it converts at about 1 image per second, which is unbearable for 1M images. My post-processing in Python (rescaling, writing individual PNG files) runs at about 1,000 images per second. The conversion has now finished, and the last message is:

479848.25s elapsed (0.0402195+239.75208ms per plane, 130885ms overhead)
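As a rough sanity check on that log line (assuming the file holds roughly 1,000,000 planes, which is my reading of the 1M-object figure above, not something the log itself states):

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        double elapsedSeconds = 479848.25; // total time from the bfconvert log
        double planes = 1_000_000;         // assumed plane count

        double secondsPerPlane = elapsedSeconds / planes;
        double planesPerSecond = planes / elapsedSeconds;

        // Roughly half a second per plane, i.e. about 2 planes per
        // second overall -- the same order of magnitude as the
        // "1 image per second" estimate above, and about 5.5 days total.
        System.out.printf("%.4f s/plane, %.2f planes/s%n",
                secondsPerPlane, planesPerSecond);
    }
}
```

That is three orders of magnitude slower than the ~1,000 images per second of the Python post-processing step, which is what makes the conversion the bottleneck.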

dgault (Member) commented Apr 17, 2018

With regard to the handling of core metadata in `FormatReader`, it is worth noting that there is a new PR open for sub-resolution support which will change this behaviour. Looking at the performance of the metadata handling is something we should consider as part of that ongoing work, and the recommendations made here will certainly be taken into account when reviewing those upcoming changes.

#3119

As for the slow conversion times, I have been trying to profile some .cif conversions today using a few different methods, but my results so far have not been very consistent. Are you using bfconvert from the command-line tools, or have you written a separate program?

moi90 (Author) commented Apr 23, 2018

Thanks for considering my case!

I'm using bfconvert from the command line tools.
