High memory consumption #3124

Open
moi90 opened this issue Apr 11, 2018 · 5 comments


moi90 commented Apr 11, 2018

I have a fairly large FlowSight `.cif` file (3.1 GB, 1M objects) that I wanted to convert to TIFF. This runs into problems for the following reasons:

  1. `loci.formats.in.FlowSightReader.initFile` contains a loop over all IFDs that appends some metadata to the `List core`. `core` is in fact an `ArrayList` that has to be resized over and over again (internally, the existing values are copied into the enlarged backing array each time). This is very inefficient and takes up an unnecessarily large amount of memory (resulting in `java.lang.OutOfMemoryError: Java heap space`). I was able to circumvent this by calling `((ArrayList<?>)core).ensureCapacity(ifdOffsets.length);` before the loop. Should I create a pull request?
  2. The next problem was a `java.lang.OutOfMemoryError: GC overhead limit exceeded` while preparing to write the output file. Apparently, a very large data structure is put together prior to writing the file. I was able to mitigate this by setting `BF_MAX_MEM=32g`, but I don't think that is a general solution, because many people won't have that much RAM. As a user, I would expect the application to need a rather small amount of memory, because all objects could be read, processed and written one after the other.
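To illustrate the first point, here is a minimal, self-contained sketch of the resize cost and the `ensureCapacity` mitigation. The `core` and `ifdOffsets` names follow the reader code quoted above; the element type, list contents, and sizes are placeholders, not the actual Bio-Formats structures:

```java
import java.util.ArrayList;
import java.util.List;

public class EnsureCapacityDemo {
    public static void main(String[] args) {
        // Simplified stand-in for FlowSightReader.initFile: `core` is
        // declared as List but backed by an ArrayList, as in FormatReader.
        List<String> core = new ArrayList<String>();
        long[] ifdOffsets = new long[1_000_000]; // one entry per IFD/object

        // Without this call, the backing array is grown and copied
        // repeatedly as elements are appended; with it, the final
        // capacity is allocated once up front.
        ((ArrayList<String>) core).ensureCapacity(ifdOffsets.length);

        for (int i = 0; i < ifdOffsets.length; i++) {
            core.add("metadata for IFD " + i);
        }
        System.out.println(core.size());
    }
}
```

Note that `ensureCapacity` only avoids the repeated grow-and-copy; the list still ends up holding one entry per IFD, so it reduces peak transient memory and copying work rather than the final footprint.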
dgault (Member) commented Apr 11, 2018

Hi @moi90, thank you for getting in touch and carrying out some investigation. For the first issue, please do go ahead and open a PR and we will begin testing it. If you need any help along the way, just let me know.

For the second point, that does seem like an excessive amount of RAM. When I have profiled previously, the largest memory usage tended to occur in cases with very large numbers of metadata entries, which produce a large XML blob of text before it is written to the file. That sounds like it might be the large structure you are noticing here. Are you writing to OME-TIFF or to plain TIFF? Are you writing a single file, or is the data split over multiple files?

moi90 (Author) commented Apr 12, 2018

Thanks for your quick response!

  1. All subclasses of `FormatReader` could potentially benefit from `ensureCapacity`, so the type of `core` could be changed from the generic `List` to `ArrayList`. What do you think?
  2. Yes, the stack trace contained something related to XML metadata, but I'm currently unable to reproduce it. It should be writing a single plain TIFF (the command issued was `./bfconvert -compression LZW file.cif file.tif`).
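A minimal sketch of the type change suggested in point 1. The class and field names here are illustrative stand-ins, not the actual Bio-Formats declarations:

```java
import java.util.ArrayList;

public class CoreListDemo {
    // Declaring the field as ArrayList rather than the List interface
    // exposes ensureCapacity to all subclasses without a cast.
    protected ArrayList<String> core = new ArrayList<String>();

    void initFile(int ifdCount) {
        core.ensureCapacity(ifdCount); // no cast needed
        for (int i = 0; i < ifdCount; i++) {
            core.add("entry " + i);
        }
    }

    public static void main(String[] args) {
        CoreListDemo reader = new CoreListDemo();
        reader.initFile(5);
        System.out.println(reader.core.size());
    }
}
```

The trade-off is that narrowing the declared type from `List` to `ArrayList` leaks an implementation detail into the API, which is why such a change would need review by the maintainers.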

moi90 (Author) commented Apr 16, 2018

The process is also insanely slow. Currently, it converts at about 1 image per second, which is unbearable for 1M images. My post-processing in Python (rescaling, writing individual PNG files) runs at about 1,000 images per second. The conversion has now finished, and the last message is:

479848.25s elapsed (0.0402195+239.75208ms per plane, 130885ms overhead)
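As a rough sanity check on that log line (assuming the file holds roughly 1,000,000 planes, which is my reading of the 1M-object figure above, not something the log itself states):

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        double elapsedSeconds = 479848.25; // total time from the bfconvert log
        double planes = 1_000_000;         // assumed plane count

        double secondsPerPlane = elapsedSeconds / planes;
        double planesPerSecond = planes / elapsedSeconds;

        // Roughly half a second per plane, i.e. about 2 planes per
        // second overall -- the same order of magnitude as the
        // "1 image per second" estimate above, and about 5.5 days total.
        System.out.printf("%.4f s/plane, %.2f planes/s%n",
                secondsPerPlane, planesPerSecond);
    }
}
```

That is three orders of magnitude slower than the ~1,000 images per second of the Python post-processing step, which is what makes the conversion the bottleneck.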

dgault (Member) commented Apr 17, 2018

With regard to the handling of core metadata in `FormatReader`, it is worth noting that there is a new PR open for sub-resolution support which will change this behaviour. Looking at the performance of the metadata handling is something we should consider as part of that ongoing work, and the recommendations made here will certainly be taken into account when reviewing those upcoming changes.

#3119

As for the slow conversion times, I have been trying to profile some .cif conversions today using a few different methods, but my results so far have not been very consistent. Are you using bfconvert from the command-line tools, or have you written a separate program?

moi90 (Author) commented Apr 23, 2018

Thanks for considering my case!

I'm using bfconvert from the command line tools.
