JobManagerType documentation? #10

Closed
gillins opened this issue Sep 11, 2017 · 6 comments
Labels: bug, major

Comments

@gillins

gillins commented Sep 11, 2017

Original report by Anonymous.


I'm getting the following error with the multiprocessing JobManagerType using the addimage sample. Single-threaded, it works fine.



Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "C:\g\via\lib\python\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "C:\g\via\lib\python\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\g\via\lib\python\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\g\via\lib\python\lib\multiprocessing\queues.py", line 337, in get
    return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'addThem' on <module '__mp_main__' from 'C:\\g\\via\\lib\\_learn\\rios\\imagadd.py'>
@gillins

gillins commented Sep 11, 2017

Original comment by alan piszcz (Bitbucket: apiszcz, GitHub: apiszcz).


#!python

import multiprocessing
import time

from rios import applier, cuiprogress

if __name__ == "__main__":
    t0=time.time()
    # Set up input and output filenames.
    infiles = applier.FilenameAssociations()
    infiles.image1 = "file1.tif"
    infiles.image2 = "file2.tif"

    outfiles = applier.FilenameAssociations()
    outfiles.outimage = "outfile.tif"

    controls = applier.ApplierControls()
    controls.setReferenceImage(infiles.image1)

    controls.progress = cuiprogress.GDALProgressBar()

    controls.setOutputDriverName("GTiff")
    controls.setCreationOptions(["COMPRESS=DEFLATE"])

    controls.setNumThreads(multiprocessing.cpu_count())
    controls.setJobManagerType('multiprocessing')

    # Set up the function to be applied
    def addThem(info, inputs, outputs):
        """
        Function to be called by rios.
        Adds image1 and image2 from the inputs, and
        places the result in the outputs as outimage.
        """
        outputs.outimage = inputs.image1 + inputs.image2

    # Apply the function to the inputs, creating the outputs.
    applier.apply(addThem, infiles, outfiles, controls=controls)

    et=time.time()-t0
    print(et)

@gillins

gillins commented Sep 11, 2017

Original comment by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).


There is an Edit button at the top right of the Issue page.

Could you also attach the code you are using which gives the problem?

@gillins

gillins commented Sep 11, 2017

Original comment by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).


You may have to use the if __name__ == '__main__': statement in your script on Windows to get the multiprocessing module working properly. For more information see: https://docs.python.org/3.6/library/multiprocessing.html
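
To illustrate why the traceback above says it "Can't get attribute 'addThem'", here is a minimal sketch using only the standard library. It does not start any worker processes; it only mimics the lookup a spawned worker performs when it unpickles a function. The module name `demo_script` and the function `add_them` are hypothetical stand-ins, and swapping the entry in `sys.modules` is just a way to simulate a worker whose copy of the script never executed the `def` (as happens when the function is defined inside the `if __name__ == "__main__":` block).

```python
import pickle
import sys
import types

# Hypothetical stand-in for the user's script: the function is at module level.
SOURCE = """
def add_them(a, b):
    return a + b
"""


def make_module(name, source=""):
    """Create a module object from source text and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# "Parent" process: the function exists at module level, so it can be pickled.
parent = make_module("demo_script", SOURCE)
payload = pickle.dumps(parent.add_them)

# pickle stores a reference (module name + function name), not the function body.
stores_name_only = b"add_them" in payload and b"return" not in payload

# Worker whose copy of the module defines the function: the lookup succeeds.
restored = pickle.loads(payload)
result = restored(2, 3)

# Worker whose copy of the module never ran the def (simulated with an empty
# module of the same name): the lookup fails with AttributeError.
make_module("demo_script")
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = str(exc)

del sys.modules["demo_script"]  # tidy up the simulated module
print(result)
print(error)
```

Because only the name travels to the worker, the function has to be defined where the worker's import of the script will also define it, which on Windows means outside the `if __name__ == "__main__":` block.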

@gillins

gillins commented Sep 12, 2017

Original comment by alan piszcz (Bitbucket: apiszcz, GitHub: apiszcz).


The version below, with addThem defined outside the if __name__ == "__main__": block, runs. However, it is ~17% slower than the run without multiprocessing.

No multiprocessing: 130 seconds
Multiprocessing (4 cores): 152 seconds

#!python

import time

from rios import applier, cuiprogress

# Set up the function to be applied
def addThem(info, inputs, outputs):
    """
    Function to be called by rios.
    Adds image1 and image2 from the inputs, and
    places the result in the outputs as outimage.
    """
    outputs.outimage = inputs.image1 + inputs.image2

if __name__ == "__main__":
    t0=time.time()
    # Set up input and output filenames.
    infiles = applier.FilenameAssociations()
    infiles.image1 = "file1.tif"
    infiles.image2 = "file2.tif"

    outfiles = applier.FilenameAssociations()
    outfiles.outimage = "outfile.tif"

    controls = applier.ApplierControls()
    controls.setReferenceImage(infiles.image1)

    controls.progress = cuiprogress.GDALProgressBar()

    controls.setOutputDriverName("GTiff")
    controls.setCreationOptions(["COMPRESS=DEFLATE"])

    controls.setNumThreads(4)
    controls.setJobManagerType('multiprocessing')

    # Apply the function to the inputs, creating the outputs.
    applier.apply(addThem, infiles, outfiles, controls=controls)

    et=time.time()-t0
    print(et)

@gillins

gillins commented Sep 12, 2017

Original comment by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).


Hi Alan,

sounds like Sam's suggestion was the key to getting it working on Windows. That's good.

I should remind you of the documentation at https://rioshome.org/en/latest/rios_parallel_jobmanager.html#module-rios.parallel.jobmanager, which emphasizes that the use of parallel sub-jobs is only of benefit for tasks which are substantially compute-bound. Most simple raster processing tasks (like adding two rasters together) are I/O bound, and do not benefit at all. As your test shows, the overhead of passing the data out to the sub-jobs makes the performance worse.

I am glad you got a simple example to work, though.

Neil
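
As a rough illustration of the point about I/O-bound versus compute-bound tasks, the sketch below first computes the slowdown from the two timings reported in this thread, then applies a deliberately simple cost model. The model is an assumption made for illustration, not a description of how RIOS schedules its sub-jobs: I/O is treated as serial, compute is divided evenly across workers, and handing blocks to sub-jobs adds a fixed transfer cost. The I/O, compute, and transfer figures fed into it are made up, not measured.

```python
# Timings reported earlier in this thread, in seconds.
SINGLE_PROCESS_SECONDS = 130.0
FOUR_WORKER_SECONDS = 152.0


def relative_slowdown(baseline, measured):
    """Fractional increase in runtime relative to the baseline."""
    return (measured - baseline) / baseline


def estimated_runtime(io_seconds, compute_seconds, transfer_seconds, workers):
    """Toy model (an assumption): serial I/O, evenly divided compute,
    plus a fixed cost for passing data to the sub-jobs."""
    if workers <= 1:
        return io_seconds + compute_seconds
    return io_seconds + transfer_seconds + compute_seconds / workers


slowdown = relative_slowdown(SINGLE_PROCESS_SECONDS, FOUR_WORKER_SECONDS)
print(f"measured slowdown: {slowdown:.1%}")

# Made-up figures for a mostly I/O-bound task (e.g. adding two rasters).
io_bound_serial = estimated_runtime(100.0, 20.0, 25.0, workers=1)
io_bound_parallel = estimated_runtime(100.0, 20.0, 25.0, workers=4)

# Made-up figures for a mostly compute-bound task with the same total work.
compute_bound_serial = estimated_runtime(20.0, 100.0, 25.0, workers=1)
compute_bound_parallel = estimated_runtime(20.0, 100.0, 25.0, workers=4)

print(io_bound_serial, io_bound_parallel)
print(compute_bound_serial, compute_bound_parallel)
```

Under these assumed figures the I/O-bound case gets slower with four workers while the compute-bound case gets faster, which is the same pattern the documentation describes.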

@gillins

gillins commented Sep 19, 2017

Original comment by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).


Apparently resolved

@gillins gillins closed this as completed Sep 19, 2017
@gillins gillins added major bug Something isn't working labels Nov 13, 2019