
Automated Substrate Mapping Workflow #39

Closed
CameronBodine opened this issue Dec 21, 2022 · 5 comments
Labels
`0_Code for Zenhub Workspace Filtering`, `bug`, `enhancement`

Comments

@CameronBodine (Owner)

Use preliminary substrate segmentation models to build out substrate mapping workflow. Use this issue to track commits pertaining to substrate mapping.

@CameronBodine added the `0_Code for Zenhub Workspace Filtering` and `enhancement` labels Dec 21, 2022
CameronBodine added a commit that referenced this issue Jan 9, 2023
@CameronBodine (Owner, Author)

While batch processing 40 recordings, the run failed on LEA_020_000_20210518_USM1_Rec00005: the process was terminated during the Mapping Substrate Classification step. 72 of 130 port/starboard pairs were exported successfully before the following error was thrown:

Mapping substrate classification...

	Mapping substrate classification. Processing 130 port and starboard pairs...
[Parallel(n_jobs=24)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=24)]: Done   2 tasks      | elapsed:  2.3min
[Parallel(n_jobs=24)]: Done  13 tasks      | elapsed:  3.2min
[Parallel(n_jobs=24)]: Done  24 tasks      | elapsed:  3.5min
[Parallel(n_jobs=24)]: Done  37 tasks      | elapsed:  5.6min
[Parallel(n_jobs=24)]: Done  50 tasks      | elapsed:  7.6min
[Parallel(n_jobs=24)]: Done  65 tasks      | elapsed:  8.6min
exception calling callback for <Future at 0x7fc754164a50 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/externals/loky/_base.py", line 26, in _invoke_callbacks
    callback(self)
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 385, in __call__
    self.parallel.dispatch_next()
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 834, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/_parallel_backends.py", line 556, in apply_async
    future = self._workers.submit(SafeFunction(func))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/externals/loky/reusable_executor.py", line 176, in submit
    return super().submit(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 1129, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGKILL(-9)}
[Parallel(n_jobs=24)]: Done  87 out of 121 | elapsed: 10.3min remaining:  4.0min
[Parallel(n_jobs=24)]: Done 100 out of 121 | elapsed: 10.3min remaining:  2.2min
[Parallel(n_jobs=24)]: Done 113 out of 121 | elapsed: 10.3min remaining:   43.9s
Traceback (most recent call last):
  File "/home/cbodine/PythonRepos/PINGMapper/main_batchDirectory_csb.py", line 269, in <module>
    map_master_func(**params)
  File "/home/cbodine/PythonRepos/PINGMapper/src/main_mapSubstrate.py", line 423, in map_master_func
    Parallel(n_jobs=np.min([len(toMap), threadCnt]), verbose=10)(delayed(psObj._mapSubstrate)(map_class_method, c, f) for c, f in toMap.items())
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 1098, in __call__
    self.retrieve()
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/externals/loky/_base.py", line 26, in _invoke_callbacks
    callback(self)
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 385, in __call__
    self.parallel.dispatch_next()
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 834, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/_parallel_backends.py", line 556, in apply_async
    future = self._workers.submit(SafeFunction(func))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/externals/loky/reusable_executor.py", line 176, in submit
    return super().submit(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cbodine/miniconda3/envs/ping/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 1129, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGKILL(-9)}

This appears to be a memory leak (#38). Below are the memory usage stats for each step while processing LEA_20210518_USM1/020_000_Rec00005:

| Step | CPU % | RAM % | RAM [GB] |
| --- | --- | --- | --- |
| Start | 6.7 | 9.3 | 9.7 |
| Summary of ping attributes | 4.5 | 11.9 | 12.7 |
| Auto depth estimation | 8.6 | 23.6 | 26.5 |
| Auto shadow removal | 1.3 | 21.5 | 23.9 |
| Smoothing trackline | 7.0 | 10.0 | 10.4 |
| Auto substrate segmentation | 0.2 | 30.8 | 34.7 |
| Substrate plot export | 0.3 | 31.4 | 35.4 |
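The per-step stats above look like output from a periodic resource monitor (likely psutil, though that's an assumption). A minimal stdlib-only sketch for logging peak memory between steps, useful for narrowing down where the growth happens (`log_peak_mem` is a hypothetical helper, not PINGMapper code):

```python
import resource
import sys

def log_peak_mem(step: str) -> float:
    """Print and return the peak resident set size (GB) of this process.

    Note: ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    """
    raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 ** 2 if sys.platform.startswith("linux") else 1024 ** 3
    gb = raw / divisor
    print(f"{step}: peak RSS {gb:.1f} GB")
    return gb
```

Calling this at the end of each processing step would show which step's peak keeps climbing across recordings.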

The substrate mapping step then caused the program to fail. This needs more detailed investigation.

@CameronBodine added the `bug` label Apr 3, 2023
CameronBodine added a commit that referenced this issue Apr 3, 2023
@CameronBodine (Owner, Author)

Notes for implementing a new moving window mapping approach.

Old approach

For each chunk, the current mapping workflow loads chunks n-1, n, and n+1 into memory. A moving window is passed across the merged chunks, substrate is predicted, the predictions are stacked and averaged, and the merged chunks are cropped to recover the prediction at the current chunk's extent.

This is not a good approach because three or more predictions are made for each chunk, which is redundant.

New approach

Before making predictions, calculate the unique combinations of chunks and window offsets based on the stride. For example, with a chunk size of 500 and a stride of 100, the unique combinations are:

| Chunk | Window Offset |
| --- | --- |
| 1 | 0 |
| 1 | 100 |
| 1 | 200 |
| 1 | 300 |
| 1 | 400 |
| 2 | 0 |
| ... | ... |

Pass each (chunk, window offset) pair to the predict function, which makes the prediction and exports it to csv, using the chunk and window offset as part of the file name.

After all predictions are made, the map function will iterate over each chunk, load all npzs that overlap with the given chunk, stack the predictions and take the average, then crop to the chunk extent and rectify. These steps will reduce the total number of predictions and likely speed up the mapping process, which will also address #66.
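The enumeration described above can be sketched as follows (the helper name is hypothetical, not an actual PINGMapper function):

```python
def chunk_offset_pairs(n_chunks: int, chunk_size: int, stride: int):
    """Unique (chunk, window offset) combinations to predict once each."""
    offsets = range(0, chunk_size, stride)
    return [(chunk, off)
            for chunk in range(1, n_chunks + 1)
            for off in offsets]

# Chunk size 500 with stride 100 gives offsets 0, 100, 200, 300, 400.
pairs = chunk_offset_pairs(2, 500, 100)
```

Each pair is then predicted exactly once, matching the table above.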

@CameronBodine (Owner, Author)

^^^^^^^^^^^ NOT SURE IF ABOVE IS THE BEST ^^^^^^^^^^^^^^^^^^^

Writing the extra prediction files to disk takes extra time and disk space. I'm going to set this idea aside for now, but I will make a commit just to keep a record, in case I want to come back to it.

@CameronBodine (Owner, Author)

Initial results comparing the EGN substrate model to the Raw substrate model:

EGN Model

[Image: PINGMapper-Test-Small-DS_pltSub_probability_ss_port_00000]

Raw Model

[Image: PINGMapper-Test-Small-DS_pltSub_probability_ss_port_00000]

My hope was that normalizing the sonar data would help with generalization to unseen datasets. These results are from the Hello World test sonar recording included with PING-Mapper; normalization certainly helps in this case.

@CameronBodine (Owner, Author)

Substrate workflow is fully implemented with https://github.com/CameronBodine/PINGMapper/releases/tag/v2.0.0-alpha. Closing as complete.
