[utils.bottleneck] Bottleneck crashes with multi-threaded data loader #6313
Comments
Yikes, thanks for the catch. I'll look into it |
Ran into this problem as well on Unix; would be great if we could still use the bottleneck functionality with multi-worker data loaders :) |
Hi, has anyone found a workaround for this issue? |
This thread deserves more visibility. I've tried many ways to avoid this crash, but have not been able to find a good tool for identifying the bottleneck. |
@stmharry A very simple band-aid, which I'd be happy to approve, is to update the "initialization error" message with a suggested workaround of reducing the number of worker threads. Would you like to submit a PR? |
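For anyone landing here before a proper fix: the band-aid above amounts to dropping to a single-process loader while profiling. A minimal sketch of that workaround, assuming a toy `TensorDataset` and a CUDA-enabled build (names and shapes are illustrative only, not from the original report):

```python
import torch
import torch.utils.data

if __name__ == '__main__':
    dataset = torch.utils.data.TensorDataset(torch.rand(10, 1000), torch.rand(10))

    # Workaround: num_workers=0 keeps data loading in the main process, so the
    # CUDA-mode profiler never has to coexist with forked worker processes.
    data_loader = torch.utils.data.DataLoader(dataset, batch_size=2, num_workers=0)

    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        for batch, target in data_loader:
            pass
    print(prof)
```

The price is that data loading runs single-process, which can itself change the timings you are trying to measure.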
This is still an issue for me as well |
Still an issue for me as well |
I also met a similar issue when using the autograd profiler on a GPU device. You can reproduce it by running the following code:

```python
import argparse

import torch
import torch.utils.data

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='mwe')
    parser.add_argument('--num-workers', default=0, type=int)
    args = parser.parse_args()

    data = torch.rand(10, 1000)
    target = torch.rand(10)
    dataset = torch.utils.data.TensorDataset(data, target)
    data_loader = torch.utils.data.DataLoader(dataset, batch_size=2, num_workers=args.num_workers)

    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        for i, batch in enumerate(data_loader):
            pass
    print(prof)
```

Running the script via `python test.py --num-workers 0` works fine, while `python test.py --num-workers 1` gives the following error:

```
Traceback (most recent call last):
  File "my_test.py", line 15, in <module>
    for i, batch in enumerate(data_loader):
  File "/home/xiaobinz/anaconda3/envs/pytorch-gpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/xiaobinz/anaconda3/envs/pytorch-gpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/xiaobinz/anaconda3/envs/pytorch-gpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/xiaobinz/anaconda3/envs/pytorch-gpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/xiaobinz/anaconda3/envs/pytorch-gpu/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 40, in __getitem__
    return tuple(tensor[index] for tensor in self.tensors)
  File "/home/xiaobinz/anaconda3/envs/pytorch-gpu/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 40, in <genexpr>
    return tuple(tensor[index] for tensor in self.tensors)
RuntimeError: /opt/conda/conda-bld/pytorch_1549628766161/work/torch/csrc/autograd/profiler.h:81: initialization error
```

Torch version is `pytorch 1.0.1 py3.6_cuda9.0.176_cudnn7.4.2_2 pytorch`. Perhaps they are the same issue, thanks! |
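If GPU timings are not strictly needed, the script above also runs cleanly with `num_workers > 0` when the profiler stays in CPU mode, because CUDA is then never initialized in the parent process before the workers fork. A minimal sketch under that assumption (same toy data as in the comment above):

```python
import torch
import torch.utils.data

if __name__ == '__main__':
    dataset = torch.utils.data.TensorDataset(torch.rand(10, 1000), torch.rand(10))
    data_loader = torch.utils.data.DataLoader(dataset, batch_size=2, num_workers=1)

    # CPU-only profiling: without use_cuda=True the profiler does not touch
    # CUDA, so forking DataLoader workers is unaffected.
    with torch.autograd.profiler.profile() as prof:
        for i, batch in enumerate(data_loader):
            pass
    print(prof)
```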
This is still an issue; the failure mode changed after the significant rewrite of the DataLoader.

Full traceback:
|
Still an issue as of v1.2 |
So the root problem is the following: the CUDA-mode autograd profiler initializes CUDA in the main process, and DataLoader worker processes forked after CUDA has been initialized hit the "initialization error" above.

I'm not sure what the best way to resolve this is. From the user side, one can change the multiprocessing start method to 'spawn'.
|
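For reference, one way to try the 'spawn' route mentioned above is the `multiprocessing_context` argument of `DataLoader`. This is only a sketch, assuming a recent PyTorch (roughly 1.2+) and a CUDA-enabled machine; as noted further down in this thread, it only helps if everything the workers need is picklable:

```python
import torch
import torch.utils.data

if __name__ == '__main__':
    dataset = torch.utils.data.TensorDataset(torch.rand(10, 1000), torch.rand(10))

    # Start workers with 'spawn' instead of 'fork', so the child processes do
    # not inherit the CUDA state that the profiler initialized in the parent.
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, num_workers=1, multiprocessing_context='spawn')

    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        for batch, target in loader:
            pass
    print(prof)
```

An alternative is calling `torch.multiprocessing.set_start_method('spawn')` at program start, which changes the default start method for the whole process.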
Mitigates #6313

A common use case for the autograd profiler is to use it to run over an entire model, including dataloading. The following will crash:
- run autograd profiler in CUDA mode
- Use a multi-worker DataLoader (presumably with the 'fork' spawn method)

because the autograd profiler initializes CUDA and forking after CUDA is initialized is bad. This PR puts in a nice error message when this happens so that users aren't too confused. The new error message looks like:

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  ../torch/csrc/autograd/profiler_cuda.cpp:36: CUDA initialization error. This can occur if one runs the profiler in CUDA mode on code that creates a DataLoader with num_workers > 0. This operation is currently unsupported; potential workarounds are: (1) don't use the profiler in CUDA mode or (2) use num_workers=0 in the DataLoader or (3) Don't profile the data loading portion of your code. https://github.com/pytorch/pytorch/issues/6313 tracks profiler support for multi-worker DataLoader.

Traceback (most recent call last):
  File "/scratch/rzou/pt/workspace/torch/utils/data/dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/scratch/rzou/pt/workspace-env/lib/python3.7/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/scratch/rzou/pt/workspace-env/lib/python3.7/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/scratch/rzou/pt/workspace-env/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/scratch/rzou/pt/workspace-env/lib/python3.7/multiprocessing/connection.py", line 920, in wait
    ready = selector.select(timeout)
  File "/scratch/rzou/pt/workspace-env/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/scratch/rzou/pt/workspace/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 18703) is killed by signal: Aborted.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mwe.py", line 15, in <module>
    for i, batch in enumerate(data_loader):
  File "/scratch/rzou/pt/workspace/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/scratch/rzou/pt/workspace/torch/utils/data/dataloader.py", line 841, in _next_data
    idx, data = self._get_data()
  File "/scratch/rzou/pt/workspace/torch/utils/data/dataloader.py", line 808, in _get_data
    success, data = self._try_get_data()
  File "/scratch/rzou/pt/workspace/torch/utils/data/dataloader.py", line 774, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 18703) exited unexpectedly
```

Test Plan:
- Tested locally
- It's hard to add a test for this because the program is supposed to fail.

[ghstack-poisoned]
Summary: Pull Request resolved: #31473

Mitigates #6313

A common use case for the autograd profiler is to use it to run over an entire model, including dataloading. The following will crash:
- run autograd profiler in CUDA mode
- Use a multi-worker DataLoader (presumably with the 'fork' spawn method)

because the autograd profiler initializes CUDA and forking after CUDA is initialized is bad. This PR puts in a nice error message when this happens so that users aren't too confused. The new error message looks like: https://gist.github.com/zou3519/903f15c3e86bad4585b7e5ce14cc1b70

Test Plan:
- Tested locally.
- I didn't add a test case for this because it's hard to write a test case that doesn't completely stop the rest of our test suite from running.

Differential Revision: D19178080
Pulled By: zou3519
fbshipit-source-id: c632525ba1f7b168324f1aa55416e5250f56a086
FWIW, nvprof also has the restriction that it can't profile a process that forks: https://docs.nvidia.com/cuda/profiler-users-guide/index.html#multiprocess-profiling. I'm not sure what actually happens in this case. Furthermore, the design of the autograd profiler requires it to initialize CUDA; otherwise, it is difficult to accurately create a timeline. Regarding torch.utils.bottleneck, we should do at least one of the following:
1. switch bottleneck to CPU-only profiling,
2. document the limitation (CUDA-mode profiling does not work with a multi-worker DataLoader), or
3. raise a clear error message when that combination is hit.
|
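Until one of those options is in place, a workaround in the spirit of item (3) from the error message added by #31473 ("Don't profile the data loading portion of your code") is to keep the multi-worker loader entirely outside the profiled region. A small sketch with a made-up model, assuming the prefetched batches fit in memory:

```python
import torch
import torch.utils.data

if __name__ == '__main__':
    dataset = torch.utils.data.TensorDataset(torch.rand(10, 1000), torch.rand(10))
    loader = torch.utils.data.DataLoader(dataset, batch_size=2, num_workers=2)

    # Consume the multi-worker loader *before* the CUDA-mode profiler starts,
    # so no worker process ever runs while profiling is active.
    batches = [batch for batch, target in loader]

    model = torch.nn.Linear(1000, 1).cuda()
    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        for batch in batches:
            model(batch.cuda())
    print(prof)
```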
AFAICT zou3519 has implemented 2. and 3. from #6313 (comment) in the merged PR #31473. Should this issue be closed, or is there still something left to do? |
cc @zou3519 |
@Baranowski: Yeah, it looks like we've done 2&3. For 2, we missed adding an entry to the bottleneck docs: https://pytorch.org/docs/stable/bottleneck.html . So that remains to be done. For action item 1, I'm not sure if it is a good idea to switch to CPU-only profiling for bottleneck. That is a separate issue though; to finish resolving the issue of this crash, we should add a line to the bottleneck docs about the limitation. |
How do you change from 'fork' to 'spawn'? edit: welp, that doesn't solve the problem because I can't pickle |
closing due to age. |
`torch.utils.bottleneck` doesn't work properly when the code contains a data loader that uses more than 0 threads.

Minimum reproducible example (`mwe.py`):

Running the script via:

works fine, while

crashes with the following stack trace:

Assigning this to @zou3519, even though I'm not sure if it's a problem in the profiler or in the `bottleneck` tool.

pytorch version: `'0.4.0a0+b21e135'`

cc @ezyang @gchanan @zou3519 @ssnl
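The original `mwe.py`, commands, and stack trace were not captured in this copy of the issue. Purely as a hypothetical stand-in (dataset, shapes, and model are invented, not the reporter's code), a script of the kind described would look like the following, run via `python -m torch.utils.bottleneck repro.py`:

```python
# repro.py -- hypothetical reconstruction, not the original mwe.py.
import torch
import torch.utils.data

def main():
    dataset = torch.utils.data.TensorDataset(torch.rand(10, 1000), torch.rand(10))
    # num_workers > 0 is the condition under which torch.utils.bottleneck crashes.
    loader = torch.utils.data.DataLoader(dataset, batch_size=2, num_workers=1)
    model = torch.nn.Linear(1000, 1)
    for batch, target in loader:
        model(batch)

if __name__ == '__main__':
    main()
```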