Enable sm_35 support in pytorch #129

Open
CRCinAU opened this issue Dec 15, 2021 · 2 comments

CRCinAU commented Dec 15, 2021

I'm trying to get GPU processing working on my older 2 GB GeForce GT 710, which I believe should be roughly as fast as a Jetson Nano.

When I try to run DeepStack with GPU enabled, I get:

/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py:125: UserWarning: 
NVIDIA GeForce GT 710 with CUDA capability sm_35 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the NVIDIA GeForce GT 710 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 485, in float
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 485, in <lambda>
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device

Is there a way to enable sm_35 support in the PyTorch build used in these containers? I can't quite see where it gets set.

Here's the nvidia-smi output from within the container:

root@7ea2aa82ebb1:/app/logs# nvidia-smi 
Wed Dec 15 09:44:56 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 N/A |                  N/A |
| 33%   37C    P0    N/A /  N/A |      0MiB /  2002MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The CUDA architectures supported by the container's torch build seem to be:

root@7ea2aa82ebb1:~# python3 -c "import torch; print(torch.cuda.get_arch_list())"
['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75']
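
For reference, the mismatch can also be seen entirely from Python by comparing the card's compute capability with the architectures the installed wheel was compiled for (a small diagnostic sketch, not specific to DeepStack):

import torch

# Architectures the installed PyTorch wheel was compiled for, e.g. ['sm_37', 'sm_50', ...]
compiled = torch.cuda.get_arch_list()

# Compute capability of the first visible GPU -- (3, 5) for a GT 710
major, minor = torch.cuda.get_device_capability(0)
device_arch = "sm_%d%d" % (major, minor)

print("wheel compiled for:", compiled)
print("device needs      :", device_arch)
print("supported         :", device_arch in compiled)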

CRCinAU commented Dec 15, 2021

I've been testing further, and with this docker-compose.yaml file, TensorFlow detects the GPU fine:

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

Output:

Creating tensor_test_1 ... done
Attaching to tensor_test_1
test_1  | 2021-12-15 12:53:15.611667: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
test_1  | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
test_1  | 2021-12-15 12:53:15.632027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:15.651586: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:15.651933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:16.198873: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:16.199203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:16.199453: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:16.201678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 1672 MB memory:  -> device: 0, name: NVIDIA GeForce GT 710, pci bus id: 0000:01:00.0, compute capability: 3.5
tensor_test_1 exited with code 0
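
As an aside, tf.test.gpu_device_name() only returns the device string; to have the same container print an explicit list of the GPUs it can see, a one-off check along these lines should also work (a sketch, not part of the original test):

import tensorflow as tf

# Physical GPUs TensorFlow can see inside the container
gpus = tf.config.list_physical_devices("GPU")
print("visible GPUs:", gpus)

# Name of the first GPU device, e.g. '/device:GPU:0'
print("gpu_device_name:", tf.test.gpu_device_name())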


CRCinAU commented Dec 16, 2021

I think I managed to get this working!

First, this is on Ubuntu 20.04. We need to install Python 3.7, and run all of the following as the root user:

# add-apt-repository ppa:deadsnakes/ppa
# apt-get update
# apt-get install python3.7

Create a Python 3.7 venv and activate it:

# python3.7 -m venv /root/python-3.7
# cd /root/python-3.7
# source bin/activate

Now we want to install a torch build that includes sm_35 support. I chose the same version of torch that's used in the DeepStack install:

# pip install torch==1.6.0+cu101 -f https://nelsonliu.me/files/pytorch/whl/torch_stable.html
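
Before mapping this into the container, it's worth confirming the wheel actually runs on the card. The simplest check is a tiny CUDA op inside the venv (a minimal sketch; if the build lacks sm_35 kernels, this fails with the same "no kernel image is available" error as above):

# run with the venv activated
import torch

print(torch.__version__)            # e.g. 1.6.0+cu101
print(torch.cuda.is_available())    # should be True with the GT 710 visible

x = torch.ones(4, device="cuda")    # raises if no sm_35 kernels are compiled in
print((x * 2).sum().item())         # expect 8.0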

Run DeepStack and bind-mount the alternative torch package over the one in the container:

# docker run --gpus all -e VISION-DETECTION=True -e VISION-FACE=True -v /root/python-3.7/lib/python3.7/site-packages/torch:/usr/local/lib/python3.7/dist-packages/torch -v localstorage:/datastore -p 5000:5000 deepquestai/deepstack:gpu

This maps in the alternative torch package, which in my case supports sm_35.
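
To confirm the container is really picking up the bind-mounted package rather than its bundled one, the same kind of check can be run inside it, e.g. via docker exec -ti <container> python3 (a sketch; the container name is whatever Docker assigned):

import torch

# Should point at the bind-mounted package from the host venv
print(torch.__file__)

# With an sm_35-capable build this runs without the
# "no kernel image is available" error from the original traceback
x = torch.zeros(3, device="cuda")
print(torch.cuda.get_device_name(0), x.device)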

Results:

[GIN] 2021/12/16 - 02:20:54 | 200 |  328.902025ms |     172.31.1.89 | POST     "/v1/vision/detection"
[GIN] 2021/12/16 - 02:21:06 | 200 |    225.5783ms |     172.31.1.89 | POST     "/v1/vision/detection"
[GIN] 2021/12/16 - 02:21:09 | 200 |  233.602927ms |     172.31.1.89 | POST     "/v1/vision/detection"

Compared to running on a 4 GB Jetson Nano:

[GIN] 2021/12/16 - 02:14:04 | 200 |  278.531116ms |     172.31.1.89 | POST     /v1/vision/detection
[GIN] 2021/12/16 - 02:14:06 | 200 |   292.32564ms |     172.31.1.89 | POST     /v1/vision/detection
[GIN] 2021/12/16 - 02:14:07 | 200 |  270.695522ms |     172.31.1.89 | POST     /v1/vision/detection

Output of nvidia-smi from within the DeepStack container:

# docker exec -ti admiring_germain nvidia-smi
Thu Dec 16 02:33:00 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 N/A |                  N/A |
| 33%   34C    P8    N/A /  N/A |   1178MiB /  2002MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
