
training rel detector using multi gpus #37

Open
wtliao opened this issue Nov 2, 2018 · 2 comments

wtliao commented Nov 2, 2018

Hi,
I have successfully trained the detector using multiple GPUs (8), but I hit the following error when training the rel detector on more than one GPU (tried on 1080 Ti, P100, and K40):

Traceback (most recent call last):
  File "/home/wtliao/work_space/neural-motifs-master-backup/models/train_rels.py", line 229, in <module>
    rez = train_epoch(epoch)
  File "/home/wtliao/work_space/neural-motifs-master-backup/models/train_rels.py", line 135, in train_epoch
    tr.append(train_batch(batch, verbose=b % (conf.print_interval*10) == 0)) #b == 0))
  File "/home/wtliao/work_space/neural-motifs-master-backup/models/train_rels.py", line 179, in train_batch
    loss.backward()
  File "/home/wtliao/anaconda2/envs/mofit/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/wtliao/anaconda2/envs/mofit/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: narrow is not implemented for type UndefinedType

The code works fine on a single GPU. I have no idea what causes this, and I can't find a solution by googling. Do you have any idea? Thanks.

rowanz (Owner) commented Nov 2, 2018

Sorry, I don't support training the relationship model on multiple GPUs right now (it's not what I used for these experiments). I found it doesn't actually help much in terms of speedup, as the LSTMs are kinda slow and hard to parallelize.

wtliao (Author) commented Nov 5, 2018

Thanks, got it. The issue happens in the backward pass of the LSTM.
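
For reference, one way this error can arise with nn.DataParallel on older PyTorch is when some gathered replica outputs never contribute to the loss, leaving their gradients undefined when backward() runs. Below is a minimal, hypothetical sketch (a toy model, not the neural-motifs code) of the usual workaround: compute the loss inside the replicated module so that only per-replica loss tensors get gathered.

import torch
import torch.nn as nn

class ModelWithLoss(nn.Module):
    """Wrap a model and its criterion so each replica returns its own loss."""

    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, x, target):
        pred = self.model(x)
        # Return a 1-element tensor per replica; DataParallel concatenates
        # the replica outputs along dim 0, and .mean() below averages them.
        return self.criterion(pred, target).unsqueeze(0)

# Toy stand-in for the rel detector (for illustration only).
base = nn.Linear(16, 4)
wrapped = ModelWithLoss(base, nn.CrossEntropyLoss())

x = torch.randn(32, 16)
y = torch.randint(0, 4, (32,))

if torch.cuda.is_available():
    wrapped, x, y = wrapped.cuda(), x.cuda(), y.cuda()
    if torch.cuda.device_count() > 1:
        wrapped = nn.DataParallel(wrapped)

loss = wrapped(x, y).mean()   # average of per-GPU losses (or the single loss)
loss.backward()

Because the loss is formed inside forward(), every tensor crossing the DataParallel gather carries a gradient, which sidesteps the undefined-gradient case during backward().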
