This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Why does the CRF lead to high CPU usage? #2884

Closed
wlhgtc opened this issue May 24, 2019 · 10 comments · Fixed by #5335

Comments

@wlhgtc
Contributor

wlhgtc commented May 24, 2019

I trained two models for an NER task:

  1. 1-layer Bi-LSTM (simple_tagger)
  2. 1-layer Bi-LSTM + CRF (crf_tagger)

I compare their cost on both CPU and GPU in training process.
Here are the result:

| Model | CPU (%) | GPU (%) |
| --- | --- | --- |
| Bi-LSTM + CRF (AllenNLP) | 1124 | 1 |
| Bi-LSTM (AllenNLP) | 57 | 6 |
| Bi-LSTM + CRF (Keras) | 235 | 13 |

It's a Linux server with 16 logical cores (4 physical) and a single Tesla V100.
I used `top` (%CPU) to measure CPU usage and `nvidia-smi` (Volatile GPU-Util) for GPU usage.
Intuitively it seems that `viterbi_decode` is causing this, but after reading the source code I can't figure out how to fix it.
Can somebody help me?

@joelgrus
Contributor

Sorry, what is the problem you're trying to solve? Is the training too slow?

@wlhgtc
Contributor Author

wlhgtc commented May 25, 2019

@joelgrus Thanks for your reply.
The high CPU usage does make training slow, but I'd like to understand why the CRF module causes such high CPU usage, and how I can avoid it.
As you can see, it's about 5 times higher than the same architecture in Keras, which doesn't make sense.

@kl2806
Contributor

kl2806 commented Jun 7, 2019

We don't know the root cause; contributions are welcome.

@wy-ei

wy-ei commented Jun 9, 2019

class allennlp.modules.conditional_random_field.ConditionalRandomField(num_tags: int,
    constraints: List[Tuple[int, int]] = None, include_start_end_transitions: bool = True)

Make sure you have provided the `constraints` parameter, which can dramatically speed up training by reducing the number of possible decoding paths.
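
For example, here is a minimal sketch (not from this thread) of passing constraints built with `allowed_transitions` for a BIO tag scheme; the label inventory below is made up for illustration:

```python
from allennlp.modules.conditional_random_field import (
    ConditionalRandomField,
    allowed_transitions,
)

# Illustrative tag inventory; in practice this comes from your vocabulary's
# "labels" namespace (e.g. vocab.get_index_to_token_vocabulary("labels")).
index_to_label = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC"}

# allowed_transitions returns the List[Tuple[int, int]] the constructor expects.
constraints = allowed_transitions("BIO", index_to_label)

crf = ConditionalRandomField(
    num_tags=len(index_to_label),
    constraints=constraints,
    include_start_end_transitions=True,
)
```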

@Maybewuss

The function `_viterbi_decode` in ConditionalRandomField creates many tensors on the CPU rather than on the device the inputs are on, so most of the decoding runs on the CPU. At least that's my guess.
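
If that is the case, the usual fix is to allocate those helper tensors on the input's device. A hedged sketch of the pattern (not the actual AllenNLP source; the helper name is made up):

```python
import torch

def init_viterbi_scores(logits: torch.Tensor, num_tags: int) -> torch.Tensor:
    """Hypothetical helper showing device-aware allocation for a Viterbi-style pass."""
    # Device-naive version (lands on the CPU and drags later ops with it):
    #   start = torch.full((num_tags + 2,), -10000.0)
    # Device-aware version: inherit the device (and dtype) of the incoming logits.
    start = torch.full((num_tags + 2,), -10000.0,
                       device=logits.device, dtype=logits.dtype)
    return start
```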

@natny

natny commented Sep 9, 2020

It appears that the `__init__` method of ConditionalRandomField allocates the initial `self.transitions` on the CPU. By the time the `_joint_likelihood` method is called, these are on the GPU. However, if the CRF is inside a container, this doesn't seem to happen, and I get "RuntimeError: expected backend CUDA and dtype Float but got backend CPU and dtype Float".

@epwalsh
Member

epwalsh commented Sep 9, 2020

@natny your issue sounds like it deserves its own bug report. Could you please open one: https://github.com/allenai/allennlp/issues/new?assignees=&labels=bug&template=bug_report.md&title=

@natny

natny commented Sep 13, 2020

Hi @epwalsh, thank you. With regard to part 2 of the issue, I think it is more of a PyTorch thing: when I move the container to a ModuleDict, the behaviour is as expected, i.e. all transitions get moved to the GPU when they are supposed to. So I'm not sure it warrants a bug report?
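
A hedged illustration of that difference (the wrapper classes are made up): parameters held in a plain Python dict are invisible to `.to(device)`, while `nn.ModuleDict` registers them as submodules so they get moved:

```python
import torch
from torch import nn

class PlainDictTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.heads = {"crf": nn.Linear(4, 4)}  # not registered; .to() won't see it

class ModuleDictTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleDict({"crf": nn.Linear(4, 4)})  # registered submodule

if torch.cuda.is_available():
    plain = PlainDictTagger().to("cuda")
    proper = ModuleDictTagger().to("cuda")
    print(plain.heads["crf"].weight.device)   # cpu  -> mixed-device RuntimeError later
    print(proper.heads["crf"].weight.device)  # cuda:0
```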

@epwalsh
Member

epwalsh commented Sep 14, 2020

Hey @natny, what do you mean by "move the container to a ModuleDict"?

@oroszgy
Contributor

oroszgy commented Jul 27, 2021

Hi there! I faced the same issue, and managed to resolve it by moving transitions and all the parameters to the proper device. I'm sending a PR soon.
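
A rough sketch of that approach (not the actual PR; the helper name is made up): pull the learned transition matrix onto the same device as the incoming scores before combining them.

```python
import torch

def pairwise_scores(logits: torch.Tensor, transitions: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: combine emission and transition scores on one device."""
    # If the parameters ended up on a different device than the inputs, move them
    # explicitly rather than letting the computation silently fall back to the CPU.
    transitions = transitions.to(logits.device)
    # logits: (batch, seq_len, num_tags); transitions: (num_tags, num_tags).
    # Broadcast to (batch, seq_len, num_tags, num_tags), as a Viterbi step would.
    return logits.unsqueeze(-1) + transitions
```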
