This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Why does the CRF lead to high CPU usage? #2884

Closed
wlhgtc opened this issue May 24, 2019 · 10 comments · Fixed by #5335

Comments

@wlhgtc
Contributor

wlhgtc commented May 24, 2019

I trained two models for an NER task:

  1. 1-layer Bi-LSTM (simple_tagger)
  2. 1-layer Bi-LSTM + CRF (crf_tagger)

I compare their cost on both CPU and GPU in training process.
Here are the result:

| Model | CPU (%) | GPU (%) |
| --- | --- | --- |
| Bi-LSTM + CRF (AllenNLP) | 1124 | 1 |
| Bi-LSTM (AllenNLP) | 57 | 6 |
| Bi-LSTM + CRF (Keras) | 235 | 13 |

It's a Linux server with 16 logical cores (4 physical) and a single Tesla V100.
I used `top` (%CPU) to measure CPU usage and `nvidia-smi` (Volatile GPU-Util) for GPU usage.
Intuitively it seems that `viterbi_decode` is causing this, but after reading the source code I can't figure out how to fix it.
Can somebody help me?

@joelgrus
Contributor

Sorry, what is the problem you're trying to solve? Is the training too slow?

@wlhgtc
Contributor Author

wlhgtc commented May 25, 2019

@joelgrus Thanks for your reply.
The high CPU usage does make training slow, but I'd like to understand why the CRF module causes such high CPU usage, and how I can avoid it.
As you can see, it's about 5 times higher than the same architecture in Keras, which doesn't make sense.

@kl2806
Contributor

kl2806 commented Jun 7, 2019

We don't know the root cause; contributions are welcome.

@wy-ei

wy-ei commented Jun 9, 2019

class allennlp.modules.conditional_random_field.ConditionalRandomField(num_tags: int,
    constraints: List[Tuple[int, int]] = None, include_start_end_transitions: bool = True)

Make sure you have provided the `constraints` parameter, which can dramatically speed up training by reducing the number of possible decoding paths.
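
For example, here is a minimal sketch (not from this thread) of passing constraints built with `allowed_transitions` for a BIO tag scheme; the label inventory below is made up for illustration:

```python
from allennlp.modules.conditional_random_field import (
    ConditionalRandomField,
    allowed_transitions,
)

# Illustrative tag inventory; in practice this comes from your vocabulary's
# "labels" namespace (e.g. vocab.get_index_to_token_vocabulary("labels")).
index_to_label = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC"}

# allowed_transitions returns the List[Tuple[int, int]] the constructor expects.
constraints = allowed_transitions("BIO", index_to_label)

crf = ConditionalRandomField(
    num_tags=len(index_to_label),
    constraints=constraints,
    include_start_end_transitions=True,
)
```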

@Maybewuss

The function `_viterbi_decode` in ConditionalRandomField creates many tensors on the CPU rather than on the device the inputs are on, so most of the decoding runs on the CPU. At least that's my guess.
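
If that is the case, the usual fix is to allocate those helper tensors on the input's device. A hedged sketch of the pattern (not the actual AllenNLP source; the helper name is made up):

```python
import torch

def init_viterbi_scores(logits: torch.Tensor, num_tags: int) -> torch.Tensor:
    """Hypothetical helper showing device-aware allocation for a Viterbi-style pass."""
    # Device-naive version (lands on the CPU and drags later ops with it):
    #   start = torch.full((num_tags + 2,), -10000.0)
    # Device-aware version: inherit the device (and dtype) of the incoming logits.
    start = torch.full((num_tags + 2,), -10000.0,
                       device=logits.device, dtype=logits.dtype)
    return start
```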

@natny

natny commented Sep 9, 2020

It appears that the `__init__` method of ConditionalRandomField allocates the initial `self.transitions` on the CPU. By the time the `_joint_likelihood` method is called, these are on the GPU. However, if the CRF is inside a container, this doesn't seem to happen, and I get "RuntimeError: expected backend CUDA and dtype Float but got backend CPU and dtype Float".

@epwalsh
Member

epwalsh commented Sep 9, 2020

@natny your issue sounds like it deserves its own bug report. Could you please open one: https://github.com/allenai/allennlp/issues/new?assignees=&labels=bug&template=bug_report.md&title=

@natny

natny commented Sep 13, 2020

Hi @epwalsh, thank you. With regard to part 2 of the issue, I think it is more of a PyTorch thing: when I move the container to a ModuleDict, the behaviour is as expected, i.e. all transitions get moved to the GPU when they are supposed to. So I'm not sure it warrants a bug report?
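
A hedged illustration of that difference (the wrapper classes are made up): parameters held in a plain Python dict are invisible to `.to(device)`, while `nn.ModuleDict` registers them as submodules so they get moved:

```python
import torch
from torch import nn

class PlainDictTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.heads = {"crf": nn.Linear(4, 4)}  # not registered; .to() won't see it

class ModuleDictTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleDict({"crf": nn.Linear(4, 4)})  # registered submodule

if torch.cuda.is_available():
    plain = PlainDictTagger().to("cuda")
    proper = ModuleDictTagger().to("cuda")
    print(plain.heads["crf"].weight.device)   # cpu  -> mixed-device RuntimeError later
    print(proper.heads["crf"].weight.device)  # cuda:0
```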

@epwalsh
Member

epwalsh commented Sep 14, 2020

Hey @natny, what do you mean by "move the container to a ModuleDict"?

@oroszgy
Contributor

oroszgy commented Jul 27, 2021

Hi there! I faced the same issue, and managed to resolve it by moving transitions and all the parameters to the proper device. I'm sending a PR soon.
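
A rough sketch of that approach (not the actual PR; the helper name is made up): pull the learned transition matrix onto the same device as the incoming scores before combining them.

```python
import torch

def pairwise_scores(logits: torch.Tensor, transitions: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: combine emission and transition scores on one device."""
    # If the parameters ended up on a different device than the inputs, move them
    # explicitly rather than letting the computation silently fall back to the CPU.
    transitions = transitions.to(logits.device)
    # logits: (batch, seq_len, num_tags); transitions: (num_tags, num_tags).
    # Broadcast to (batch, seq_len, num_tags, num_tags), as a Viterbi step would.
    return logits.unsqueeze(-1) + transitions
```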
