
Use multiple CPU cores during decoding #30

Open
smiranda opened this issue Oct 16, 2020 · 9 comments

@smiranda

Hello, I'm trying to use multiple CPU cores during decoding. I added `cpu-threads: 8` to the decoder.yml, as per the Marian documentation.
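Concretely, the only change was adding one line to the config (a sketch; the rest of my decoder.yml is unchanged and omitted here):

```yaml
# decoder.yml (excerpt) -- everything else left as generated by the setup
cpu-threads: 8
```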

This seems to be recognized at load time: eight CPU devices show up in the log.

```
opus-mt_1 | [2020-10-16 12:08:54] [memory] Extending reserved space to 512 MB (device cpu0)
opus-mt_1 | [2020-10-16 12:08:54] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:08:54] Loading model from /usr/src/app/models/en-es/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:08:54] [memory] Extending reserved space to 512 MB (device cpu0)
opus-mt_1 | [2020-10-16 12:08:54] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:08:54] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:08:58] Server is listening on port 10001
opus-mt_1 | [2020-10-16 12:09:04] [memory] Extending reserved space to 512 MB (device cpu1)
opus-mt_1 | [2020-10-16 12:09:04] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:04] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:08] [memory] Extending reserved space to 512 MB (device cpu2)
opus-mt_1 | [2020-10-16 12:09:08] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:08] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:11] [memory] Extending reserved space to 512 MB (device cpu3)
opus-mt_1 | [2020-10-16 12:09:12] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:12] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:18] [memory] Extending reserved space to 512 MB (device cpu4)
opus-mt_1 | [2020-10-16 12:09:19] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:19] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:22] [memory] Extending reserved space to 512 MB (device cpu5)
opus-mt_1 | [2020-10-16 12:09:22] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:22] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:25] [memory] Extending reserved space to 512 MB (device cpu6)
opus-mt_1 | [2020-10-16 12:09:26] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:26] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:29] [memory] Extending reserved space to 512 MB (device cpu7)
opus-mt_1 | [2020-10-16 12:09:29] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:29] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:32] Server is listening on port 10002
```

But at run time I see it only uses one CPU core and takes the same time as without the `cpu-threads: 8` setting. It also just prints:

```
opus-mt_1 | [2020-10-16 12:10:30] [memory] Reserving 295 MB, device cpu0
```

Does anyone know how to use multiple CPU cores during decoding?

Thanks.

@smiranda smiranda changed the title Use multiple CPU cores Use multiple CPU cores during decoding Oct 16, 2020
@jorgtied
Member

Is this with the Tornado web app or the other OPUS-MT server? It may be related to batch translation, which is not well supported so far.

@smiranda
Author

smiranda commented Dec 4, 2020

@jorgtied Hello, I'm using the provided Dockerfile, which launches `CMD python3 server.py -c services.json -p 80`. I think this means I'm using the Tornado server. Which one is the "other" OPUS-MT server? Is there a way to change the Dockerfile to use that one instead of the default? Thank you.

@jorgtied
Member

Information about the other server option is here: https://github.com/Helsinki-NLP/Opus-MT/blob/master/doc/WebSocketServer.md

@smiranda
Author

Thank you for your help. I followed the instructions:

```
cd Opus-MT/install
make all
```

This ran without errors, it seems. But the next command, `sudo make install`, failed with `install: cannot stat 'marian/build/marian-server': No such file or directory`, and in fact that file is not there, although `marian`, `marian-decoder`, etc. are. My machine runs Ubuntu 16.04.7 LTS.

Do you know why this might happen ?

Is there a Dockerfile for this server version I could use ?

@jorgtied
Member

This is strange. Could it be that Marian does not compile a server binary anymore in the latest versions? I need to check that. Could you try reverting to an earlier version of Marian NMT when compiling the system?

@smiranda
Author

smiranda commented Jan 3, 2021

@jorgtied thanks for your help so far! Just FYI, I might return to this issue, but I don't have more time at the moment. Tweaking the Marian compilation seems too difficult for me. Maybe later!

If it suits you, feel free to close the issue.

@jorgtied
Member

@smiranda I made a change in the installation makefile. Does it work now and does it compile the marian-server binary?

@smiranda
Author

smiranda commented Jan 19, 2021

@jorgtied Hello, I was now able to install and run this service! Thank you.

I still have the multi-core issue: it only uses one core, even when processing a large text (a news item, several sentences). Is there somewhere I must configure multi-core support? I tried passing `--cpu-threads 4` on the marian command line in the server's init.d file. Please let me know if you have seen multi-core CPU activity before.

I also have another observation: are we only supposed to pass one sentence at a time here? It seems so, since the output is much smaller than the input for a large text. In the other server (the Docker HTTP one), you can pass a large text and it does sentence splitting internally. Is this one supposed to be used differently?
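If client-side splitting is indeed required, I imagine it would look roughly like this. This is only a naive sketch (the real HTTP server presumably uses proper sentence segmentation, and `split_sentences` is a name I made up); each resulting sentence would be sent to marian-server individually and the translations concatenated on the client side:

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by
    whitespace. Only a sketch of the idea of splitting client-side before
    calling marian-server; real segmentation needs to handle abbreviations,
    quotes, etc.
    """
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]
```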

@GermainZ

I see similar behavior here. The `--cpu-threads` option seemingly has no effect on CPU usage or translation times (I tried 1 thread and up to 16).

A workaround is to run multiple instances of marian-server and route requests between them, which is what I ended up doing for now, but that requires a lot more work.
For example, instead of running `marian-server … --cpu-threads 16`, I run 16 instances of `marian-server … --cpu-threads 1` and send requests to those 16 instances (without caring about proper balancing for now).
This does result in higher CPU usage across cores and better translation times.
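The routing part of this workaround can be sketched as a tiny round-robin dispatcher. Everything here (host name, port numbers, the `next_backend` helper) is made up for illustration, and a real client would still need to open a WebSocket to the chosen backend and send the sentence there:

```python
import itertools

# Hypothetical setup: one marian-server instance per core, each started
# with "--cpu-threads 1" on its own port (ports invented for this sketch).
BACKENDS = [("localhost", 10000 + i) for i in range(16)]

# itertools.cycle yields the backends forever, giving plain round-robin
# dispatch with no load awareness.
_backend_cycle = itertools.cycle(BACKENDS)

def next_backend():
    """Return the (host, port) pair to use for the next translation request."""
    return next(_backend_cycle)
```

Round-robin ignores per-request cost, so a long sentence can still queue behind others on one instance; least-busy dispatch would balance better but needs per-backend bookkeeping.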

Is this normal or am I missing something here? Thanks!
