
Use multiple CPU cores during decoding #30

Open
smiranda opened this issue Oct 16, 2020 · 9 comments

@smiranda

Hello, I'm trying to use multiple CPU cores during decoding. I added `cpu-threads: 8` to the decoder.yml, as per the Marian documentation.
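Concretely, the only change was adding one line to the config (a sketch; the rest of my decoder.yml is unchanged and omitted here):

```yaml
# decoder.yml (excerpt) -- everything else left as generated by the setup
cpu-threads: 8
```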

This seems to be recognized at load time: eight CPU devices show up in the log.

```
opus-mt_1 | [2020-10-16 12:08:54] [memory] Extending reserved space to 512 MB (device cpu0)
opus-mt_1 | [2020-10-16 12:08:54] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:08:54] Loading model from /usr/src/app/models/en-es/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:08:54] [memory] Extending reserved space to 512 MB (device cpu0)
opus-mt_1 | [2020-10-16 12:08:54] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:08:54] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:08:58] Server is listening on port 10001
opus-mt_1 | [2020-10-16 12:09:04] [memory] Extending reserved space to 512 MB (device cpu1)
opus-mt_1 | [2020-10-16 12:09:04] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:04] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:08] [memory] Extending reserved space to 512 MB (device cpu2)
opus-mt_1 | [2020-10-16 12:09:08] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:08] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:11] [memory] Extending reserved space to 512 MB (device cpu3)
opus-mt_1 | [2020-10-16 12:09:12] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:12] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:18] [memory] Extending reserved space to 512 MB (device cpu4)
opus-mt_1 | [2020-10-16 12:09:19] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:19] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:22] [memory] Extending reserved space to 512 MB (device cpu5)
opus-mt_1 | [2020-10-16 12:09:22] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:22] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:25] [memory] Extending reserved space to 512 MB (device cpu6)
opus-mt_1 | [2020-10-16 12:09:26] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:26] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:29] [memory] Extending reserved space to 512 MB (device cpu7)
opus-mt_1 | [2020-10-16 12:09:29] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:29] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:32] Server is listening on port 10002
```

But at run time I see it only uses one CPU core and takes the same time as without the `cpu-threads: 8` setting. It also just prints:

```
opus-mt_1 | [2020-10-16 12:10:30] [memory] Reserving 295 MB, device cpu0
```

Does anyone know how to use multiple CPU cores during decoding?

Thanks.

@smiranda smiranda changed the title Use multiple CPU cores Use multiple CPU cores during decoding Oct 16, 2020
@jorgtied
Member

Is this with the Tornado web app or the other OPUS-MT server? It may be related to batch translation, which is not well supported so far.

@smiranda
Author

smiranda commented Dec 4, 2020

@jorgtied Hello, I'm using the provided Dockerfile, which launches `CMD python3 server.py -c services.json -p 80`. I think this means I'm using the Tornado server. Which one is the "other" OPUS-MT server? Is there a way to change the Dockerfile to use that one instead of the default? Thank you.

@jorgtied
Member

Information about the other server option is here: https://github.com/Helsinki-NLP/Opus-MT/blob/master/doc/WebSocketServer.md

@smiranda
Author

Thank you for your help. I followed the instructions:

```
cd Opus-MT/install
make all
```

This ran without errors, it seems. But the next command, `sudo make install`, failed with `install: cannot stat 'marian/build/marian-server': No such file or directory`, and in fact that file is not there, although `marian`, `marian-decoder`, etc. are. My machine runs Ubuntu 16.04.7 LTS.

Do you know why this might happen ?

Is there a Dockerfile for this server version I could use ?

@jorgtied
Member

This is strange. Could it be that Marian does not compile a server binary anymore in the latest versions? I need to check that. Could you try reverting to an earlier version of Marian NMT when compiling the system?

@smiranda
Author

smiranda commented Jan 3, 2021

@jorgtied thanks for your help so far! Just FYI, I might return to this issue, but I don't have more time at the moment. Tweaking the Marian compilation seems too difficult for me. Maybe later!

If it suits you, feel free to close the issue.

@jorgtied
Member

@smiranda I made a change in the installation makefile. Does it work now and does it compile the marian-server binary?

@smiranda
Author

smiranda commented Jan 19, 2021

@jorgtied Hello, I was now able to install and run this service! Thank you.

I still have the multi-core issue: it only uses one core, even when processing a large text (a news item, several sentences). Is there somewhere I must configure multi-core support? I tried passing `--cpu-threads 4` on the marian command line in the server's init.d file. Please let me know if you have seen multi-core CPU activity before.

I also have another observation: are we only supposed to pass one sentence at a time here? It seems so, since the output is much smaller than the input for a large text. In the other server (the Docker HTTP one), you can pass a large text and it does sentence splitting internally. Is this one supposed to be used differently?
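If client-side splitting is indeed required, I imagine it would look roughly like this. This is only a naive sketch (the real HTTP server presumably uses proper sentence segmentation, and `split_sentences` is a name I made up); each resulting sentence would be sent to marian-server individually and the translations concatenated on the client side:

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by
    whitespace. Only a sketch of the idea of splitting client-side before
    calling marian-server; real segmentation needs to handle abbreviations,
    quotes, etc.
    """
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]
```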

@GermainZ

I see similar behavior here. The `--cpu-threads` option seemingly has no effect on CPU usage or translation times (I tried 1 thread and up to 16).

A workaround is to run multiple instances of marian-server and route requests between them, which is what I ended up doing for now, but that requires a lot more work.
For example, instead of running `marian-server … --cpu-threads 16`, I run 16 instances of `marian-server … --cpu-threads 1` and send requests to those 16 instances (without caring about proper balancing for now).
This does result in higher CPU usage across cores and better translation times.
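The routing part of this workaround can be sketched as a tiny round-robin dispatcher. Everything here (host name, port numbers, the `next_backend` helper) is made up for illustration, and a real client would still need to open a WebSocket to the chosen backend and send the sentence there:

```python
import itertools

# Hypothetical setup: one marian-server instance per core, each started
# with "--cpu-threads 1" on its own port (ports invented for this sketch).
BACKENDS = [("localhost", 10000 + i) for i in range(16)]

# itertools.cycle yields the backends forever, giving plain round-robin
# dispatch with no load awareness.
_backend_cycle = itertools.cycle(BACKENDS)

def next_backend():
    """Return the (host, port) pair to use for the next translation request."""
    return next(_backend_cycle)
```

Round-robin ignores per-request cost, so a long sentence can still queue behind others on one instance; least-busy dispatch would balance better but needs per-backend bookkeeping.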

Is this normal or am I missing something here? Thanks!
