This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

recall_model->exec->Forward takes most of the time; how can I reduce it? #15450

Open
gxkevin opened this issue Jul 3, 2019 · 2 comments
Labels
C++ Related to C++ Performance

Comments


gxkevin commented Jul 3, 2019

I have a performance problem with DNN prediction, where I call SyncCopyFromCPU and then Forward. batch_size and fea_num are both 40. The default BLAS is OpenBLAS (I also tried Intel MKL, but it didn't work). The CPU is Broadwell, with 58 logical cores in total.

I run 32-58 worker threads, and each thread has only 1 OpenMP thread, because I worry that opening too many OpenMP threads will decrease performance.
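The thread budget described above can be checked with a little arithmetic. The following is only a minimal sketch: `total_compute_threads` and `is_oversubscribed` are hypothetical helper names (not part of MXNet or OpenMP), and the core count is passed in by the caller rather than detected.

```cpp
#include <cassert>

// Hypothetical helper: total number of compute threads when every worker
// thread runs its own OpenMP team of `omp_threads_per_worker` threads.
inline unsigned total_compute_threads(unsigned workers,
                                      unsigned omp_threads_per_worker) {
    assert(workers > 0 && omp_threads_per_worker > 0);
    return workers * omp_threads_per_worker;
}

// Hypothetical helper: true when more compute threads are requested than
// there are logical cores, so threads have to time-share cores.
inline bool is_oversubscribed(unsigned workers,
                              unsigned omp_threads_per_worker,
                              unsigned logical_cores) {
    return total_compute_threads(workers, omp_threads_per_worker) > logical_cores;
}
```

For instance, `is_oversubscribed(58, 1, 56)` is true while `is_oversubscribed(32, 1, 56)` is false, so the upper end of the 32-58 range already exceeds a 56-core machine even with a single OpenMP thread per worker.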

After testing, I found that prediction costs 13.9 ms in total: SyncCopyFromCPU costs 275 us, but Forward costs 11 ms. How can I reduce the Forward time?

    dnn_model->model_data["data"].SyncCopyFromCPU(batch_data.data(), batch_size * fea_num);
    mxnet::cpp::NDArray::WaitAll();
    dnn_model->exec->Forward(false);
    mxnet::cpp::NDArray::WaitAll();
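
One way per-stage figures like the ones above could be collected is with a small wall-clock timer around each call. This is a hedged sketch, not the measurement code actually used in this issue: `elapsed_us` and `unaccounted_us` are hypothetical helpers built only on `std::chrono`, and they make no MXNet calls themselves.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: run `fn` once and return the elapsed wall-clock
// time in microseconds, measured with a monotonic clock.
template <typename Fn>
std::int64_t elapsed_us(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}

// Hypothetical helper: time in the total that is not explained by the
// two measured stages (copy and forward). All values are microseconds.
inline std::int64_t unaccounted_us(std::int64_t total_us,
                                   std::int64_t copy_us,
                                   std::int64_t forward_us) {
    assert(total_us >= copy_us + forward_us);
    return total_us - copy_us - forward_us;
}
```

Assuming the calls are wrapped as `elapsed_us([&] { dnn_model->exec->Forward(false); mxnet::cpp::NDArray::WaitAll(); })`, the trailing `WaitAll()` keeps asynchronous work inside the measured interval. With the numbers reported here, `unaccounted_us(13900, 275, 11000)` is 2625, so roughly 2.6 ms of the total lies outside the copy and the forward pass.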

gxkevin commented Jul 3, 2019

Correction: the CPU has 56 logical cores, not 58.

@lanking520 lanking520 added C++ Related to C++ Performance labels Jul 8, 2019
@lanking520
Member

Hi @leleamol, could you please take a look?
