Optimal batch size for AVX2 and/or OpenCL? #122
Regarding AVX/AVX2/AVX512: if your hardware supports it, you should enable it. This API is smart enough to decide when to use it and when not to. The decision is based on
If you are renting an environment to run models, you should rent only hardware with AVX.

Regarding batch size: this is a very interesting question. In this API, the batch size doesn't affect AVX efficiency, as each sample is processed separately (unlike many other frameworks). The same is true for OpenCL.

- For small, non-convolutional NN models, I would recommend a batch size of at least 32 or 64 samples.
- For non-convolutional models with 1024 inputs, I would recommend a batch size around 256 to 512, or even higher.
- For convolutional NNs, I would also use batch sizes around 32 or 64.
- If you have a lot of cores, consider using at least 4 samples in the batch per thread. Example: with 64 threads on a 64-core machine, I would consider a batch size of 256, although a large batch size will slow down convergence during the first epochs.

In short: starting with a batch size of 32 or 64 should work well for most problems. If your model is overfitting, or if you have plenty of cores, try a larger batch size, perhaps with a smaller learning rate. Larger batch sizes also reduce the threading overhead in all environments: plain CPU, AVX, and OpenCL, on both Windows and Linux.
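The rules of thumb above can be condensed into a small helper. This is only an illustrative sketch, not part of the API: `suggest_batch_size`, its parameters, and the defaults are hypothetical names encoding the advice "at least 32 samples, and at least 4 samples per worker thread".

```python
import os

def suggest_batch_size(num_threads=None, minimum=32, samples_per_thread=4):
    """Suggest a starting batch size from the rules of thumb above.

    This helper is illustrative only; the thresholds mirror the advice
    in this thread (start at 32, use >= 4 samples per thread).
    """
    if num_threads is None:
        # Fall back to the number of logical CPUs on this machine.
        num_threads = os.cpu_count() or 1
    return max(minimum, samples_per_thread * num_threads)

# Example from the answer: 64 threads on a 64-core machine -> 256.
print(suggest_batch_size(num_threads=64))  # 256
```

Remember that a batch size chosen this way trades convergence speed in the first epochs for lower threading overhead, so a smaller learning rate may be appropriate.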
I have enabled AVX2 and OpenCL. Is there a recommended optimal batch size when using AVX2 and/or OpenCL?