This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Bert gemms true fp16 #17466

Merged · 10 commits merged into apache:master on Apr 7, 2020
Conversation

@MoisesHer (Contributor) commented on Jan 29, 2020

Description

This PR enables true FP16 in cuBLAS GEMMs when the environment variable MXNET_FC_TRUE_FP16 is set to true. By default MXNet uses pseudo-FP16, keeping FP16 inputs and outputs but accumulating in FP32; with the flag set, accumulation is done in FP16 as well.
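For context, a minimal usage sketch (not from the PR itself; it assumes a GPU build of MXNet and a GPU architecture that supports fp16 compute, sm_53 or newer):

```python
# Hypothetical usage sketch: enable true-fp16 GEMMs before any compute runs.
import os
os.environ['MXNET_FC_TRUE_FP16'] = 'true'

import mxnet as mx

ctx = mx.gpu(0)
data = mx.nd.random.uniform(shape=(32, 768), dtype='float16', ctx=ctx)
weight = mx.nd.random.uniform(shape=(768, 768), dtype='float16', ctx=ctx)
bias = mx.nd.zeros((768,), dtype='float16', ctx=ctx)

# With the flag set, this FullyConnected runs its cuBLAS GEMM with fp16
# accumulation instead of the default fp32 (pseudo-fp16) accumulation.
out = mx.nd.FullyConnected(data, weight, bias, num_hidden=768)
print(out.dtype, out.shape)
```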

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator):
    tests/python/gpu/test_gluon_gpu:test_gemms_true_fp16
  • Code is well-documented:
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • General half-precision GEMM: uses true fp16 (fp16 accumulation) if MXNET_FC_TRUE_FP16 is set; see the sketch after this list
  • Transformer: uses true fp16 if MXNET_FC_TRUE_FP16 is set
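
Why this needs care: fp16 accumulation trades accuracy for speed. A small illustration in plain numpy (not MXNet code, and not from the PR) of how fp16 accumulation of a dot product drifts from fp32 accumulation:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, 4096).astype(np.float16)
y = rng.uniform(-1, 1, 4096).astype(np.float16)

# Pseudo-fp16: fp16 inputs, but every partial sum is kept in fp32.
acc32 = x.astype(np.float32).dot(y.astype(np.float32))

# True fp16: every partial sum is rounded back to fp16.
acc16 = np.float16(0.0)
for a, b in zip(x, y):
    acc16 = np.float16(acc16 + np.float16(a * b))

print(acc32, acc16, abs(acc32 - float(acc16)))  # the gap grows with vector length
```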

Comments

@eric-haibin-lin (Member) left a comment


can we add a unit test to set this env var locally?
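
A rough sketch of what such a test could look like (hypothetical; the test that actually landed is tests/python/gpu/test_gluon_gpu.py:test_gemms_true_fp16, and this sketch assumes the env var is re-read on each GEMM call):

```python
import os
import mxnet as mx
from mxnet.test_utils import assert_almost_equal

def test_gemms_true_fp16():
    ctx = mx.gpu(0)
    data = mx.nd.random.uniform(shape=(8, 64), dtype='float16', ctx=ctx)
    weight = mx.nd.random.uniform(shape=(32, 64), dtype='float16', ctx=ctx)

    # Reference result on the default pseudo-fp16 path (fp32 accumulation).
    os.environ['MXNET_FC_TRUE_FP16'] = 'false'
    ref = mx.nd.FullyConnected(data, weight, no_bias=True, num_hidden=32)
    ref.wait_to_read()

    # Set the env var locally for the true-fp16 path, restoring it afterwards.
    os.environ['MXNET_FC_TRUE_FP16'] = 'true'
    try:
        res = mx.nd.FullyConnected(data, weight, no_bias=True, num_hidden=32)
        res.wait_to_read()
    finally:
        os.environ['MXNET_FC_TRUE_FP16'] = 'false'

    # fp16 accumulation is less precise, so compare with a loose tolerance.
    assert_almost_equal(ref.asnumpy(), res.asnumpy(), rtol=1e-2, atol=1e-2)
```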

@eric-haibin-lin (Member) left a comment


Looks like some tests failed:

[2020-02-29T00:24:20.761Z] ======================================================================
[2020-02-29T00:24:20.761Z] ERROR: test_gluon_gpu.test_gemms_true_fp16
[2020-02-29T00:24:20.761Z] ----------------------------------------------------------------------
[2020-02-29T00:24:20.761Z] Traceback (most recent call last):
[2020-02-29T00:24:20.761Z]   File "/usr/local/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
[2020-02-29T00:24:20.761Z]     self.test(*self.arg)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 215, in test_new
[2020-02-29T00:24:20.761Z]     orig_test(*args, **kwargs)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/test_gluon_gpu.py", line 634, in test_gemms_true_fp16
[2020-02-29T00:24:20.761Z]     assert_almost_equal(ref_results.asnumpy(), results_trueFP16.asnumpy(),
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 2561, in asnumpy
[2020-02-29T00:24:20.761Z]     ctypes.c_size_t(data.size)))
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/base.py", line 246, in check_call
[2020-02-29T00:24:20.761Z]     raise get_last_ffi_error()
[2020-02-29T00:24:20.761Z] mxnet.base.MXNetError: Traceback (most recent call last):
[2020-02-29T00:24:20.761Z]   [bt] (9) /work/mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f0532fc99cb]
[2020-02-29T00:24:20.761Z]   [bt] (8) /work/mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0x3e) [0x7f0532fcbf9e]
[2020-02-29T00:24:20.761Z]   [bt] (7) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x12a) [0x7f0532fcbd3a]
[2020-02-29T00:24:20.761Z]   [bt] (6) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x48b) [0x7f0532fca9cb]
[2020-02-29T00:24:20.761Z]   [bt] (5) /work/mxnet/python/mxnet/../../build/libmxnet.so(+0x24bc30f) [0x7f0532fc130f]
[2020-02-29T00:24:20.761Z]   [bt] (4) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x4d6) [0x7f053306b1b6]
[2020-02-29T00:24:20.761Z]   [bt] (3) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::FullyConnectedCompute<mshadow::gpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x27f) [0x7f0536d120ff]
[2020-02-29T00:24:20.761Z]   [bt] (2) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::FCForward<mshadow::gpu, mshadow::half::half_t>(mxnet::OpContext const&, mxnet::op::FullyConnectedParam const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x3eb) [0x7f0536d156bb]
[2020-02-29T00:24:20.761Z]   [bt] (1) /work/mxnet/python/mxnet/../../build/libmxnet.so(void linalg_gemm<mshadow::gpu, mshadow::half::half_t>(mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::half::half_t, mshadow::half::half_t, bool, bool, mshadow::Stream<mshadow::gpu>*)+0x613) [0x7f05369c1363]
[2020-02-29T00:24:20.761Z]   [bt] (0) /work/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f0532eb152f]
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/src/operator/contrib/./../linalg_impl.h", line 301
[2020-02-29T00:24:20.761Z] cuBLAS: Check failed: e == CUBLAS_STATUS_SUCCESS (8 vs. 0) : CUBLAS_STATUS_ARCH_MISMATCH
[2020-02-29T00:24:20.761Z] -------------------- >> begin captured logging << --------------------
[2020-02-29T00:24:20.761Z] common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=289472624 to reproduce.
[2020-02-29T00:24:20.761Z] --------------------- >> end captured logging << ---------------------
[2020-02-29T00:24:20.761Z]
[2020-02-29T00:24:20.761Z] ======================================================================
[2020-02-29T00:24:20.761Z] ERROR: test_operator_gpu.test_deconvolution_options
[2020-02-29T00:24:20.761Z] ----------------------------------------------------------------------
[2020-02-29T00:24:20.761Z] Traceback (most recent call last):
[2020-02-29T00:24:20.761Z]   File "/usr/local/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
[2020-02-29T00:24:20.761Z]     self.test(*self.arg)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 215, in test_new
[2020-02-29T00:24:20.761Z]     orig_test(*args, **kwargs)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/test_operator_gpu.py", line 959, in test_deconvolution_options
[2020-02-29T00:24:20.761Z]     check_consistency_NxM([sym, sym_no_cudnn], ctx_list)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/test_operator_gpu.py", line 642, in check_consistency_NxM
[2020-02-29T00:24:20.761Z]     check_consistency(np.repeat(sym_list, len(ctx_list)), ctx_list * len(sym_list), scale=0.5)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/test_utils.py", line 1572, in check_consistency
[2020-02-29T00:24:20.761Z]     assert_almost_equal(arr, gtarr, rtol=rtol, atol=atol, equal_nan=equal_nan)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/test_utils.py", line 602, in assert_almost_equal
[2020-02-29T00:24:20.761Z]     if output.asnumpy() == 1:
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 2561, in asnumpy
[2020-02-29T00:24:20.761Z]     ctypes.c_size_t(data.size)))
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/base.py", line 246, in check_call
[2020-02-29T00:24:20.761Z]     raise get_last_ffi_error()
[2020-02-29T00:24:20.761Z] mxnet.base.MXNetError: Traceback (most recent call last):
[2020-02-29T00:24:20.761Z]   [bt] (9) /work/mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f0532fc99cb]
[2020-02-29T00:24:20.761Z]   [bt] (8) /work/mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0x3e) [0x7f0532fcbf9e]
[2020-02-29T00:24:20.761Z]   [bt] (7) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x12a) [0x7f0532fcbd3a]
[2020-02-29T00:24:20.761Z]   [bt] (6) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x48b) [0x7f0532fca9cb]
[2020-02-29T00:24:20.761Z]   [bt] (5) /work/mxnet/python/mxnet/../../build/libmxnet.so(+0x24dffeb) [0x7f0532fe4feb]
[2020-02-29T00:24:20.761Z]   [bt] (4) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0xe5) [0x7f0532fde485]
[2020-02-29T00:24:20.761Z]   [bt] (3) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::DeconvolutionCompute<mshadow::gpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x3d3) [0x7f0536cd5e83]
[2020-02-29T00:24:20.761Z]   [bt] (2) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::DeconvolutionOp<mshadow::gpu, mshadow::half::half_t>::Forward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0xd6b) [0x7f0536cf77cb]
[2020-02-29T00:24:20.761Z]   [bt] (1) /work/mxnet/python/mxnet/../../build/libmxnet.so(void linalg_gemm<mshadow::gpu, mshadow::half::half_t>(mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::half::half_t, mshadow::half::half_t, bool, bool, mshadow::Stream<mshadow::gpu>*)+0x613) [0x7f05369c1363]
[2020-02-29T00:24:20.761Z]   [bt] (0) /work/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f0532eb152f]
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/src/operator/contrib/./../linalg_impl.h", line 301
[2020-02-29T00:24:20.761Z] cuBLAS: Check failed: e == CUBLAS_STATUS_SUCCESS (8 vs. 0) : CUBLAS_STATUS_ARCH_MISMATCH

@ptrendx (Member) commented on Mar 5, 2020

What is the GPU architecture of CI? I believe it could still be Kepler, where this would not work. We need to check the GPU architecture before enabling fp16 compute.
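
For reference, a sketch of how one could gate the feature on compute capability from Python (assumptions: the query goes through ctypes against libcudart, since MXNet's Python API does not, to my knowledge, expose compute capability; 75/76 are the cudaDevAttrComputeCapabilityMajor/Minor enum values; the actual fix lives in the C++ GEMM path):

```python
import ctypes

_cudart = ctypes.CDLL('libcudart.so')  # assumes the CUDA runtime is on the loader path
_CC_MAJOR, _CC_MINOR = 75, 76          # cudaDevAttrComputeCapabilityMajor/Minor

def compute_capability(device=0):
    """Return (major, minor) compute capability of a CUDA device."""
    major, minor = ctypes.c_int(), ctypes.c_int()
    assert _cudart.cudaDeviceGetAttribute(ctypes.byref(major), _CC_MAJOR, device) == 0
    assert _cudart.cudaDeviceGetAttribute(ctypes.byref(minor), _CC_MINOR, device) == 0
    return major.value, minor.value

# fp16 arithmetic in cuBLAS requires sm_53 or newer; on Kepler (sm_3x) a
# true-fp16 GEMM fails with CUBLAS_STATUS_ARCH_MISMATCH, as in the CI log above.
print('true fp16 supported:', compute_capability(0) >= (5, 3))
```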

@eric-haibin-lin eric-haibin-lin merged commit 5adcbf8 into apache:master Apr 7, 2020
mk-61 pushed a commit to mk-61/incubator-mxnet that referenced this pull request Apr 7, 2020
* Temporal solution for fp16 accumulation in Bert gemms

* Resolve alpha/beta type issue

* add documentation for env variable MXNET_FC_TRUE_FP16

* Improve description of env variable

* Add unitest checking environment variable

* keep pseudo-fp16 if architecture does not support Float16Compute

* Fix cpplint
MoisesHer added a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
ptrendx pushed a commit that referenced this pull request Apr 15, 2020