This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Bert gemms true fp16 #17466

Merged · 10 commits merged into apache:master on Apr 7, 2020
Conversation

@MoisesHer (Contributor) commented on Jan 29, 2020

Description

This PR enables true FP16 in cuBLAS GEMMs when the environment variable MXNET_FC_TRUE_FP16 is set to true. By default MXNet uses pseudo-FP16, keeping FP16 inputs and outputs but accumulating in FP32; with the flag set, accumulation is done in FP16 as well.
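For context, a minimal usage sketch (not from the PR itself; it assumes a GPU build of MXNet and a GPU architecture that supports fp16 compute, sm_53 or newer):

```python
# Hypothetical usage sketch: enable true-fp16 GEMMs before any compute runs.
import os
os.environ['MXNET_FC_TRUE_FP16'] = 'true'

import mxnet as mx

ctx = mx.gpu(0)
data = mx.nd.random.uniform(shape=(32, 768), dtype='float16', ctx=ctx)
weight = mx.nd.random.uniform(shape=(768, 768), dtype='float16', ctx=ctx)
bias = mx.nd.zeros((768,), dtype='float16', ctx=ctx)

# With the flag set, this FullyConnected runs its cuBLAS GEMM with fp16
# accumulation instead of the default fp32 (pseudo-fp16) accumulation.
out = mx.nd.FullyConnected(data, weight, bias, num_hidden=768)
print(out.dtype, out.shape)
```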

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator):
    tests/python/gpu/test_gluon_gpu:test_gemms_true_fp16
  • Code is well-documented:
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • General half-precision GEMM: uses true fp16 (fp16 accumulation) if MXNET_FC_TRUE_FP16 is set; see the sketch after this list
  • Transformer: uses true fp16 if MXNET_FC_TRUE_FP16 is set
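
Why this needs care: fp16 accumulation trades accuracy for speed. A small illustration in plain numpy (not MXNet code, and not from the PR) of how fp16 accumulation of a dot product drifts from fp32 accumulation:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, 4096).astype(np.float16)
y = rng.uniform(-1, 1, 4096).astype(np.float16)

# Pseudo-fp16: fp16 inputs, but every partial sum is kept in fp32.
acc32 = x.astype(np.float32).dot(y.astype(np.float32))

# True fp16: every partial sum is rounded back to fp16.
acc16 = np.float16(0.0)
for a, b in zip(x, y):
    acc16 = np.float16(acc16 + np.float16(a * b))

print(acc32, acc16, abs(acc32 - float(acc16)))  # the gap grows with vector length
```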

Comments

@eric-haibin-lin (Member) left a comment


can we add a unit test to set this env var locally?
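
A rough sketch of what such a test could look like (hypothetical; the test that actually landed is tests/python/gpu/test_gluon_gpu.py:test_gemms_true_fp16, and this sketch assumes the env var is re-read on each GEMM call):

```python
import os
import mxnet as mx
from mxnet.test_utils import assert_almost_equal

def test_gemms_true_fp16():
    ctx = mx.gpu(0)
    data = mx.nd.random.uniform(shape=(8, 64), dtype='float16', ctx=ctx)
    weight = mx.nd.random.uniform(shape=(32, 64), dtype='float16', ctx=ctx)

    # Reference result on the default pseudo-fp16 path (fp32 accumulation).
    os.environ['MXNET_FC_TRUE_FP16'] = 'false'
    ref = mx.nd.FullyConnected(data, weight, no_bias=True, num_hidden=32)
    ref.wait_to_read()

    # Set the env var locally for the true-fp16 path, restoring it afterwards.
    os.environ['MXNET_FC_TRUE_FP16'] = 'true'
    try:
        res = mx.nd.FullyConnected(data, weight, no_bias=True, num_hidden=32)
        res.wait_to_read()
    finally:
        os.environ['MXNET_FC_TRUE_FP16'] = 'false'

    # fp16 accumulation is less precise, so compare with a loose tolerance.
    assert_almost_equal(ref.asnumpy(), res.asnumpy(), rtol=1e-2, atol=1e-2)
```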

@eric-haibin-lin (Member) left a comment


Looks like some tests failed:

[2020-02-29T00:24:20.761Z] ======================================================================
[2020-02-29T00:24:20.761Z] ERROR: test_gluon_gpu.test_gemms_true_fp16
[2020-02-29T00:24:20.761Z] ----------------------------------------------------------------------
[2020-02-29T00:24:20.761Z] Traceback (most recent call last):
[2020-02-29T00:24:20.761Z]   File "/usr/local/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
[2020-02-29T00:24:20.761Z]     self.test(*self.arg)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 215, in test_new
[2020-02-29T00:24:20.761Z]     orig_test(*args, **kwargs)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/test_gluon_gpu.py", line 634, in test_gemms_true_fp16
[2020-02-29T00:24:20.761Z]     assert_almost_equal(ref_results.asnumpy(), results_trueFP16.asnumpy(),
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 2561, in asnumpy
[2020-02-29T00:24:20.761Z]     ctypes.c_size_t(data.size)))
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/base.py", line 246, in check_call
[2020-02-29T00:24:20.761Z]     raise get_last_ffi_error()
[2020-02-29T00:24:20.761Z] mxnet.base.MXNetError: Traceback (most recent call last):
[2020-02-29T00:24:20.761Z]   [bt] (9) /work/mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f0532fc99cb]
[2020-02-29T00:24:20.761Z]   [bt] (8) /work/mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0x3e) [0x7f0532fcbf9e]
[2020-02-29T00:24:20.761Z]   [bt] (7) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x12a) [0x7f0532fcbd3a]
[2020-02-29T00:24:20.761Z]   [bt] (6) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x48b) [0x7f0532fca9cb]
[2020-02-29T00:24:20.761Z]   [bt] (5) /work/mxnet/python/mxnet/../../build/libmxnet.so(+0x24bc30f) [0x7f0532fc130f]
[2020-02-29T00:24:20.761Z]   [bt] (4) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x4d6) [0x7f053306b1b6]
[2020-02-29T00:24:20.761Z]   [bt] (3) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::FullyConnectedCompute<mshadow::gpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x27f) [0x7f0536d120ff]
[2020-02-29T00:24:20.761Z]   [bt] (2) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::FCForward<mshadow::gpu, mshadow::half::half_t>(mxnet::OpContext const&, mxnet::op::FullyConnectedParam const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x3eb) [0x7f0536d156bb]
[2020-02-29T00:24:20.761Z]   [bt] (1) /work/mxnet/python/mxnet/../../build/libmxnet.so(void linalg_gemm<mshadow::gpu, mshadow::half::half_t>(mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::half::half_t, mshadow::half::half_t, bool, bool, mshadow::Stream<mshadow::gpu>*)+0x613) [0x7f05369c1363]
[2020-02-29T00:24:20.761Z]   [bt] (0) /work/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f0532eb152f]
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/src/operator/contrib/./../linalg_impl.h", line 301
[2020-02-29T00:24:20.761Z] cuBLAS: Check failed: e == CUBLAS_STATUS_SUCCESS (8 vs. 0) : CUBLAS_STATUS_ARCH_MISMATCH
[2020-02-29T00:24:20.761Z] -------------------- >> begin captured logging << --------------------
[2020-02-29T00:24:20.761Z] common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=289472624 to reproduce.
[2020-02-29T00:24:20.761Z] --------------------- >> end captured logging << ---------------------
[2020-02-29T00:24:20.761Z]
[2020-02-29T00:24:20.761Z] ======================================================================
[2020-02-29T00:24:20.761Z] ERROR: test_operator_gpu.test_deconvolution_options
[2020-02-29T00:24:20.761Z] ----------------------------------------------------------------------
[2020-02-29T00:24:20.761Z] Traceback (most recent call last):
[2020-02-29T00:24:20.761Z]   File "/usr/local/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
[2020-02-29T00:24:20.761Z]     self.test(*self.arg)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 215, in test_new
[2020-02-29T00:24:20.761Z]     orig_test(*args, **kwargs)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/test_operator_gpu.py", line 959, in test_deconvolution_options
[2020-02-29T00:24:20.761Z]     check_consistency_NxM([sym, sym_no_cudnn], ctx_list)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/tests/python/gpu/test_operator_gpu.py", line 642, in check_consistency_NxM
[2020-02-29T00:24:20.761Z]     check_consistency(np.repeat(sym_list, len(ctx_list)), ctx_list * len(sym_list), scale=0.5)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/test_utils.py", line 1572, in check_consistency
[2020-02-29T00:24:20.761Z]     assert_almost_equal(arr, gtarr, rtol=rtol, atol=atol, equal_nan=equal_nan)
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/test_utils.py", line 602, in assert_almost_equal
[2020-02-29T00:24:20.761Z]     if output.asnumpy() == 1:
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 2561, in asnumpy
[2020-02-29T00:24:20.761Z]     ctypes.c_size_t(data.size)))
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/python/mxnet/base.py", line 246, in check_call
[2020-02-29T00:24:20.761Z]     raise get_last_ffi_error()
[2020-02-29T00:24:20.761Z] mxnet.base.MXNetError: Traceback (most recent call last):
[2020-02-29T00:24:20.761Z]   [bt] (9) /work/mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f0532fc99cb]
[2020-02-29T00:24:20.761Z]   [bt] (8) /work/mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0x3e) [0x7f0532fcbf9e]
[2020-02-29T00:24:20.761Z]   [bt] (7) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x12a) [0x7f0532fcbd3a]
[2020-02-29T00:24:20.761Z]   [bt] (6) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x48b) [0x7f0532fca9cb]
[2020-02-29T00:24:20.761Z]   [bt] (5) /work/mxnet/python/mxnet/../../build/libmxnet.so(+0x24dffeb) [0x7f0532fe4feb]
[2020-02-29T00:24:20.761Z]   [bt] (4) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0xe5) [0x7f0532fde485]
[2020-02-29T00:24:20.761Z]   [bt] (3) /work/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::DeconvolutionCompute<mshadow::gpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x3d3) [0x7f0536cd5e83]
[2020-02-29T00:24:20.761Z]   [bt] (2) /work/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::op::DeconvolutionOp<mshadow::gpu, mshadow::half::half_t>::Forward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0xd6b) [0x7f0536cf77cb]
[2020-02-29T00:24:20.761Z]   [bt] (1) /work/mxnet/python/mxnet/../../build/libmxnet.so(void linalg_gemm<mshadow::gpu, mshadow::half::half_t>(mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::Tensor<mshadow::gpu, 2, mshadow::half::half_t> const&, mshadow::half::half_t, mshadow::half::half_t, bool, bool, mshadow::Stream<mshadow::gpu>*)+0x613) [0x7f05369c1363]
[2020-02-29T00:24:20.761Z]   [bt] (0) /work/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f0532eb152f]
[2020-02-29T00:24:20.761Z]   File "/work/mxnet/src/operator/contrib/./../linalg_impl.h", line 301
[2020-02-29T00:24:20.761Z] cuBLAS: Check failed: e == CUBLAS_STATUS_SUCCESS (8 vs. 0) : CUBLAS_STATUS_ARCH_MISMATCH

@ptrendx (Member) commented on Mar 5, 2020

What is the GPU architecture of CI? I believe it could still be Kepler, where this would not work. We need to check the GPU architecture before enabling fp16 compute.
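
For reference, a sketch of how one could gate the feature on compute capability from Python (assumptions: the query goes through ctypes against libcudart, since MXNet's Python API does not, to my knowledge, expose compute capability; 75/76 are the cudaDevAttrComputeCapabilityMajor/Minor enum values; the actual fix lives in the C++ GEMM path):

```python
import ctypes

_cudart = ctypes.CDLL('libcudart.so')  # assumes the CUDA runtime is on the loader path
_CC_MAJOR, _CC_MINOR = 75, 76          # cudaDevAttrComputeCapabilityMajor/Minor

def compute_capability(device=0):
    """Return (major, minor) compute capability of a CUDA device."""
    major, minor = ctypes.c_int(), ctypes.c_int()
    assert _cudart.cudaDeviceGetAttribute(ctypes.byref(major), _CC_MAJOR, device) == 0
    assert _cudart.cudaDeviceGetAttribute(ctypes.byref(minor), _CC_MINOR, device) == 0
    return major.value, minor.value

# fp16 arithmetic in cuBLAS requires sm_53 or newer; on Kepler (sm_3x) a
# true-fp16 GEMM fails with CUBLAS_STATUS_ARCH_MISMATCH, as in the CI log above.
print('true fp16 supported:', compute_capability(0) >= (5, 3))
```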

@eric-haibin-lin eric-haibin-lin merged commit 5adcbf8 into apache:master Apr 7, 2020
mk-61 pushed a commit to mk-61/incubator-mxnet that referenced this pull request Apr 7, 2020
* Temporal solution for fp16 accumulation in Bert gemms

* Resolve alpha/beta type issue

* add documentation for env variable MXNET_FC_TRUE_FP16

* Improve description of env variable

* Add unitest checking environment variable

* keep pseudo-fp16 if architecture does not support Float16Compute

* Fix cpplint
MoisesHer added a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
ptrendx pushed a commit that referenced this pull request Apr 15, 2020