This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

update multistep_optimizer for tensorflow gpu #1773

Merged
merged 8 commits into tensorflow:master on Jun 16, 2020

Conversation

AgoloCuongHoang
Contributor

@AgoloCuongHoang commented Dec 17, 2019

I found that, using the original MultistepAdamOptimizer class, I get an InvalidArgumentError: Cannot assign a device for operation (see details below). The problem only occurs with tensorflow-gpu; on CPU-only tensorflow it works fine. I did a lot of investigation on this, e.g. trying the commonly recommended soft-placement trick with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)), without success.
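For reference, a minimal sketch of that soft-placement attempt (TF 1.x graph mode; the session usage around the config is illustrative only):

    import tensorflow as tf

    # allow_soft_placement lets TF fall back to another device when an op has no
    # kernel for the requested one; log_device_placement logs where each op lands.
    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        # ... run the training loop as usual; the colocation error still occurred.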

After further investigation I found two changes are needed to fix this issue: 1. convert the int values to float, and 2. make the class inherit directly from optimizer.Optimizer instead of AdamOptimizer. Together these two changes solved the issue; note that neither one works on its own.
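A hypothetical sketch of where the two changes land (not the exact patch in this pull request; only the constructor and slot creation are shown, and the argument and slot names are illustrative):

    from tensorflow.python.training import optimizer

    # Change 2: subclass optimizer.Optimizer directly instead of tf.train.AdamOptimizer.
    class MultistepAdamOptimizerSketch(optimizer.Optimizer):

        def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999,
                     epsilon=1e-8, n=1, use_locking=False, name="Adam"):
            super(MultistepAdamOptimizerSketch, self).__init__(use_locking, name)
            self._lr = learning_rate
            self._beta1 = beta1
            self._beta2 = beta2
            self._epsilon = epsilon
            self._n = n  # number of gradient-accumulation steps

        def _create_slots(self, var_list):
            first_var = min(var_list, key=lambda x: x.name)
            # Change 1: initialize the step counter with floats (0.0 / 1.0) rather
            # than ints, so the non-slot variable gets a GPU-friendly dtype.
            self._create_non_slot_variable(
                initial_value=0.0 if self._n == 1 else 1.0,
                name="iter", colocate_with=first_var)
            for v in var_list:
                self._zeros_slot(v, "grad_acc", self._name)

    # The _apply_dense / _resource_apply_dense / _finish methods are omitted here;
    # the point is only to show where the dtype change and the new base class go.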

Let me know if you have any questions regarding this pull request, or if you need the code to reproduce the InvalidArgumentError. Feel free to suggest a better solution if you think you can find one.

Finally, I tagged @fstahlberg as well, since he wrote the original MultistepAdamOptimizer class, so that he is aware of this issue.

Thank you!


An excerpt of the error:


Traceback (most recent call last):
  File "/home/cuong.hoang/anaconda2/envs/py36_env_tensor_gpu_pip_local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/cuong.hoang/anaconda2/envs/py36_env_tensor_gpu_pip_local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
    self._extend_graph()
  File "/home/cuong.hoang/anaconda2/envs/py36_env_tensor_gpu_pip_local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation training/beta1_power/IsInitialized/VarIsInitializedOp: Could not satisfy explicit device specification '' because the node {{colocation_node training/beta1_power/IsInitialized/VarIsInitializedOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@googlebot added the cla: no (PR author has not signed CLA) label Dec 17, 2019
@AgoloCuongHoang
Contributor Author

@googlebot I signed it!

@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot added the cla: yes (PR author has signed CLA) label and removed the cla: no (PR author has not signed CLA) label Dec 17, 2019
@afrozenator
Contributor

Hi @AgoloCuongHoang -- Thanks for the nice investigation, but Travis reports that the tensor2tensor/utils/multistep_optimizer_test.py test failed. The logs are here; can you investigate?

https://travis-ci.org/tensorflow/tensor2tensor/jobs/626315428?utm_medium=notification&utm_source=github_status

@afrozenator
Contributor

To trigger Travis, I will close and reopen the pull request.

@lukaszkaiser - Do you have thoughts on this? Changing MultistepAdamOptimizer to not inherit from AdamOptimizer adds a lot of code, and whatever the root cause is, I feel it shouldn't in theory be the inheritance itself. But clearly, after much work, the root cause wasn't found and a fix was made instead.

@afrozenator
Contributor

It seems like Travis did run and fail - https://travis-ci.org/tensorflow/tensor2tensor/jobs/627551248?utm_medium=notification&utm_source=github_status

So there is no need to do the close and reopen dance.

@AgoloCuongHoang
Contributor Author

@afrozenator: Sorry for my late response; I have had some important work to deal with recently.

I just fixed the file and I think it passes the test (multistep_optimizer_test.py). However, it does not pass certain other checks, and I have no idea why (I don't believe I touched them).
What should I do next?

@AgoloCuongHoang
Contributor Author

Please see my latest commit, which fixes the issue and passes multistep_optimizer_test.py.

@AgoloCuongHoang
Contributor Author

@afrozenator: Any update on this? To be clear, I am OK if the pull request is rejected; I am just curious what is going on. Thanks.

@lukaszkaiser
Contributor

@AgoloCuongHoang: could you put this new optimizer in a separate file and a separate class, so we also keep the old one for compatibility with old code?
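A sketch of the layout being asked for (the new file and class names below are hypothetical placeholders, not necessarily what was merged):

    # Hypothetical layout:
    #   tensor2tensor/utils/multistep_optimizer.py      -> original MultistepAdamOptimizer,
    #                                                      left unchanged for existing code
    #   tensor2tensor/utils/multistep_optimizer_gpu.py  -> hypothetical new file holding the
    #                                                      GPU-friendly class under a new name
    #
    # Existing imports of the original class keep working:
    from tensor2tensor.utils.multistep_optimizer import MultistepAdamOptimizer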

@AgoloCuongHoang
Contributor Author

@lukaszkaiser: Done. Let me know if you need me to do anything further.

@afrozenator
Contributor

Thanks a lot @AgoloCuongHoang, merging this now!

@afrozenator merged commit 94a3c0e into tensorflow:master Jun 16, 2020
tensorflow-copybara pushed a commit that referenced this pull request Jun 16, 2020
PiperOrigin-RevId: 316746422
@jchwenger mentioned this pull request Oct 29, 2020