
ResNet synthetic data performance enhancement. #5225

Merged
2 commits merged into tensorflow:master on Sep 4, 2018

Conversation

tfboyd
Member

@tfboyd tfboyd commented Sep 2, 2018

All numbers are from a DGX-1 with V100s.

tl;dr: I improved synthetic data performance from ~4,800 images/sec to ~5,500 images/sec (a 14.6% speedup) on ResNetV1 FP16; the gain may be larger with smaller models.

The current synthetic data has a couple of problems: 1) the dtype is set to float32 and is then cast on the GPU no matter what (this also needs to be changed for real data, but it is less problematic there and I will do a PR for it next), and 2) it does not seem to prefetch. These combine into a situation where real data is faster than synthetic data: ~5,200 images/sec for real data on ResNet V1 vs. ~4,800 images/sec for synthetic data.

During my testing I found:

  • Changing the current code to tf.float16 while keeping the tf.cast gets 5,273 images/sec.
  • Changing the current code to tf.float16 and removing the tf.cast gets 5,329 images/sec. I doubt the unneeded cast has any real cost in this scenario; the difference could have been noise.
  • Changing the input_fn to my solution based on tf_cnn_benchmarks gets 5,534 images/sec.

This solution still has the host-to-device copy, which I believe can only be removed with a custom dataset, and I have some doubt that is worth it in the near term.
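The PR's actual input_fn is not reproduced in this conversation. As a rough sketch of the tf_cnn_benchmarks-style approach described above (all shapes, constants, and names here are assumptions for illustration, not the PR's real values): generate the synthetic batch directly in tf.float16, so no per-step tf.cast is needed, and prefetch so the input pipeline stays ahead of the model.

```python
import tensorflow as tf

# Hypothetical constants for illustration only.
_BATCH_SIZE = 32
_HEIGHT, _WIDTH, _DEPTH = 224, 224, 3
_NUM_CLASSES = 1001

def synthetic_input_fn(batch_size=_BATCH_SIZE, dtype=tf.float16):
    """Sketch of a synthetic-data input_fn.

    Random images are created directly in the target dtype (avoiding a
    later cast on the GPU), and the dataset prefetches one batch.
    """
    images = tf.random.truncated_normal(
        [batch_size, _HEIGHT, _WIDTH, _DEPTH],
        mean=127, stddev=60, dtype=dtype, name='synthetic_images')
    labels = tf.random.uniform(
        [batch_size], minval=0, maxval=_NUM_CLASSES,
        dtype=tf.int32, name='synthetic_labels')
    # One pre-generated batch, repeated forever, with prefetch.
    return tf.data.Dataset.from_tensors((images, labels)).repeat().prefetch(1)
```

Because the same pre-generated batch is repeated, the per-step input cost is essentially just the host-to-device copy mentioned above.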

As follow-up work: moving the tf.cast to fp16 for real data into the input pipeline, and then removing the tf.cast in resnet_run_loop, gave a small but seemingly consistent improvement. It also seems more correct and keeps that work off the GPU.

@tfboyd tfboyd requested a review from a team as a code owner September 2, 2018 15:42
Contributor

@robieta robieta left a comment


LGTM

  dataset = input_fn(True, '', _BATCH_SIZE)
- iterator = dataset.make_one_shot_iterator()
+ iterator = dataset.make_initializable_iterator()
Contributor


Why was this necessary?

Member Author


The one-shot iterator got angry about, I think, truncated_normal in relation to its parameters or something; I do not remember the exact error message. I will run it again just to be sure I did not fix the issue another way. It is easy to trigger the exception (or not, and switch it back), so I will do it again to satisfy our curiosity. :-) Then I will merge this and move forward.

Member Author


ValueError: Failed to create a one-shot iterator for a dataset. Dataset.make_one_shot_iterator() does not support datasets that capture stateful objects, such as a Variable or LookupTable. In these cases, use Dataset.make_initializable_iterator(). (Original error: Cannot capture a stateful node (name:synthetic_inputs/TruncatedNormal, type:TruncatedNormal) by value.)
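The error above can be reproduced in a small standalone sketch (shapes and names here are illustrative, not the PR's code). A one-shot iterator serializes the dataset graph and cannot capture a stateful op like TruncatedNormal by value, while an initializable iterator only requires an explicit initialization step. Using TF 1.x-style graph mode via tf.compat.v1:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # TF 1.x-style graph mode

# TruncatedNormal is a stateful op: each run produces new values.
images = tf.random.truncated_normal([2, 4], dtype=tf.float16,
                                    name='synthetic_inputs')
dataset = tf.data.Dataset.from_tensors(images).repeat()

try:
    # Expected to fail: one-shot iterators cannot capture stateful
    # nodes by value when serializing the dataset.
    tf.compat.v1.data.make_one_shot_iterator(dataset)
except ValueError as e:
    print('one-shot iterator rejected:', e)

# Works: the initializable iterator just needs sess.run(initializer).
iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
batch = iterator.get_next()
with tf.compat.v1.Session() as sess:
    sess.run(iterator.initializer)
    out = sess.run(batch)
print(out.shape)
```

The one-line switch in the diff above (make_one_shot_iterator to make_initializable_iterator) is exactly this workaround.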

@tfboyd tfboyd changed the title ResNet performance fix ResNet synthetic data performance enhancement. Sep 4, 2018
@tfboyd tfboyd merged commit 481728d into tensorflow:master Sep 4, 2018
@tfboyd tfboyd deleted the resnet_synthetic_fix branch October 9, 2018 20:19