ResNet synthetic data performance enhancement. #5225
Merged
All numbers are from a DGX-1 with V100s.
tl;dr: I improved synthetic data performance from ~4,800 images/sec to ~5,500 images/sec, a 14.6% speedup on ResNet V1 FP16; the gain may be larger with smaller models.
The current synthetic data has a couple of problems: 1) the dtype is set to float32 and is then always cast on the GPU (something that also needs to change for real data, though it is less problematic there; I will open a PR for that next), and 2) it does not appear to use prefetch. Together these create a situation where real data is faster than synthetic data: ~5,200 images/sec on ResNet V1 with real data versus ~4,800 images/sec with synthetic data.
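To make the two fixes concrete, here is a minimal sketch of a synthetic input pipeline that generates the batch in the target dtype on the host and adds prefetch. This is an illustration, not the code from this PR: the function name `synthetic_dataset` and its parameters are mine, and the shapes are just placeholders.

```python
import tensorflow as tf

def synthetic_dataset(batch_size=32, height=224, width=224, channels=3,
                      dtype=tf.float16):
    # Build one random batch once and repeat it, so there is no
    # per-step data-generation work on the host.
    images = tf.random.uniform([batch_size, height, width, channels],
                               dtype=tf.float32)
    # Cast to the training dtype here, in the input pipeline,
    # instead of casting every batch on the GPU.
    images = tf.cast(images, dtype)
    labels = tf.random.uniform([batch_size], maxval=1000, dtype=tf.int32)
    ds = tf.data.Dataset.from_tensors((images, labels)).repeat()
    # Prefetch so input production overlaps with GPU compute.
    return ds.prefetch(buffer_size=tf.data.AUTOTUNE)
```

With both changes in place the accelerator receives ready-to-use fp16 batches, which is what closes the gap against real data.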
During my testing I found:
This solution still has the host-to-device copy, which I believe can only be removed with a custom dataset, and I doubt that is worth doing in the near term.
As follow-up work: moving the tf.cast to fp16 into the input pipeline for real data, and then removing the tf.cast in resnet_run_loop, gave a small but seemingly consistent improvement. It also seems more correct and keeps work off the GPU.
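The follow-up idea above could look something like the sketch below: a map step that casts images inside the tf.data pipeline so the GPU-side cast in resnet_run_loop can be dropped. The helper name `cast_in_pipeline` is mine, not from the repo.

```python
import tensorflow as tf

def cast_in_pipeline(dataset, dtype=tf.float16):
    # Cast on the CPU input threads as part of the pipeline, so the
    # model receives tensors already in the training dtype and no
    # per-batch cast runs on the GPU.
    return dataset.map(
        lambda images, labels: (tf.cast(images, dtype), labels),
        num_parallel_calls=tf.data.AUTOTUNE)
```

Because map runs on the input threads and can be parallelized, the cast overlaps with GPU compute instead of competing with it.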