
MBConvBlockWithoutDepthwise stride implemented in 1x1 projection, wasting expansion arithmetic #660

Closed
andravin opened this issue Jan 10, 2020 · 5 comments

andravin commented Jan 10, 2020

MBConvBlockWithoutDepthwise implements the stride in its 1x1 projection convolution. When stride=2, the projection discards three quarters of the activations produced by the 3x3 expansion. Implementing the stride on the expansion convolution instead would be equivalent and would reduce the block's total arithmetic by almost a factor of 4.

    self._expand_conv = tf.layers.Conv2D(
        filters,
        kernel_size=[3, 3],
        strides=[1, 1],
        kernel_initializer=conv_kernel_initializer,
        padding='same',
        use_bias=False)
    self._bn0 = self._batch_norm(
        axis=self._channel_axis,
        momentum=self._batch_norm_momentum,
        epsilon=self._batch_norm_epsilon)
    # Output phase:
    filters = self._block_args.output_filters
    self._project_conv = tf.layers.Conv2D(
        filters,
        kernel_size=[1, 1],
        strides=self._block_args.strides,
        kernel_initializer=conv_kernel_initializer,
        padding='same',
        use_bias=False)
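To see the scale of the waste, here is a back-of-the-envelope MAC count for the two stride placements. The block dimensions below are hypothetical, chosen only for illustration; they are not taken from the repo:

```python
# MAC count for stride on the 1x1 projection (original) vs. on the
# 3x3 expansion (proposed fix). All dimensions are assumed examples.
H = W = 56                         # input spatial size (assumed)
C_in, C_exp, C_out = 24, 96, 48    # input / expanded / output channels (assumed)
stride = 2

def conv_macs(h, w, k, c_in, c_out):
    """MACs of a k x k convolution producing an h x w output map."""
    return h * w * k * k * c_in * c_out

# Original: stride on the 1x1 projection, so the 3x3 expansion
# runs at full resolution and 3/4 of its output is then discarded.
orig = (conv_macs(H, W, 3, C_in, C_exp)
        + conv_macs(H // stride, W // stride, 1, C_exp, C_out))

# Fixed: stride on the 3x3 expansion, so both convolutions
# operate on the downsampled feature map.
fixed = (conv_macs(H // stride, W // stride, 3, C_in, C_exp)
         + conv_macs(H // stride, W // stride, 1, C_exp, C_out))

print(orig / fixed)  # ~3.45 with these dimensions; approaches 4x as the 3x3 expansion dominates
```

The ratio falls just short of 4x only because the 1x1 projection contributes the same cost in both variants.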

@andravin (Author)
The reduction in OPs is substantial: moving the stride from the projection convolution to the expansion convolution reduces the total number of operations in EfficientNet-EdgeTPU-S by about 24%.


andravin commented Dec 27, 2020

At this point, it should be apparent that Google has no intention of responding to this issue. I will nevertheless leave this comment here in case it helps anyone who stumbles upon it:

Google actually fixed this design flaw in the updated implementation of the MBConvBlockWithoutDepthwise block as it is used in MobileDets. In that work, the block was renamed the "fused convolution layer."

The implementation can be found here: https://github.com/tensorflow/models/blob/2986bcafb9eaa8fed4d78f17a04c4c5afc8f6691/research/object_detection/models/ssd_mobiledet_feature_extractor.py#L142-L147

Notice that the stride is now implemented in the expansion convolution:

    h = _conv(h, expanded_filters, kernel_size, strides=strides,
              activation_fn=activation_fn)
    if use_se:
      hidden_dim = _scale_filters(expanded_filters, 0.25)
      h = _squeeze_and_excite(h, hidden_dim, activation_fn=activation_fn)
    h = _conv(h, filters, 1, activation_fn=tf.identity)
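The two placements are interchangeable as far as output geometry goes: with 'same' padding, striding the 3x3 expansion yields exactly the same output shape as striding the 1x1 projection. A minimal sanity sketch, using an arbitrary (assumed) odd input size to exercise the rounding behavior:

```python
import math

def same_pad_out(size, stride):
    # Output spatial size of a convolution with 'same' padding
    # under TensorFlow's convention: ceil(size / stride).
    return math.ceil(size / stride)

h = 57  # hypothetical input size (assumed)

# Original block: 3x3 expansion at stride 1, then 1x1 projection at stride 2.
orig_out = same_pad_out(same_pad_out(h, 1), 2)

# Fixed block: 3x3 expansion at stride 2, then 1x1 projection at stride 1.
fixed_out = same_pad_out(same_pad_out(h, 2), 1)

print(orig_out, fixed_out)  # both 29
```

Only the arithmetic cost differs, not the result shape, which is why the fix is a drop-in change.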

@mingxingtan (Contributor)
Hi @andravin, thanks for the great point. I have prepared a fix internally, which should go out in the next release.

allenwang28 pushed a commit that referenced this issue Jan 5, 2021
PiperOrigin-RevId: 349201465
@allenwang28 (Contributor)
Fixed in 32572cb

yuezha01 commented May 4, 2021

I noticed exactly the same issue recently when I looked into the EfficientNet-EdgeTPU repo. Thanks to @andravin for pointing out the issue here.
A quick note in case anyone uses the TFLite files (e.g. EfficientNet-EdgeTPU-S) published in that repo: although the issue has been fixed in the newly released code, the published TFLite files still have a stride of 2 in the pointwise layer of the fused block. As @andravin noted above, the operation count measured for these published files is therefore unnecessarily high (by roughly 30%).
