
MBConvBlockWithoutDepthwise stride implemented in 1x1 projection, wasting expansion arithmetic #660

Closed
andravin opened this issue Jan 10, 2020 · 5 comments

andravin commented Jan 10, 2020

MBConvBlockWithoutDepthwise implements the stride in its 1x1 projection convolution. When stride=2, the projection discards three quarters of the activations produced by the 3x3 expansion. Implementing the stride on the expansion convolution instead would be equivalent and would reduce the block's total arithmetic by almost a factor of 4.

    self._expand_conv = tf.layers.Conv2D(
        filters,
        kernel_size=[3, 3],
        strides=[1, 1],
        kernel_initializer=conv_kernel_initializer,
        padding='same',
        use_bias=False)
    self._bn0 = self._batch_norm(
        axis=self._channel_axis,
        momentum=self._batch_norm_momentum,
        epsilon=self._batch_norm_epsilon)
    # Output phase:
    filters = self._block_args.output_filters
    self._project_conv = tf.layers.Conv2D(
        filters,
        kernel_size=[1, 1],
        strides=self._block_args.strides,
        kernel_initializer=conv_kernel_initializer,
        padding='same',
        use_bias=False)
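To see the scale of the waste, here is a back-of-the-envelope MAC count for the two stride placements. The block dimensions below are hypothetical, chosen only for illustration; they are not taken from the repo:

```python
# MAC count for stride on the 1x1 projection (original) vs. on the
# 3x3 expansion (proposed fix). All dimensions are assumed examples.
H = W = 56                         # input spatial size (assumed)
C_in, C_exp, C_out = 24, 96, 48    # input / expanded / output channels (assumed)
stride = 2

def conv_macs(h, w, k, c_in, c_out):
    """MACs of a k x k convolution producing an h x w output map."""
    return h * w * k * k * c_in * c_out

# Original: stride on the 1x1 projection, so the 3x3 expansion
# runs at full resolution and 3/4 of its output is then discarded.
orig = (conv_macs(H, W, 3, C_in, C_exp)
        + conv_macs(H // stride, W // stride, 1, C_exp, C_out))

# Fixed: stride on the 3x3 expansion, so both convolutions
# operate on the downsampled feature map.
fixed = (conv_macs(H // stride, W // stride, 3, C_in, C_exp)
         + conv_macs(H // stride, W // stride, 1, C_exp, C_out))

print(orig / fixed)  # ~3.45 with these dimensions; approaches 4x as the 3x3 expansion dominates
```

The ratio falls just short of 4x only because the 1x1 projection contributes the same cost in both variants.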

@andravin (Author)
The reduction in OPs is substantial: moving the stride from the projection convolution to the expansion convolution reduces the total number of operations in EfficientNet-EdgeTPU-S by about 24%.


andravin commented Dec 27, 2020

At this point, it should be apparent that Google has no intention of responding to this issue. I will nevertheless leave this comment here in case it helps anyone who stumbles upon it:

Google actually fixed this design flaw in the updated implementation of the MBConvBlockWithoutDepthwise block as it is used in MobileDets. In that work, the block was renamed the "fused convolution layer."

The implementation can be found here: https://github.com/tensorflow/models/blob/2986bcafb9eaa8fed4d78f17a04c4c5afc8f6691/research/object_detection/models/ssd_mobiledet_feature_extractor.py#L142-L147

Notice that the stride is now implemented in the expansion convolution:

    h = _conv(h, expanded_filters, kernel_size, strides=strides,
              activation_fn=activation_fn)
    if use_se:
      hidden_dim = _scale_filters(expanded_filters, 0.25)
      h = _squeeze_and_excite(h, hidden_dim, activation_fn=activation_fn)
    h = _conv(h, filters, 1, activation_fn=tf.identity)
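The two placements are interchangeable as far as output geometry goes: with 'same' padding, striding the 3x3 expansion yields exactly the same output shape as striding the 1x1 projection. A minimal sanity sketch, using an arbitrary (assumed) odd input size to exercise the rounding behavior:

```python
import math

def same_pad_out(size, stride):
    # Output spatial size of a convolution with 'same' padding
    # under TensorFlow's convention: ceil(size / stride).
    return math.ceil(size / stride)

h = 57  # hypothetical input size (assumed)

# Original block: 3x3 expansion at stride 1, then 1x1 projection at stride 2.
orig_out = same_pad_out(same_pad_out(h, 1), 2)

# Fixed block: 3x3 expansion at stride 2, then 1x1 projection at stride 1.
fixed_out = same_pad_out(same_pad_out(h, 2), 1)

print(orig_out, fixed_out)  # both 29
```

Only the arithmetic cost differs, not the result shape, which is why the fix is a drop-in change.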

@mingxingtan (Contributor)
Hi @andravin, thanks for the great point. I have prepared a fix internally, which should go out in the next release.

allenwang28 pushed a commit that referenced this issue Jan 5, 2021
PiperOrigin-RevId: 349201465
@allenwang28 (Contributor)
Fixed in 32572cb

yuezha01 commented May 4, 2021

I noticed exactly the same issue recently when I looked into the EfficientNet-EdgeTPU repo. Thanks to @andravin for pointing out the issue here.
A quick note in case anyone uses the TFLite files (e.g. EfficientNet-EdgeTPU-S) published in that repo: although the issue has been fixed in the newly released code, the published TFLite files still have a stride of 2 in the pointwise layer of the fused block. As @andravin noted above, the operation count measured for these published files is therefore unnecessarily high (by roughly 30%).
