
AutoEncoder using the EfficientNet #257

Open · wants to merge 10 commits into master
Conversation

xingyaoww
The AutoEncoder is implemented by reversing the forward EfficientNet to act as a decoder. The current implementation only uses dynamic padding for TransposedConv2d, which works fine for me for now.
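The general idea of "reversing" a convolutional encoder can be illustrated with a minimal sketch (this is not the PR's actual code; layer sizes and the plain Conv2d/ConvTranspose2d layers are assumptions for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a decoder built by mirroring an encoder's
# stride-2 Conv2d stack with ConvTranspose2d layers in reverse order.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1),
)

x = torch.randn(1, 3, 64, 64)       # 64 -> 32 -> 16 in the encoder
recon = decoder(encoder(x))          # 16 -> 32 -> 64 in the decoder
assert recon.shape == x.shape
```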

@lukemelas
Owner

Thanks for this PR! Very interesting. I'll have to think about whether this should be integrated into the main repo or whether it should be a standalone repo. Either way, we'll make sure the community can benefit from this good work!

I might be a bit slow to respond over the next week or two due to the holidays, so do not fret if that is the case.

Commits: "…image size issue; add latent feature by down/upsampling between encoder and decoder"
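The "latent feature by down/upsampling" idea from the commit message can be sketched as follows (a hypothetical illustration, not the PR's code; the feature-map and latent sizes are assumptions):

```python
import torch
import torch.nn.functional as F

# Squeeze the encoder's feature map down to a fixed-size latent,
# then upsample it back to the spatial size the decoder expects.
feat = torch.randn(1, 320, 16, 16)            # assumed encoder output
latent = F.adaptive_avg_pool2d(feat, 4)       # downsample to a 4x4 latent
restored = F.interpolate(latent, size=feat.shape[-2:], mode="nearest")
assert restored.shape == feat.shape
```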
@xingyaoww
Author

Thank you for your reply!

I just updated my implementation of the AE with TransposedConv2dStaticSamePadding, since the original version didn't take odd image sizes into consideration. For example, when Conv2d reduces the image size from (29,29) to (15,15), the reverse TransposedConv2d operation should convert (15,15) back to (29,29), not (30,30).

The old implementation using TransposedConv2dDynamicSamePadding converts the image size to (30,30), causing an output-shape mismatch. DynamicSamePadding only seems to work for EfficientNet models whose intermediate image sizes stay even (it works for efficientnet-b0 but not efficientnet-b5), so I am also removing TransposedConv2dDynamicSamePadding in recent commits.
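The root of the issue is that stride-2 "same" downsampling is many-to-one: both (29,29) and (30,30) map to (15,15), so the transposed convolution cannot recover the original size on its own. A small sketch with plain PyTorch layers (not the PR's padding classes) demonstrates the ambiguity and one way to resolve it via `output_size`:

```python
import torch
import torch.nn as nn

# Both 29 and 30 map to 15 under a stride-2 "same" convolution,
# so the inverse op is ambiguous without extra information.
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
for s in (29, 30):
    out = conv(torch.randn(1, 1, s, s))
    assert out.shape[-1] == 15  # both input sizes collapse to 15

# Passing the desired output_size to ConvTranspose2d resolves the
# ambiguity (PyTorch adjusts the output padding internally).
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
x = torch.randn(1, 1, 15, 15)
assert deconv(x, output_size=(29, 29)).shape[-1] == 29
assert deconv(x, output_size=(30, 30)).shape[-1] == 30
```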

@AFAgarap

Hello. Will this be merged?

@leejonggun

leejonggun commented Dec 30, 2021

Great pull request! I am trying EfficientNetAutoEncoder.from_pretrained() and wondering whether the shapes below are correct. As I understand it, an autoencoder is an unsupervised model, so the input and output shapes should be the same. Yet the autoencoder output differs across efficientnet-b0 through b7 as shown below. Could you tell me whether this is expected or a bug?
b0: input/(512,512) -> ae_output/(512,512)
b1: input/(512,512) -> ae_output/(496,496)
b2: input/(512,512) -> ae_output/(484,484)
b3: input/(512,512) -> ae_output/(492,492)
b4: input/(512,512) -> ae_output/(508,508)
b5: input/(512,512) -> ae_output/(488,488)
b6: input/(512,512) -> ae_output/(496,496)
b7: input/(512,512) -> ae_output/(504,504)
(I'm looking into the code, but it's difficult ;) Thanks in advance if you can help me.)

@cwerner

cwerner commented Jun 4, 2022

Also looking forward to this PR being merged 👍
