Upscaling Stable Diffusion latents using a small neural network.
Very similar to my latent interposer, this small model can be used to upscale latents in a way that doesn't ruin the image. I mostly explain some of the issues with upscaling latents in this issue. Think of this as an ESRGAN for latents, except severely undertrained.
Currently, SDXL has some minimal hue shift issues. Because of course it does.
To install it, simply clone this repo to your custom_nodes folder using the following command: git clone https://github.com/city96/SD-Latent-Upscaler custom_nodes/SD-Latent-Upscaler
Alternatively, you can download the comfy_latent_upscaler.py file to your ComfyUI/custom_nodes folder. You may need to install the huggingface-hub package by running pip install huggingface-hub inside your venv.
If you need the model weights for something else, they are hosted on HF under the same Apache2 license as the rest of the repo.
Currently not supported but it should be possible to use it at the hires-fix part.
The node pulls the required files from the Hugging Face Hub by default. If you have a flaky connection or prefer to use it completely offline, you can create a models folder and place the model files there; the node will then load them locally instead. The path should be: ComfyUI/custom_nodes/SD-Latent-Upscaler/models
Alternatively, just clone the entire HF repo to it: git clone https://huggingface.co/city96/SD-Latent-Upscaler custom_nodes/SD-Latent-Upscaler/models
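The local-first lookup described above can be sketched roughly as follows. This is not the node's actual code: the function names and the example filename are hypothetical, and only the repo ID comes from the URL above. The download helper relies on hf_hub_download from the huggingface-hub package and is imported lazily so the rest of the sketch runs without it.

```python
import tempfile
from pathlib import Path
from typing import Callable


def hub_download(filename: str) -> str:
    """Fetch a file from the HF repo (needs `pip install huggingface-hub`)."""
    from huggingface_hub import hf_hub_download  # lazy import on purpose
    return hf_hub_download(repo_id="city96/SD-Latent-Upscaler", filename=filename)


def resolve_weights(filename: str, models_dir: str,
                    download: Callable[[str], str] = hub_download) -> Path:
    """Return a local copy if one exists in models_dir, else download it."""
    local = Path(models_dir) / filename
    if local.is_file():
        return local
    return Path(download(filename))


# Demo with a temporary folder and a made-up filename (assumption, not a real file).
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example_upscaler.safetensors"
    example.write_bytes(b"")
    found_locally = resolve_weights(example.name, tmp) == example
print(found_locally)
```

Passing the downloader in as an argument keeps the offline path testable without any network access.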
Usage is fairly simple: use it anywhere you would otherwise upscale a latent. If you need a higher scale factor (e.g. x4), simply chain two of the upscalers.
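To see what chaining does to the sizes involved, here is a small arithmetic sketch. It assumes each stage scales by x2 and that the VAE maps 8x8 pixels to one latent cell (the usual Stable Diffusion setup); the helper name is made up for illustration.

```python
def chained_latent_size(width_px: int, height_px: int, stages: int = 1,
                        scale: float = 2.0, vae_factor: int = 8) -> tuple[int, int]:
    """Latent width/height after running `stages` upscalers back to back."""
    w, h = width_px // vae_factor, height_px // vae_factor
    for _ in range(stages):
        w, h = int(w * scale), int(h * scale)
    return w, h


# A 512x512 image is a 64x64 latent; two x2 stages give a 256x256 latent,
# which decodes to 2048x2048 pixels (x4 overall).
w, h = chained_latent_size(512, 512, stages=2)
print((w, h), (w * 8, h * 8))
```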
As part of a workflow - notice how the second stage works despite the low denoise of 0.2. The image remains relatively unchanged.
I decided to do some more research and change the network architecture altogether. This one is just a bunch of Conv2d layers with an Upsample at the beginning, similar to before, except I reduced the kernel size/padding and instead added more layers.
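For a rough idea of the shape of such a network, here is a minimal PyTorch sketch. Only the overall layout (an Upsample followed by a stack of Conv2d layers) comes from the description above; the class name, channel width, depth, kernel size, activation, and interpolation mode are all assumptions, not the released model's values.

```python
import torch
from torch import nn


class LatentUpscalerSketch(nn.Module):
    """Upsample first, then refine with a stack of small convolutions."""

    def __init__(self, channels: int = 4, hidden: int = 64,
                 depth: int = 4, scale: float = 2.0):
        super().__init__()
        # Resize the latent up front so every conv works at the target size.
        layers = [
            nn.Upsample(scale_factor=scale, mode="nearest"),
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        ]
        # Small kernels, more layers (hypothetical depth and width).
        for _ in range(depth - 2):
            layers += [nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU()]
        # Project back to the latent channel count (4 for SD latents).
        layers.append(nn.Conv2d(hidden, channels, kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.net(latent)


latent = torch.randn(1, 4, 64, 64)  # latent of a 512x512 image
upscaled = LatentUpscalerSketch()(latent)
print(tuple(upscaled.shape))
```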
Trained for 1M iterations on DIV2K + Flickr2K. I changed to AdamW + L1 loss (from SGD and MSE loss) and added a OneCycleLR scheduler.
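The optimizer, loss, and scheduler combination mentioned above fits together roughly like this in PyTorch. This is a toy sketch, not the actual training script: the stand-in model, learning rates, batch shapes, step count, and the random tensors used in place of encoded DIV2K/Flickr2K latent pairs are all assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny stand-in model (assumption); the real network is deeper.
model = nn.Sequential(
    nn.Upsample(scale_factor=2.0, mode="nearest"),
    nn.Conv2d(4, 4, kernel_size=3, padding=1),
)

total_steps = 20  # the README reports 1M iterations; 20 keeps the demo fast
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=total_steps
)
criterion = nn.L1Loss()

losses, lrs = [], []
for step in range(total_steps):
    # Random placeholders for (low-res latent, high-res latent) pairs.
    low = torch.randn(2, 4, 8, 8)
    high = torch.randn(2, 4, 16, 16)

    optimizer.zero_grad()
    loss = criterion(model(low), high)
    loss.backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR is stepped once per batch

    losses.append(loss.item())
    lrs.append(scheduler.get_last_lr()[0])

print(f"final loss {losses[-1]:.4f}, peak lr {max(lrs):.2e}")
```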
This version was still relatively undertrained. Mostly a proof-of-concept.
Trained for 1M iterations on DIV2K + Flickr2K.