Skip to content

aime-labs/stable_diffusion_xl

 
 

Repository files navigation

Generative Models by Stability AI

sample1

News

July 26, 2023

sample2

July 4, 2023

  • A technical report on SDXL is now available here.

Installation with AIME MLC

Easy installation within an AIME ML-Container.

Clone the repo

git clone https://github.com/aime-labs/stable_diffusion_xl

Setting up AIME MLC

mlc-create sdxl Pytorch 2.0.1-aime

Install packages in AIME MLC

mlc-open sdxl
sudo apt-get install libglib2.0-0 libgl1
cd stable_diffusion_xl
pip3 install -r requirements.txt

Setup weigths

sudo apt-get install git-lfs
git lfs install
mkdir checkpoints
cd checkpoints
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0

Run inference demo in AIME MLC

mlc-open sdxl
cd stable_diffusion_xl
streamlit run sampling.py

In case the streamlit command not found add it to the PATH with:

export PATH="$HOME/.local/bin:$PATH"

Run txt2img inference as HTTP/HTTPS API with AIME API Server

To run Stable Diffusion XL as HTTP/HTTPS API with AIME API Server start the chat command with following command line:

mlc-open sdxl
cd stable_diffusion_xl
python3 run_txt2img.py --api_server <url to API server> [--use_fp16] [--compile]

It will start Stable Diffusion XL as worker, waiting for job request through the AIME API Server.

Packaging

This repository uses PEP 517 compliant packaging using Hatch.

To build a distributable wheel, install hatch and run hatch build (specifying -t wheel will skip building a sdist, which is not necessary).

pip install hatch
hatch build -t wheel

You will find the built package in dist/. You can install the wheel with pip install dist/*.whl.

Note that the package does not currently specify dependencies; you will need to install the required packages, depending on your use case and PyTorch version, manually.

Inference

We provide a streamlit demo for text-to-image and image-to-image sampling in scripts/demo/sampling.py. We provide file hashes for the complete file as well as for only the saved tensors in the file (see Model Spec for a script to evaluate that). The following models are currently supported:

  • SDXL-base-1.0
    File Hash (sha256): 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b
    Tensordata Hash (sha256): 0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7
    
  • SDXL-refiner-1.0
    File Hash (sha256): 7440042bbdc8a24813002c09b6b69b64dc90fded4472613437b7f55f9b7d9c5f
    Tensordata Hash (sha256): 0x1a77d21bebc4b4de78c474a90cb74dc0d2217caf4061971dbfa75ad406b75d81
    
  • SDXL-base-0.9
  • SDXL-refiner-0.9
  • SD-2.1-512
  • SD-2.1-768

Weights for SDXL:

SDXL-1.0: The weights of SDXL-1.0 are available (subject to a CreativeML Open RAIL++-M license) here:

SDXL-0.9: The weights of SDXL-0.9 are available and subject to a research license. If you would like to access these models for your research, please apply using one of the following links: SDXL-base-0.9 model, and SDXL-refiner-0.9. This means that you can apply for any of the two links - and if you are granted - you can access both. Please log in to your Hugging Face Account with your organization email to request access.

After obtaining the weights, place them into checkpoints/. Next, start the demo using

streamlit run scripts/demo/sampling.py --server.port <your_port>

Invisible Watermark Detection

Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.

To run the script you need to either have a working installation as above or try an experimental import using only a minimal amount of packages:

python -m venv .detect
source .detect/bin/activate

pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark

To run the script you need to have a working installation as above. The script is then useable in the following ways (don't forget to activate your virtual environment beforehand, e.g. source .pt1/bin/activate):

# test a single file
python scripts/demo/detect.py <your filename here>
# test multiple files at once
python scripts/demo/detect.py <filename 1> <filename 2> ... <filename n>
# test all files in a specific folder
python scripts/demo/detect.py <your folder name here>/*

Training:

We are providing example training configs in configs/example_training. To launch a training, run

python main.py --base configs/<config1.yaml> configs/<config2.yaml>

where configs are merged from left to right (later configs overwrite the same values). This can be used to combine model, training and data configs. However, all of them can also be defined in a single config. For example, to run a class-conditional pixel-based diffusion model training on MNIST, run

python main.py --base configs/example_training/toy/mnist_cond.yaml

NOTE 1: Using the non-toy-dataset configs configs/example_training/imagenet-f8_cond.yaml, configs/example_training/txt2img-clipl.yaml and configs/example_training/txt2img-clipl-legacy-ucg-training.yaml for training will require edits depending on the used dataset (which is expected to stored in tar-file in the webdataset-format). To find the parts which have to be adapted, search for comments containing USER: in the respective config.

NOTE 2: This repository supports both pytorch1.13 and pytorch2for training generative models. However for autoencoder training as e.g. in configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml, only pytorch1.13 is supported.

NOTE 3: Training latent generative models (as e.g. in configs/example_training/imagenet-f8_cond.yaml) requires retrieving the checkpoint from Hugging Face and replacing the CKPT_PATH placeholder in this line. The same is to be done for the provided text-to-image configs.

Building New Diffusion Models

Conditioner

The GeneralConditioner is configured through the conditioner_config. Its only attribute is emb_models, a list of different embedders (all inherited from AbstractEmbModel) that are used to condition the generative model. All embedders should define whether or not they are trainable (is_trainable, default False), a classifier-free guidance dropout rate is used (ucg_rate, default 0), and an input key (input_key), for example, txt for text-conditioning or cls for class-conditioning. When computing conditionings, the embedder will get batch[input_key] as input. We currently support two to four dimensional conditionings and conditionings of different embedders are concatenated appropriately. Note that the order of the embedders in the conditioner_config is important.

Network

The neural network is set through the network_config. This used to be called unet_config, which is not general enough as we plan to experiment with transformer-based diffusion backbones.

Loss

The loss is configured through loss_config. For standard diffusion model training, you will have to set sigma_sampler_config.

Sampler config

As discussed above, the sampler is independent of the model. In the sampler_config, we set the type of numerical solver, number of steps, type of discretization, as well as, for example, guidance wrappers for classifier-free guidance.

Dataset Handling

For large scale training we recommend using the data pipelines from our data pipelines project. The project is contained in the requirement and automatically included when following the steps from the Installation section. Small map-style datasets should be defined here in the repository (e.g., MNIST, CIFAR-10, ...), and return a dict of data keys/values, e.g.,

example = {"jpg": x,  # this is a tensor -1...1 chw
           "txt": "a beautiful image"}

where we expect images in -1...1, channel-first format.

About

Stable Diffusion XL AIME API Server worker implementation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 97.1%
  • Roff 2.9%