🚀🚀🚀 🔥🔥🔥 Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

This is the official implementation of Glyph-ByT5 and Glyph-ByT5-v2, introduced in "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering".

News

⛽⛽⛽ Contact: [email protected]

2024.06.28 For now, we have removed the weights and code that may have used potentially unauthorized datasets. We will update the checkpoints after the Microsoft RAI process.

🔆 Highlights

We identify two crucial requirements of text encoders for achieving accurate visual text rendering: character awareness and alignment with glyphs. To this end, we propose a customized text encoder, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset.
We present an effective method for integrating Glyph-ByT5 with SDXL, yielding the Glyph-SDXL model for design image generation. This raises text rendering accuracy from less than 20% to nearly 90% on our design image benchmark. Glyph-SDXL can also render text paragraphs, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts.
We deliver a powerful customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic graphic generation model, Glyph-SDXL-v2, which together support accurate spelling in $\sim10$ languages.
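
To make "character awareness" concrete, here is a minimal sketch of the byte-level input scheme used by ByT5-style tokenizers, where every UTF-8 byte becomes its own id. It is not code from this repository: the function names and constants are hypothetical, and the offset of 3 assumes that ids 0, 1, and 2 are reserved for padding, end-of-sequence, and unknown tokens.

```python
from typing import List

# Assumption: ids 0, 1, 2 are reserved (pad, eos, unk), so byte values are shifted by 3.
NUM_RESERVED_IDS = 3
EOS_ID = 1


def byte_level_ids(text: str) -> List[int]:
    """Map each UTF-8 byte of `text` to one id, then append the end-of-sequence id."""
    return [b + NUM_RESERVED_IDS for b in text.encode("utf-8")] + [EOS_ID]


def decode_ids(ids: List[int]) -> str:
    """Invert byte_level_ids, skipping the reserved ids."""
    payload = bytes(i - NUM_RESERVED_IDS for i in ids if i >= NUM_RESERVED_IDS)
    return payload.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    # A correct spelling and a transposed one give different id sequences,
    # so the encoder sees the spelling error one character at a time.
    for word in ["Glyph", "Glyhp"]:
        print(word, byte_level_ids(word))
```

Because each character (or each byte of a multi-byte character) gets its own position, a misspelling changes the input at exactly the positions that differ. Subword tokenizers usually hide that per-character detail inside larger tokens.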

🔧 Usage

For a detailed guide on Glyph-SDXL and Glyph-SDXL-v2 inference, see the inference folder.

For a detailed guide on Glyph-ByT5 alignment pretraining, see the pretraining folder.

📬 Citation

If you find this code useful in your research, please consider citing:

@article{liu2024glyph,
  title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Liang, Zhanhao and Luo, Chong and Li, Ji and Huang, Gao and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2403.09622},
  year={2024}
}

and

@article{liu2024glyphv2,
  title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Zhao, Yiming and Chen, Bohan and Li, Ji and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2406.10208},
  year={2024}
}

