AnyText: Multilingual Visual Text Generation And Editing

AI Image Editors · Updated 1 year ago (2024) · Prompt engineer



About AnyText

Diffusion-based text-to-image generation has made impressive progress recently. Although current image synthesis technology is highly advanced and can produce high-fidelity images, the text regions of generated images still tend to give them away. To address this issue, we introduce AnyText, a diffusion-based multilingual visual text generation and editing model that focuses on rendering accurate and coherent text in images. AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs such as text glyphs, positions, and masked images to generate latent features for text generation or editing. The latter employs an OCR model to encode stroke data as embeddings, which are blended with the image caption embeddings from the tokenizer to generate text that integrates seamlessly with the background. We train with a text-control diffusion loss and a text perceptual loss to further improve writing accuracy. AnyText can write characters in multiple languages; to the best of our knowledge, this is the first work to address multilingual visual text generation. AnyText can also be plugged into existing diffusion models from the community to render or edit text accurately. In extensive evaluation experiments, our method outperformed all other approaches by a significant margin. Additionally, we contribute AnyWord-3M, the first large-scale multilingual text image dataset, containing 3 million image-text pairs with OCR annotations in multiple languages. Based on AnyWord-3M, we propose AnyText-benchmark for evaluating the accuracy and quality of visual text generation. Our project will be open-sourced at this https URL to improve and promote the development of text generation technology.
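The abstract does not spell out how AnyText-benchmark scores "accuracy". A common way to measure it, assumed here for illustration rather than taken from the benchmark's code, is to run OCR on each generated text region and compare the result with the intended string, reporting exact-match (sentence) accuracy and a normalized edit distance (NED) similarity. A minimal sketch of those two scores; the function names are hypothetical:

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_similarity(pred: str, target: str) -> float:
    """1 - distance / longer length, so 1.0 means an exact match."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, target) / longest


def score(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """(sentence accuracy, mean NED similarity) over (ocr_output, ground_truth) pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    exact = sum(p == t for p, t in pairs) / len(pairs)
    ned = sum(normalized_edit_similarity(p, t) for p, t in pairs) / len(pairs)
    return exact, ned
```

Because the comparison is character by character, the same scores work for Latin, Chinese, or any other script, which matters for a multilingual benchmark.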

AnyText represents a significant advancement in multilingual visual text generation and editing. Developed by Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, and Xuansong Xie, it is designed to integrate text seamlessly into images, improving both the appearance and the practical usefulness of visual text manipulation.

Methodology

The core of AnyText is its diffusion pipeline, which has two main parts: an auxiliary latent module and a text embedding module. The auxiliary latent module uses text glyphs, positions, and masked images to generate latent features, while the text embedding module uses an OCR model to encode stroke data as embeddings. These embeddings are combined with the image caption embeddings from a tokenizer so that the rendered text blends naturally with the background. Training with a text-control diffusion loss and a text perceptual loss further improves the accuracy of text generation and editing. The components are explained below.
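To make the auxiliary latent module more concrete, here is a toy sketch of the "combine glyph, position, and masked image into latent features" step. In the real model each input passes through learned layers before fusion; this sketch skips those encoders, assumes all three inputs are already single-channel grids at latent resolution, and fuses them with a 1x1 convolution (a per-pixel weighted sum of channels) using made-up weights. The function name, shapes, and weights are illustrative assumptions, not AnyText's architecture or parameters.

```python
from typing import List

Grid = List[List[float]]  # a single H x W channel


def fuse_1x1(channels: List[Grid], weights: List[List[float]], bias: List[float]) -> List[Grid]:
    """Mix input channels pixel by pixel (a 1x1 convolution).

    weights[o][c] is the (hypothetical) weight from input channel c to output channel o.
    """
    h, w = len(channels[0]), len(channels[0][0])
    if any(len(ch) != h or any(len(row) != w for row in ch) for ch in channels):
        raise ValueError("all channels must share the same H x W")
    fused = []
    for o, w_row in enumerate(weights):
        if len(w_row) != len(channels):
            raise ValueError("each weight row needs one entry per input channel")
        fused.append([
            [bias[o] + sum(w_row[c] * channels[c][y][x] for c in range(len(channels)))
             for x in range(w)]
            for y in range(h)
        ])
    return fused


# Hypothetical 2 x 2 inputs, already at latent resolution:
glyph = [[1, 0], [0, 0]]             # rendered text strokes
position = [[1, 1], [0, 0]]          # where the text box sits
masked = [[0.5, 0.5], [0.25, 0.25]]  # masked image (one channel for simplicity)
latent = fuse_1x1([glyph, position, masked], weights=[[1.0, 1.0, 1.0]], bias=[0.0])
```

Here `latent` is a single output channel that carries all three signals at once; a trained model would learn many such output channels and far richer encoders.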

Figure: Pipeline of AnyText.

1. Auxiliary Latent Module: This part takes the text's shape (glyph), its position, and the region of the image where the text will go (masked image). It processes these into latent features, a blueprint for how the text should look and where it should be placed.

2. Text Embedding Module: This part uses an Optical Character Recognition (OCR) model to encode the strokes of the text to be written as embeddings, and combines them with the embeddings of the image caption so that the text fits the image's context.

3. Text-control Diffusion Pipeline: The outputs of the latent module and the text embedding module guide the diffusion process, which builds the final image through a series of denoising steps, refining it gradually from random noise into the finished image.

4. Text Perceptual Loss: Used during training as a quality check, this loss compares the text regions of the generated image with those of the target image so that the text is rendered correctly.
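The two training objectives can be sketched as one weighted sum: a standard denoising term (mean squared error between the true and predicted noise) plus a text perceptual term that compares OCR features of the generated and target text regions. This is a minimal illustration on plain lists; the feature vectors, the use of MSE for the perceptual term, and the default weight `lam` are assumptions for the example, not values confirmed by the text above.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(noise, predicted_noise, ocr_feat_generated, ocr_feat_target, lam=0.01):
    """Text-control diffusion loss plus a weighted text perceptual loss.

    - diffusion term: how well the model predicted the noise it must remove
    - perceptual term: how far the OCR features of the generated text region
      are from those of the target text region
    lam is a hypothetical weighting chosen for illustration.
    """
    l_td = mse(noise, predicted_noise)
    l_tp = mse(ocr_feat_generated, ocr_feat_target)
    return l_td + lam * l_tp
```

The weight controls the trade-off: a larger `lam` pushes training harder toward legible, correctly spelled text, at the risk of neglecting overall image quality.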

The whole process is like creating a text tattoo on an image, making sure it matches the image’s style and context, and then refining it until it looks just right.
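The text embedding module (item 2 in the list above) is the least intuitive step. One way to picture "combining OCR embeddings with caption embeddings" is sketched below: each line of text to render is represented in the caption by a placeholder token, and that token's embedding is swapped for a linearly projected OCR feature of the rendered glyphs. The placeholder `*`, the vector sizes, and the projection matrix are illustrative assumptions, not AnyText's actual tokenizer, OCR model, or weights.

```python
from typing import List

Vector = List[float]


def project(features: Vector, matrix: List[Vector]) -> Vector:
    """Linear projection; matrix has one row per output dimension."""
    if any(len(row) != len(features) for row in matrix):
        raise ValueError("projection rows must match the feature size")
    return [sum(m * f for m, f in zip(row, features)) for row in matrix]


def fuse_caption_embeddings(tokens: List[str], token_embeddings: List[Vector],
                            glyph_features: List[Vector], proj: List[Vector],
                            placeholder: str = "*") -> List[Vector]:
    """Replace each placeholder token's embedding with a projected OCR glyph feature, in order."""
    if len(tokens) != len(token_embeddings):
        raise ValueError("one embedding per token is required")
    remaining = iter(glyph_features)
    fused = []
    for tok, emb in zip(tokens, token_embeddings):
        if tok == placeholder:
            feat = next(remaining, None)
            if feat is None:
                raise ValueError("more placeholders than glyph features")
            fused.append(project(feat, proj))  # hypothetical OCR feature -> caption space
        else:
            fused.append(list(emb))            # ordinary caption token, unchanged
    if next(remaining, None) is not None:
        raise ValueError("more glyph features than placeholders")
    return fused
```

Keeping the projected glyph features in the same sequence as the caption tokens lets the downstream model see both what the scene is and exactly which characters to draw.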

Installation Guide

The installation process of AnyText is straightforward. Users need to install git, clone the AnyText code, prepare a font file (Arial Unicode MS is recommended), and create a new environment with the necessary packages. Everything is done with a few commands in a terminal or command prompt, so basic programming knowledge is enough.

# Install git (skip if already done)
conda install -c anaconda git
# Clone anytext code
git clone https://github.com/tyxsspa/AnyText.git
cd AnyText
# Prepare a font file; Arial Unicode MS is recommended (you need to download it yourself)
mv your/path/to/arialuni.ttf ./font/Arial_Unicode.ttf
# Create a new environment and install packages as follows:
conda env create -f environment.yaml
conda activate anytext
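The font step is the one most likely to go wrong, for example when a download page is saved under the `.ttf` name instead of the font itself. The optional check below is not part of the AnyText repository; it assumes it is run from the AnyText directory and only verifies that the file exists and begins with one of the standard TrueType/OpenType signature bytes.

```python
import os

FONT_PATH = os.path.join("font", "Arial_Unicode.ttf")  # destination used in the mv step above

# First four bytes of valid font files: TrueType (0x00010000 or "true"),
# OpenType with CFF outlines ("OTTO"), or a font collection ("ttcf").
FONT_SIGNATURES = {b"\x00\x01\x00\x00", b"true", b"OTTO", b"ttcf"}


def check_font(path: str = FONT_PATH) -> bool:
    """True if path exists and starts with a known TrueType/OpenType signature."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as fh:
        return fh.read(4) in FONT_SIGNATURES


if __name__ == "__main__":
    status = "OK" if check_font() else "missing or not a TrueType/OpenType font"
    print(f"{FONT_PATH}: {status}")
```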

Inference and Usage

AnyText offers two primary modes: Text Generation and Text Editing. It ships with easy-to-run inference code that exercises both modes, which also confirms that the environment is installed and working correctly.

python inference.py
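In text generation mode, AnyText's example prompts mark each string to render with double quotes, for instance a scene description ending in `with "Any" "Text" written on it`. Assuming that convention, the sketch below shows the kind of preprocessing it implies: pulling out the quoted strings in order and leaving a placeholder in the caption for each one. The function name and the `*` placeholder are hypothetical; this is not the project's own preprocessing code.

```python
import re
from typing import List, Tuple

QUOTED = re.compile(r'"(.*?)"')  # non-greedy: each "..." pair is one text line


def split_prompt(prompt: str, placeholder: str = "*") -> Tuple[str, List[str]]:
    """Return (caption with placeholders, text lines to render, in order)."""
    texts = QUOTED.findall(prompt)
    caption = QUOTED.sub(lambda _m: placeholder, prompt)  # lambda avoids escape handling
    return caption, texts
```

For example, `split_prompt('a sign with "Any" "Text" on it')` returns `('a sign with * * on it', ['Any', 'Text'])`, and the same call works for non-Latin text such as Chinese.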

If you have a powerful GPU, you can also deploy a full demo that includes usage instructions, a user-friendly interface, and many examples. Note that the model files are downloaded to a default directory the first time inference runs; this location can be changed if necessary.

python demo.py

Figure: The AnyText demo interface.

Future Developments and Contributions

The AnyText team has outlined several future goals, including releasing the model and inference code, providing a publicly accessible demo link, releasing tools for merging weights from community models, and releasing both the AnyText-benchmark dataset and AnyWord-3M dataset along with their respective training and evaluation codes. These upcoming releases are poised to further enhance the utility and reach of AnyText in the field of text generation and editing.

Conclusion

AnyText changes how text can be integrated into generated and edited images: it renders accurate, multilingual text that matches the style and context of the surrounding image. Its methodology, straightforward installation and usage, and planned releases make it a valuable tool for professionals and enthusiasts working on digital text manipulation, and it opens up new creative and practical applications in many fields.

About anytxt

Anytxt: A Desktop Search Tool with a Powerful Full-Text Search Engine. Best Google Desktop Search Alternative.

AnyTXT Searcher is a powerful file content search application, just like a local disk Google search engine, and much faster than Windows search and Windows findstr command. Anytxt is your best desktop file content full-text search engine and the best Google Desktop Search alternative.

AnyTXT Searcher has a powerful document parsing engine, which extracts the text of commonly used documents without installing any other software and combines the built-in high-speed indexing system to store the text’s metadata. You can quickly find any words on your computer with Anytxt. It works perfectly on Windows 11, 10, 8, 7, Vista, XP, 2003, 2008, 2012, 2016, 2019, 2022, etc.

Note: AnyText is not Anytxt!