Introducing DeepFloyd IF: From text to photorealistic images

May 2, 2023


Stability AI, in partnership with its AI research lab DeepFloyd, has released a new technology called DeepFloyd IF. This advanced text-to-image model is designed to generate high-quality images from text inputs.

The DeepFloyd IF model uses the T5-XXL-1.1 language model as a text encoder to aid in understanding text prompts. Cross-attention layers are also employed to better align the text prompt and the generated image.
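To make the role of cross-attention more concrete, here is a minimal sketch of scaled dot-product cross-attention in plain Python. It is not DeepFloyd IF's code: the function names, the tiny vectors, and the omission of the learned projection layers that real models apply are all assumptions made for illustration. The idea it shows is that each image position (a query) computes a weighting over the text tokens (keys) and uses it to blend the text values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_queries, text_keys, text_values):
    """Let every image position attend over all text tokens.

    image_queries: one vector per image position (length d each)
    text_keys:     one vector per text token (length d each)
    text_values:   one vector per text token (any common length)
    Returns (outputs, weights), one row per image position.
    """
    d = len(text_keys[0])
    outputs, weights = [], []
    for q in image_queries:
        # Similarity between this image position and each text token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in text_keys
        ]
        w = softmax(scores)
        # Weighted blend of the text values.
        mixed = [
            sum(w[t] * text_values[t][j] for t in range(len(text_values)))
            for j in range(len(text_values[0]))
        ]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights


# Toy data (hypothetical): 2 image positions, 3 text tokens.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs, weights = cross_attention(queries, keys, values)
```

Each row of `weights` sums to 1, so every image position receives a convex mix of the text token values. In a real model the queries come from image features and the keys and values from the text encoder's output.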

One of the most impressive features of DeepFloyd IF is its ability to accurately apply text descriptions to generate images with various objects appearing in different spatial relations, something that has been challenging for other text-to-image models.

Additionally, the model generates images with a high degree of photorealism, as reflected in its zero-shot FID score of 6.66 on the COCO dataset (lower FID scores are better). Besides the standard square format, the model can generate images with non-standard aspect ratios, in both vertical and horizontal orientations.

DeepFloyd IF model's image-to-image translation

In addition to text-to-image generation, DeepFloyd IF offers zero-shot image-to-image translation. This works in three steps: the original image is resized to 64×64 pixels, noise is added to it through forward diffusion, and backward diffusion guided by a new prompt then denoises the image.
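The noising step can be sketched with the standard closed-form forward diffusion formula, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * noise. The sketch below is an illustration only: the linear beta schedule, the 1000-step count, the `strength` knob, and the random stand-in for the resized image are all assumptions, not details confirmed for DeepFloyd IF.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are assumed values for illustration.
    """
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(pixels, t, alpha_bar, rng):
    """Closed-form forward diffusion to step t.

    x_t = sqrt(alpha_bar[t]) * x_0 + sqrt(1 - alpha_bar[t]) * noise
    """
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    return [signal * p + sigma * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
alpha_bar = linear_alpha_bar(1000)

# Stand-in for a source image already resized to 64x64, scaled to [-1, 1].
source = [rng.uniform(-1.0, 1.0) for _ in range(64 * 64)]

# Hypothetical knob: how far along the noise schedule to push the source.
strength = 0.7
start_step = int(strength * (len(alpha_bar) - 1))
noisy = add_noise(source, start_step, alpha_bar, rng)
```

A higher `strength` leaves less of the source image intact, which gives the new prompt more freedom during denoising. A lower value keeps the result closer to the original form. The denoising pass itself needs the trained model and is not shown.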

Style can also be changed at the super-resolution stage by giving those modules a text prompt. This approach modifies the style, patterns, and details of the output image while preserving the primary form of the source image, and it requires no fine-tuning.

Process for generating high-quality images

The DeepFloyd IF model generates high-quality images from text prompts in stages. A frozen T5-XXL language model first converts the text prompt into a text embedding. In the first diffusion stage, a base model turns that embedding into a 64×64 image. In the second stage, a text-conditional super-resolution model upscales the image to 256×256.

In the third stage, a second super-resolution model enhances the image to a clear, high-quality 1024×1024 resolution. The IF release includes several versions of the base and super-resolution models, which differ in their number of parameters.

The third-stage model is not yet available, but alternative upscalers such as the Stable Diffusion x4 Upscaler can be used in its place.
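A quick calculation shows why a x4 upscaler is a natural stand-in for the missing stage. The sketch below only does arithmetic on the resolutions quoted above; the helper name `describe_cascade` is hypothetical and no model is involved.

```python
# Resolutions quoted in the article, in order of generation.
CASCADE = [64, 256, 1024]


def describe_cascade(resolutions):
    """Return the per-side scale and pixel-count ratio for each upscaling step."""
    rows = []
    for low, high in zip(resolutions, resolutions[1:]):
        rows.append({
            "from": low,
            "to": high,
            "scale": high // low,
            "pixel_ratio": (high * high) // (low * low),
        })
    return rows


steps = describe_cascade(CASCADE)
for step in steps:
    print(
        f"{step['from']}x{step['from']} -> {step['to']}x{step['to']}: "
        f"x{step['scale']} per side, x{step['pixel_ratio']} pixels"
    )
```

Both steps enlarge the image by a factor of four per side, so a generic x4 upscaler applied to the 256×256 output lands on the same 1024×1024 target.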

Stability AI DeepFloyd IF is able to expand the output image to higher resolutions - Image courtesy of Stability AI

Training dataset and licensing

DeepFloyd IF was trained on a high-quality custom dataset called LAION-A, which contains 1 billion (image, text) pairs. The dataset is an aesthetic subset of the English part of the LAION-5B dataset, and the data were filtered using custom filters to remove inappropriate content.

The model is initially released under a research license, and the creators welcome feedback to improve the model’s performance and scalability. The model can be used in various domains, such as art, design, storytelling, virtual reality, and accessibility.

The DeepFloyd IF model offers a promising advancement in the field of text-to-image generation. Its impressive capabilities and potential applications make it a valuable asset for researchers and professionals in various industries.

The model’s availability under a non-commercial, research-permissible license and the creators’ commitment to open-sourcing the model in the future align with Stability AI’s goal of sharing innovative technologies with the broader research community.

The creators welcome feedback and public discussion of the model’s technical, academic, and ethical aspects. The model’s weights, model card, and code on GitHub are available for that purpose, along with a Gradio demo open to everyone.
