diffusers-rs: A Diffusers API in Rust/Torch
A rusty robot holding a fire torch, generated by Stable Diffusion using Rust and libtorch.
The diffusers crate is a Rust equivalent to Hugging Face’s amazing diffusers Python library.
It is based on the tch crate.
The implementation supports running Stable Diffusion v1.5 and v2.1.
Getting the weights
The weight files can be retrieved from the HuggingFace model repos and should be moved into the data/ directory.
For Stable Diffusion v2.1, get the bpe_simple_vocab_16e6.txt, clip_v2.1.safetensors, unet_v2.1.safetensors, and vae_v2.1.safetensors files from the v2.1 repo.
For Stable Diffusion v1.5, get the bpe_simple_vocab_16e6.txt, pytorch_model.safetensors, unet.safetensors, and vae.safetensors files from this v1.5 repo.
Alternatively, you can run the following Python script:
# Add --sd_version 1.5 to get the v1.5 weights rather than the v2.1.
python3 ./scripts/get_weights.py
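Before running the examples, it can help to check that the expected files are in place. The following is a minimal sketch that uses only the Rust standard library. The `SdVersion` enum and the two helper functions are hypothetical names chosen for illustration and are not part of the diffusers crate. The file lists are copied from the paragraphs above.

```rust
use std::path::Path;

/// Stable Diffusion versions supported by the examples (hypothetical enum).
#[derive(Clone, Copy, Debug)]
enum SdVersion {
    V1_5,
    V2_1,
}

/// Weight files expected in `data/` for each version,
/// as listed in the "Getting the weights" section.
fn required_weight_files(version: SdVersion) -> &'static [&'static str] {
    match version {
        SdVersion::V1_5 => &[
            "bpe_simple_vocab_16e6.txt",
            "pytorch_model.safetensors",
            "unet.safetensors",
            "vae.safetensors",
        ],
        SdVersion::V2_1 => &[
            "bpe_simple_vocab_16e6.txt",
            "clip_v2.1.safetensors",
            "unet_v2.1.safetensors",
            "vae_v2.1.safetensors",
        ],
    }
}

/// Returns the required files that are not present as regular files in `data_dir`.
fn missing_weight_files(data_dir: &Path, version: SdVersion) -> Vec<&'static str> {
    required_weight_files(version)
        .iter()
        .copied()
        .filter(|name| !data_dir.join(name).is_file())
        .collect()
}
```

For instance, `missing_weight_files(Path::new("data"), SdVersion::V2_1)` should return an empty vector once all four v2.1 files have been downloaded.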
Running an example
cargo run --example stable-diffusion --features clap -- --prompt "A rusty robot holding a fire torch."
The final image is named sd_final.png by default.
The default scheduler is the Denoising Diffusion Implicit Models (DDIM) scheduler. The original paper and some code can be found in the associated repo.
This generates some images of rusty robots holding some torches!
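The update that a DDIM scheduler applies at each denoising step can be sketched as follows. This is an illustrative, deterministic (eta = 0) version that works on flat `f64` slices. It is not the crate's implementation. The scaled-linear beta schedule, the values 0.00085 and 0.012, and the simple even timestep spacing are assumptions based on the configuration commonly used with Stable Diffusion. With $\bar\alpha_t$ the cumulative product of $(1-\beta_s)$ and $\epsilon$ the noise predicted by the UNet, one step computes $\hat x_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\epsilon)/\sqrt{\bar\alpha_t}$ and then $x_{t'} = \sqrt{\bar\alpha_{t'}}\,\hat x_0 + \sqrt{1-\bar\alpha_{t'}}\,\epsilon$.

```rust
/// Cumulative products ᾱ_t = ∏ (1 - β_s) for a "scaled linear" β schedule,
/// where √β is interpolated linearly between √beta_start and √beta_end.
fn alphas_cumprod(train_steps: usize, beta_start: f64, beta_end: f64) -> Vec<f64> {
    let (s, e) = (beta_start.sqrt(), beta_end.sqrt());
    let mut acc = 1.0;
    (0..train_steps)
        .map(|i| {
            let frac = i as f64 / (train_steps - 1) as f64;
            let beta = (s + frac * (e - s)).powi(2);
            acc *= 1.0 - beta;
            acc
        })
        .collect()
}

/// Evenly spaced training timesteps visited during inference, noisiest first.
fn ddim_timesteps(train_steps: usize, inference_steps: usize) -> Vec<usize> {
    let ratio = train_steps / inference_steps;
    (0..inference_steps).rev().map(|i| i * ratio).collect()
}

/// One deterministic DDIM step (eta = 0).
/// `eps` is the noise predicted by the UNet for the current sample `x_t`.
fn ddim_step(x_t: &[f64], eps: &[f64], alpha_bar_t: f64, alpha_bar_prev: f64) -> Vec<f64> {
    x_t.iter()
        .zip(eps)
        .map(|(&x, &e)| {
            // Estimate the clean sample x_0 from x_t and the predicted noise.
            let x0 = (x - (1.0 - alpha_bar_t).sqrt() * e) / alpha_bar_t.sqrt();
            // Re-noise that estimate to the noise level of the previous timestep.
            alpha_bar_prev.sqrt() * x0 + (1.0 - alpha_bar_prev).sqrt() * e
        })
        .collect()
}
```

A useful sanity check is to pass the exact noise that was used to build `x_t` into `ddim_step` together with `alpha_bar_prev = 1.0`. The call should then return `x_0`.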
Image to Image Pipeline
The Stable Diffusion model can also be used to generate an image based on another image. The following command runs this image-to-image pipeline:
cargo run --example stable-diffusion-img2img --features clap -- --input-image media/in_img2img.jpg
The default prompt is “A fantasy landscape, trending on artstation.”, but it can be changed via the --prompt flag.
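One common way to implement image-to-image generation is to encode the input image, add noise up to an intermediate timestep, and run only the remaining denoising steps. The Python diffusers img2img pipeline uses this approach. The sketch below illustrates the idea. The `strength` parameter and both function names are assumptions for illustration and are not taken from this example's command-line interface.

```rust
/// Index in the inference schedule at which denoising starts (hypothetical helper).
/// With `strength` = 1.0 the whole schedule is run, as if starting from pure noise;
/// with 0.0 no denoising step is run at all.
fn img2img_start_index(inference_steps: usize, strength: f64) -> usize {
    let strength = strength.clamp(0.0, 1.0);
    let steps_to_run = (inference_steps as f64 * strength) as usize;
    inference_steps - steps_to_run
}

/// Forward diffusion q(x_t | x_0): x_t = √ᾱ_t · x_0 + √(1 - ᾱ_t) · ε,
/// applied here to the (encoded) input image `x0` with Gaussian noise `noise`.
fn add_noise(x0: &[f64], noise: &[f64], alpha_bar_t: f64) -> Vec<f64> {
    x0.iter()
        .zip(noise)
        .map(|(&x, &n)| alpha_bar_t.sqrt() * x + (1.0 - alpha_bar_t).sqrt() * n)
        .collect()
}
```

Under this scheme, the denoising loop does not start from pure noise. It starts from the noised input at the schedule position given by `img2img_start_index`, which keeps the overall layout of the input image.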
Inpainting Pipeline
Inpainting modifies the part of an existing image specified by a mask, guided by a prompt.
This requires different UNet weights, unet-inpaint.safetensors, which can also be retrieved from this repo and should also be placed in the data/ directory.
The following command runs this inpainting pipeline:
The default prompt is “Face of a yellow cat, high resolution, sitting on a park bench.”, but it can be changed via the --prompt flag.
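The mask-related inputs of the inpainting UNet can be prepared roughly as follows. The reference Python pipeline follows four steps:

1. Binarize the mask at 0.5.
2. Zero out the pixels to repaint in the input image.
3. Resize the mask to the latent resolution (1/8 of the image size, nearest neighbour).
4. Concatenate the resized mask with the noisy latents and the latents of the masked image.

This sketch assumes that scheme and, for brevity, uses single-channel row-major images. The function names are hypothetical.

```rust
/// Binarize a soft mask: 1.0 marks a pixel to repaint, 0.0 a pixel to keep.
fn binarize_mask(mask: &[f32]) -> Vec<f32> {
    mask.iter().map(|&m| if m >= 0.5 { 1.0 } else { 0.0 }).collect()
}

/// Zero out the region to repaint and keep the rest of the image intact.
fn masked_image(image: &[f32], mask: &[f32]) -> Vec<f32> {
    image
        .iter()
        .zip(binarize_mask(mask))
        .map(|(&p, m)| p * (1.0 - m))
        .collect()
}

/// Nearest-neighbour downsampling of an (h x w) mask by `factor`
/// (assumed to be 8 for the Stable Diffusion VAE) to latent resolution.
fn downsample_mask(mask: &[f32], h: usize, w: usize, factor: usize) -> Vec<f32> {
    let (lh, lw) = (h / factor, w / factor);
    let mut out = Vec::with_capacity(lh * lw);
    for y in 0..lh {
        for x in 0..lw {
            // Take the top-left pixel of each factor x factor block.
            out.push(mask[(y * factor) * w + x * factor]);
        }
    }
    out
}
```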
ControlNet Pipeline
The ControlNet architecture can be used to control how Stable Diffusion generates images. It is meant to be used with the Stable Diffusion 1.5 weights (see above for how to get these). Additional weights have to be retrieved from this HuggingFace repo and copied to data/controlnet.safetensors.
The ControlNet pipeline takes a sample image as input. In the default mode, it performs edge detection on this image using the Canny edge detector and uses the resulting edge image as a guide.
cargo run --example controlnet --features clap,image,imageproc -- \
--prompt "a rusty robot, lit by a fire torch, hd, very detailed" \
--input-image media/vermeer.jpg
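A full Canny detector combines Gaussian smoothing, Sobel gradients, non-maximum suppression and hysteresis thresholding. The imageproc feature enabled in the command above suggests that the example relies on that crate's implementation. To illustrate only the gradient stage, here is a simplified, std-only sketch that thresholds the Sobel gradient magnitude of a grayscale image. It is not Canny, since it has no smoothing, edge thinning or hysteresis. The `sobel_edges` function and its threshold are hypothetical.

```rust
/// Sobel gradient magnitude of a row-major grayscale image (h x w), thresholded
/// into a binary edge map (255 = edge). Border pixels are left at 0.
fn sobel_edges(img: &[f32], h: usize, w: usize, threshold: f32) -> Vec<u8> {
    let mut out = vec![0u8; h * w];
    let px = |y: usize, x: usize| img[y * w + x];
    for y in 1..h.saturating_sub(1) {
        for x in 1..w.saturating_sub(1) {
            // Horizontal gradient: right column minus left column.
            let gx = (px(y - 1, x + 1) + 2.0 * px(y, x + 1) + px(y + 1, x + 1))
                - (px(y - 1, x - 1) + 2.0 * px(y, x - 1) + px(y + 1, x - 1));
            // Vertical gradient: bottom row minus top row.
            let gy = (px(y + 1, x - 1) + 2.0 * px(y + 1, x) + px(y + 1, x + 1))
                - (px(y - 1, x - 1) + 2.0 * px(y - 1, x) + px(y - 1, x + 1));
            if (gx * gx + gy * gy).sqrt() > threshold {
                out[y * w + x] = 255;
            }
        }
    }
    out
}
```

In the ControlNet setting, a binary map like this would play the role of the guide image. The real Canny output is thinner and less noisy.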
The media/vermeer.jpg image is the well-known painting shown on the left-hand side; performing edge detection on it results in the image on the right-hand side.
Using only the edge detection image, the ControlNet model generates the following samples.
FAQ
Memory Issues
This requires a GPU with more than 8GB of memory. As a fallback, the CPU version can be used, but it is slower.
cargo run --example stable-diffusion --features clap -- --prompt "A very rusty robot holding a fire torch." --cpu all
For a GPU with 8GB, one can use the fp16 weights for the UNet and put only the UNet on the GPU.
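A back-of-the-envelope calculation shows why fp16 weights help. Weight memory is roughly the parameter count multiplied by the bytes per parameter. The sketch below assumes about 860 million parameters for the Stable Diffusion v1 UNet, the figure commonly quoted for that model. It ignores activations, the CLIP text encoder and the VAE, all of which also need memory during inference.

```rust
/// Weight formats discussed above.
#[derive(Clone, Copy)]
enum Dtype {
    F32,
    F16,
}

/// Storage size of one parameter in bytes.
fn bytes_per_param(dtype: Dtype) -> u64 {
    match dtype {
        Dtype::F32 => 4,
        Dtype::F16 => 2,
    }
}

/// Approximate memory needed to hold the weights, in GiB.
fn weight_gib(num_params: u64, dtype: Dtype) -> f64 {
    (num_params * bytes_per_param(dtype)) as f64 / (1u64 << 30) as f64
}

/// Assumed (approximate) parameter count of the Stable Diffusion v1 UNet.
const UNET_PARAMS: u64 = 860_000_000;
```

This gives about 3.2 GiB for fp32 UNet weights and about 1.6 GiB for fp16, so fp16 leaves noticeably more of an 8GB card for activations.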