Jittor 底层开发人员战队 (Low-Level Developers Team): Guided DoRA

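Introduction

The method consists of three parts:

Replacing LoRA with DoRA
Guidance based on optimal control
Prompt engineering that decouples semantics from style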

DoRA

See the original paper [4].
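
For a quick picture before reading the paper, the following is a minimal sketch of the weight reparameterization described in [4]: the pretrained weight is split into a per-column magnitude and a direction, and the low-rank update B A is applied to the direction only. It is written in pure Python for illustration. The function names and the toy matrices are assumptions and are not taken from this repository's training code.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def column_norms(w):
    """L2 norm of every column of w."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]

def dora_weight(w0, lora_b, lora_a, magnitude):
    """Return magnitude * (w0 + B @ A) / ||w0 + B @ A||, column by column."""
    delta = matmul(lora_b, lora_a)
    directed = [[w + d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(w0, delta)]
    norms = column_norms(directed)
    return [[m * v / n for v, m, n in zip(row, magnitude, norms)]
            for row in directed]

# Toy 2x2 layer with rank-1 adapters (illustrative values only).
w0 = [[3.0, 0.0], [4.0, 2.0]]
lora_b = [[0.0], [0.0]]        # B starts at zero, so the update starts as a no-op
lora_a = [[0.5, -0.5]]
magnitude = column_norms(w0)   # m is initialised to the column norms of w0

w_init = dora_weight(w0, lora_b, lora_a, magnitude)
```

With B at zero the adapted weight equals the pretrained weight, and after any update each column of the result keeps the norm stored in the magnitude vector, which is what lets magnitude and direction be trained separately.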

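Guidance based on optimal control

A diffusion model consists of two stochastic processes:

(a) The noising process, modeled by a stochastic differential equation (SDE) known as the forward SDE: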

dX_t = f(X_t,t) dt + g(X_t,t) dW_t, X_0 \sim p_0,

(b) The denoising process, modeled by the time reversal of the forward SDE under mild regularity conditions, also known as the reverse SDE:

dX_t = \left[f(X_t,t) - g^2(X_t,t)\nabla \log p(X_t,t)\right] dt + g(X_t,t) dW_t, \quad X_1 \sim \mathcal{N}(0,I_d). \tag{1}

The reverse SDE has the same marginal distributions as the deterministic probability-flow ODE:

dX_t = \left[f(X_t,t) - \frac{1}{2}g^2(X_t,t)\nabla \log p(X_t,t)\right] dt, \quad X_1 \sim \mathcal{N}(0,I_d).

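Guidance is posed as a stochastic optimal control problem [6]: a controller u is added to the drift of the reverse SDE and chosen to minimize a running cost \ell plus a terminal cost h weighted by \gamma:

\min_{u\in\mathcal{U}} \mathbb{E}\left[ \int_1^0 \ell(X_t^u, u(X_t^u, t), t) dt + \gamma h(X_0^u) \right], \quad \text{where} \tag{2}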

dX_t^u = \left[f(X_t^u,t) - g^2(X_t^u,t)\nabla \log p(X_t^u,t) + u(X_t^u,t)\right] dt + g(X_t^u,t)dW_t, X_1^u \sim \mathcal{N}(0,\mathrm{I}_d).

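The terminal cost used for guidance matches the features \Psi of the reference X_0^f with those of the expected final sample of the controlled process:

\min_{u\in\mathcal{U}} \|\Psi(X_0^f) - \Psi(\mathbb{E}[X_0^u|X_1^u])\|_2^2. \tag{3}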
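
To make objective (3) concrete, here is a toy sketch that minimizes it by gradient descent over a control vector. It rests on strong simplifying assumptions that are not part of this repository: \Psi is a fixed linear map (a real setup would use an image feature extractor), the expected clean sample is modeled as x0_hat + u, and the step size and iteration count are arbitrary. All names are hypothetical.

```python
def features(psi, x):
    """Hypothetical linear feature extractor: Psi(x) = psi @ x."""
    return [sum(p * v for p, v in zip(row, x)) for row in psi]

def terminal_cost(psi, x_ref, x):
    """Squared L2 distance between Psi(x_ref) and Psi(x), as in objective (3)."""
    return sum((a - b) ** 2
               for a, b in zip(features(psi, x_ref), features(psi, x)))

def optimise_control(psi, x_ref, x0_hat, step=0.1, iters=200):
    """Gradient descent on u for the cost ||Psi(x_ref) - Psi(x0_hat + u)||^2."""
    u = [0.0] * len(x0_hat)
    target = features(psi, x_ref)
    for _ in range(iters):
        x = [a + b for a, b in zip(x0_hat, u)]
        residual = [f - t for f, t in zip(features(psi, x), target)]
        # Gradient of the squared error with respect to u is 2 * psi^T @ residual.
        grad = [2.0 * sum(row[j] * r for row, r in zip(psi, residual))
                for j in range(len(u))]
        u = [ui - step * g for ui, g in zip(u, grad)]
    return u

psi = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]   # toy 2x3 feature map
x_ref = [1.0, 2.0, 0.5]                      # stands in for the reference X_0^f
x0_hat = [0.0, 0.0, 0.0]                     # stands in for the uncontrolled estimate

u = optimise_control(psi, x_ref, x0_hat)
before = terminal_cost(psi, x_ref, x0_hat)
after = terminal_cost(psi, x_ref, [a + b for a, b in zip(x0_hat, u)])
```

In this toy setting the cost drops from its initial value to nearly zero, because the controller only has to match the reference in feature space and not in sample space.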

Results

Apple, with guidance

Apple, without guidance

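Environment setup and installation

Set up the environment by following the Dreambooth example in JDiffusion (the baseline). Then replace the two corresponding files in the original installation with the files provided under JDiffusion\models in this repository, so that guidance can be used.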

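Data preprocessing

The original training images of one style have file names containing pixel_art. Rename them to remove this field, so that no training file name carries style information.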
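
The renaming can be scripted. The sketch below is one possible way to do it, assuming that the field appears literally as pixel_art and is joined to the rest of the name with underscores. The exact naming scheme of the dataset is not stated here, so check a few file names before running it. The function names are hypothetical.

```python
from pathlib import Path

def strip_style_token(filename, token="pixel_art"):
    """Remove `token` from a file name and tidy up leftover underscores."""
    path = Path(filename)
    parts = path.stem.replace(token, "_").split("_")
    stem = "_".join(p for p in parts if p)
    if not stem:
        raise ValueError(f"nothing left of {filename!r} after removing {token!r}")
    return stem + path.suffix

def rename_style_files(folder, token="pixel_art"):
    """Rename every file in `folder` whose name contains `token`; return the new names."""
    renamed = []
    # sorted() takes a snapshot, so renaming does not disturb the iteration.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and token in path.name:
            target = path.with_name(strip_style_token(path.name, token))
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Calling rename_style_files on the training image folder of the affected style renames the files in place and refuses to overwrite an existing file.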

Training steps

Notes

Training is meant to be run from VSCode.
To use the command line instead, refer to the command arguments in run.txt.

File configuration

Create a .vscode folder.
Copy the run.txt file from the training folder to a new file named launch.json and place it in the .vscode folder.
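
The two steps above can also be done with a short script. This is a sketch only: the locations of the training folder and the workspace are passed in as arguments because they depend on where the repository is checked out, and the function name is hypothetical.

```python
import shutil
from pathlib import Path

def install_launch_config(train_dir, workspace_dir):
    """Copy <train_dir>/run.txt to <workspace_dir>/.vscode/launch.json."""
    source = Path(train_dir) / "run.txt"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")
    vscode_dir = Path(workspace_dir) / ".vscode"
    vscode_dir.mkdir(parents=True, exist_ok=True)  # create .vscode if it is missing
    target = vscode_dir / "launch.json"
    shutil.copyfile(source, target)
    return target
```

The function returns the path of the new launch.json, so the file can be opened and adjusted afterwards.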

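Style training

In the training folder:

train_dora_04_B corresponds to style 4
train_dora_08_B corresponds to style 8
train_dora_16_B corresponds to style 16
train_dora_25_B corresponds to style 25

These styles must each be trained separately with their own file.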

Training prompt settings

Style	Attribute prompt
Style 19	“frontal pixel animal” of xxx
Style 20	“concept” painting of xxx
Style 21	“paper” painting of xxx
Style 22	“instrument” painting of xxx
Style 23	“object” painting of xxx
Style 24	painting of xxx
Style 25	“pixel” plant(animal object) of xxx (remove “pixel art”)
Style 26	painting of xxx
Style 12	“voxel” painting of xxx
Style 13	“ink” painting of xxx
Style 14	“animal activity” of xxx
Style 15	“person” of xxx
Style 16	“pixel” animal(object) of xxx
Styles 10, 09, 07, 06, 05	painting of xxx
Style 08	“paper” animal(object) of xxx
Style 04	painting of xxx by style_04
All others	“paper” painting of xxx

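launch.json parameter settings

Set two parameters in launch.json according to the prompts above:

attribute_prompt: the attribute prompt for the given style
without_painting: whether this flag is passed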

Example configurations

Style 04
- attribute_prompt: ""
- without_painting: not set
- uses its dedicated training file
Style 16
- attribute_prompt: "pixel"
- without_painting: set
- uses its dedicated training file
Style 08
- attribute_prompt: "paper"
- without_painting: set
- uses its dedicated training file
Style 15
- attribute_prompt: "person"
- without_painting: set
- uses the generic training file
Style 25
- attribute_prompt: "pixel"
- without_painting: set
- uses its dedicated training file
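
The five example configurations can be captured in a small lookup table that produces the argument list for one style, for instance to paste into the args field of launch.json. This is a sketch under assumptions: the flag spellings --attribute_prompt and --without_painting and the use of without_painting as a bare switch are guesses based on the parameter names above, so check run.txt for the exact form. The dictionary and function names are hypothetical.

```python
# The five example configurations listed above, keyed by style number.
STYLE_CONFIGS = {
    "04": {"attribute_prompt": "", "without_painting": False, "dedicated_file": True},
    "08": {"attribute_prompt": "paper", "without_painting": True, "dedicated_file": True},
    "15": {"attribute_prompt": "person", "without_painting": True, "dedicated_file": False},
    "16": {"attribute_prompt": "pixel", "without_painting": True, "dedicated_file": True},
    "25": {"attribute_prompt": "pixel", "without_painting": True, "dedicated_file": True},
}

def build_args(style_id):
    """Build the argument list for one style (flag spellings are assumed)."""
    config = STYLE_CONFIGS[style_id]
    args = ["--attribute_prompt", config["attribute_prompt"]]
    if config["without_painting"]:
        args.append("--without_painting")  # assumed to be a bare switch when enabled
    return args
```

For example, build_args("16") yields the attribute prompt "pixel" followed by the without_painting switch, while build_args("04") yields an empty attribute prompt and no switch.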

Generation steps

Edit run_customedLora_modulation.py:
- Line 8: data path
- Line 172: weight path
Run the modified script to complete training and generation.

References

[1] Rombach R, Blattmann A, Lorenz D, et al. High-resolution image synthesis with latent diffusion models[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 10684-10695.
[2] Ruiz N, Li Y, Jampani V, et al. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023: 22500-22510.
[3] Song J, Meng C, Ermon S. Denoising Diffusion Implicit Models[C]//International Conference on Learning Representations. 2021.
[4] Liu S, Wang C Y, Yin H, et al. DoRA: Weight-Decomposed Low-Rank Adaptation[C]//Forty-first International Conference on Machine Learning. 2024.
[5] Wang H, Wang Q, Bai X, et al. Instantstyle: Free lunch towards style-preserving in text-to-image generation[J]. arXiv preprint arXiv:2404.02733, 2024.
[6] Rout L, Chen Y, Ruiz N, et al. RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control[J]. arXiv preprint arXiv:2405.17401, 2024.

Contact

QQ: 2468888866