Source code for project report "On The Robustness of Diffusion-Based Text-to-Image Generation" in CV-2022-Fall.
Members: Liang Chen, Zhe Yang, Zheng Li
cd ModelTraining
Before doing this experiment, please download images of MSCOCO 2017 dataset from https://cocodataset.org/#download
We use the same environment as stable-diffusion https://github.com/CompVis/stable-diffusion
A suitable conda environment named ldm
can be created and activated with:
conda env create -f environment.yaml
conda activate ldm
Specify which GPU (or GPUs) you want to use to train the model with:
accelerate config
Set proper hyper-parameters in tune.sh
. Do remember to set train_data_dir to the directory of your training set !
Training the model with:
bash tune.sh
If you want to use text augment methods like back translation,text crop and swap, you can add this augment to tune.sh
:
--text_augment="bt" or --text_augment="crop_swap"
If you want to use text interpolation augment method to train the model, you can first run python encode_text.py
to generate interpolation text vectors, and then add --text_embed_dir="./text_embed_linear_p_beta1_n5.bin"
to tune.sh
After training, we can generate images with trained model under the control of the texts in test set. You can generate images with:
bash generate.sh
Before generating images, set proper hyper-parameters in generate.sh
: --model_name
is the name of directory of your trained model; --output_dir
is the directory of generated images.
cd Interpolation
Hidden States Interpolation
Note that this method needs the hidden states of text after clip encoder.
python HiddenStatesInterpolation.py
we provide a jupyter notebook, please run Interpolation.ipynb
cd RobustnessAnalysis
- The first similarity "Image Similarity among random seeds " is in "Similarity_with_seed.ipynb", and it's similarity between the images generated from the same texts but with different seeds.
- The second similarity "Similarity2: within similar texts" is in "similarity.ipynb", and its Chinese name is"组内相似度".
- The third similarity "Faithfulness : between image and text" is in "text_img_similarity.ipynb", and it's similarity between the images and the texts.