VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Paper, Website, Models and Data
- Environment Setup
- Dataset Setup
- Generation and Execution of Visual Programs
- Inference of VDebugger
- Error Injection
This code is partially adapted from ViperGPT. We sincerely thank the authors of ViperGPT for their great work!
To set up the environment:
- Clone recursively:
git clone --recurse-submodules git@github.com:shirley-wu/vdebugger.git
- Install PyTorch for your own environment. We installed torch==2.1.2 with CUDA 12.1.
- Install dependencies:
pip install -r requirements.txt
- Set up the ViperGPT environment:
cd viper
bash download_models.sh
export PATH=/usr/local/cuda/bin:$PATH
cd GLIP
python setup.py clean --all build develop --user
- If you need to use the OpenAI APIs, write your API key into viper/api.key.
Please follow the guidelines below to download each dataset:
- GQA: https://cs.stanford.edu/people/dorarad/gqa/download.html. The file structure should look as follows:
gqa/
├── questions/
│ ├── readme.txt
│ ├── {val, test, testdev, challenge}_{all, balanced}_questions.json
│ ├── submission_all_questions.json
│ ├── train_balanced_questions.json
│ ├── train_all_questions/
└── images/
└── *.jpg
- TallyQA: https://github.com/manoja328/TallyQA_dataset. The file structure should look as follows:
tallyqa/
├── {test, train}.json
└── {train2014, val2014, VG_100K, VG_100K_2}/
└── *.jpg
- NLVRv2: https://github.com/lil-lab/nlvr/tree/master/nlvr2. The file structure should look as follows:
nlvr2/
├── balanced_{dev, test1, test2, train}.jsonl
└── {dev, test1, test2, train}/
└── *.png
- RefCOCO*: https://github.com/lichengunc/refer. The file structure should look as follows:
refer/
├── refcoco/
│ ├── instances.json
│ ├── refs(google).p
│ └── refs(unc).p
├── refcoco+/
│ ├── instances.json
│ └── refs(unc).p
├── refcocog/
│ ├── instances.json
│ ├── refs(google).p
│ └── refs(umd).p
└── {train2014, train2017, val2014, val2017}/
└── *.jpg
- COVR: https://covr-dataset.github.io/. The file structure should look as follows:
covr/
├── {train, val, test}.jsonl
├── gqa_images/
│ └── *.jpg
└── imSitu_images/
└── {adjusting, ...}/
└── *.jpg
- RSVG: https://github.com/ZhanYang-nwpu/RSVG-pytorch. The file structure should look as follows:
rsvg/
├── {train, val, test}.txt
├── Annotations/
│ └── *.xml
└── JPEGImages/
└── *.jpg
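Before running anything, it can help to confirm that the downloaded data matches the layouts above. The following is a minimal sketch, not part of this repository: the `EXPECTED_PATHS` table and the `missing_paths` helper are hypothetical names, the table lists only a few entries from each layout, and it assumes every dataset folder sits directly under a single data directory.

```python
from pathlib import Path

# Hypothetical subset of the layouts listed above; extend as needed.
# Assumption: each dataset folder sits directly under one data directory.
EXPECTED_PATHS = {
    "gqa": ["gqa/questions", "gqa/images"],
    "tallyqa": ["tallyqa/test.json", "tallyqa/train.json"],
    "nlvr2": ["nlvr2/balanced_dev.jsonl", "nlvr2/dev"],
    "refcoco": ["refer/refcoco/instances.json", "refer/refcoco/refs(unc).p"],
    "covr": ["covr/val.jsonl", "covr/gqa_images", "covr/imSitu_images"],
    "rsvg": ["rsvg/Annotations", "rsvg/JPEGImages"],
}


def missing_paths(data_dir, dataset):
    """Return the expected paths for `dataset` that do not exist under `data_dir`."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_PATHS[dataset] if not (root / rel).exists()]
```

For example, `missing_paths("/path/to/data", "gqa")` returns an empty list when both `gqa/questions` and `gqa/images` are present, and otherwise names whatever is missing.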
Go to viper/ for this step. We recommend generating and executing the visual programs in two separate steps. Taking the GQA dataset as an example:
1. Generate programs:
CONFIG_NAMES=generate/gqa python main_batch_generate.py
This script loads the configuration in config/generate/gqa.yaml. Remember to change YOUR_DATA_DIR to your data directory. The generated code will be saved in a CSV file under the code field.
2. Execute and evaluate programs:
CONFIG_NAMES=execute/gqa python main_batch_execute.py
This script loads the configuration in config/execute/gqa.yaml. Remember to update YOUR_DATA_DIR here as well, and set the cached_codex_path: field to the CSV file produced in step 1. The accuracy / IoU will then be computed.
3. If you want to obtain execution feedback:
CONFIG_NAMES=execute/gqa python main_batch_trace.py A_RANDOM_STAMP
You can use the same configuration as in step 2. If you want to run multiple main_batch_trace.py processes at the same time, use a different A_RANDOM_STAMP for each process. The execution feedback will be saved in a CSV file under the traced field.
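The three steps above can also be driven from a small script. The sketch below only builds the command lines and environments. The helper names `build_step` and `build_trace_steps` are hypothetical, and using a UUID as A_RANDOM_STAMP is an assumption: the only stated requirement is that concurrent trace processes use different stamps.

```python
import os
import uuid


def build_step(script, config_name, extra_args=()):
    """Return (argv, env) for one of the main_batch_*.py scripts.

    CONFIG_NAMES is passed through the environment, as in the commands above.
    """
    env = dict(os.environ, CONFIG_NAMES=config_name)
    argv = ["python", script, *extra_args]
    return argv, env


def build_trace_steps(config_name, num_processes):
    """Build one trace command per process, each with its own stamp."""
    steps = []
    for _ in range(num_processes):
        # Assumption: any unique string is acceptable as A_RANDOM_STAMP.
        stamp = uuid.uuid4().hex
        steps.append(build_step("main_batch_trace.py", config_name, [stamp]))
    return steps


if __name__ == "__main__":
    # Printing only; to actually run a step from inside viper/, one could call
    # subprocess.run(argv, env=env, check=True).
    for argv, env in [
        build_step("main_batch_generate.py", "generate/gqa"),
        build_step("main_batch_execute.py", "execute/gqa"),
        *build_trace_steps("execute/gqa", 2),
    ]:
        print(f"CONFIG_NAMES={env['CONFIG_NAMES']}", " ".join(argv))
```

Keeping command construction separate from execution makes it easy to inspect what would be run before launching long jobs.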
For inference with VDebugger, you must first generate and execute the visual programs and obtain a CSV file containing the traced field. Then go to vdebugger/. Taking the GQA dataset and VDebugger/VDebugger-{critic, refiner}-generalist-13B as an example:
# Step 1: infer critic
python infer_critic.py VDebugger/VDebugger-critic-generalist-13B --input YOUR_CSV_CONTAINING_TRACED_FIELD --dataset gqa # output file will be written to critic-infer.csv
# Step 2: infer refiner
python infer_refine.py critic-infer.csv VDebugger/VDebugger-refiner-generalist-13B # output file will be written to critic-refine-infer.csv
Then you can execute the programs in critic-refine-infer.csv as in step 2 of Generation and Execution of Visual Programs.
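Since the critic expects a CSV file that contains the traced field, a quick check before launching inference can save a failed run. This is a minimal sketch using only the standard library. The helper name `csv_has_field` is hypothetical, and it assumes the file is a regular comma-separated CSV whose first row is the header.

```python
import csv


def csv_has_field(csv_path, field):
    """Return True if the CSV header row contains a column named `field`."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Only the first row is read, so large files are cheap to check.
        header = next(csv.reader(f), [])
    return field in header
```

For example, `csv_has_field("YOUR_CSV_CONTAINING_TRACED_FIELD", "traced")` should be true before the file is passed to infer_critic.py.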
If you want to reproduce our training of VDebugger, please use vdebugger/training_scripts/train_{critic, refiner}.sh. You will need to install deepspeed==0.14.0.
To perform error injection and generate incorrect programs as described in Section 4 of our paper, you first need a .csv file containing the visual programs generated for the training set and their execution results. Then go to vdebugger/ and run:
python error_injection.py YOUR_CSV_FILE --error_injection {greedy, mask-best}
Please cite our paper if this repository inspires your work.
@misc{wu2024vdebugger,
title={VDebugger: Harnessing Execution Feedback for Debugging Visual Programs},
author={Xueqing Wu and Zongyu Lin and Songyan Zhao and Te-Lin Wu and Pan Lu and Nanyun Peng and Kai-Wei Chang},
year={2024}
}