DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Published at ICLR 2023
Paper link: https://openreview.net/forum?id=Lt8bMlhiwx2
Download coco_train to the data directory.
Download cc3m_train to the data directory.
Run ./train_coco.sh to train on COCO captions, or ./train_cc3m.sh to train on Conceptual Captions.
See inference_decap.ipynb.
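For readers who want the core idea before opening the notebook: DeCap trains a decoder on text embeddings only, and at inference projects an image's CLIP embedding into the text embedding space via a support memory of text embeddings before decoding. The following is a minimal NumPy sketch of that projection step, not code from this repository; the function name, memory contents, and temperature value are assumptions for illustration.

```python
import numpy as np

def project_to_text_space(image_emb, memory, temperature=100.0):
    """Project a CLIP image embedding into the text embedding space.

    memory: (N, d) array of CLIP text embeddings (the support memory).
    Returns a softmax-weighted combination of the memory entries,
    weighted by cosine similarity to the image embedding.
    """
    # L2-normalize the query and the memory so dot products are cosines.
    img = image_emb / np.linalg.norm(image_emb)
    mem = memory / np.linalg.norm(memory, axis=1, keepdims=True)

    sims = mem @ img  # (N,) cosine similarities
    # Temperature-scaled softmax (shifted for numerical stability).
    weights = np.exp(temperature * (sims - sims.max()))
    weights /= weights.sum()

    # Convex combination of text embeddings: lies in the text space,
    # so the text-only decoder can consume it.
    return weights @ mem

# Toy usage with random vectors standing in for CLIP embeddings.
rng = np.random.default_rng(0)
memory = rng.normal(size=(64, 16))
image_emb = rng.normal(size=16)
proj = project_to_text_space(image_emb, memory)
```

The projected vector is then fed to the trained decoder in place of a text embedding, which is what makes zero-shot captioning possible without any paired image–text training.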
@inproceedings{li2023decap,
  title={DeCap: Decoding {CLIP} Latents for Zero-Shot Captioning via Text-Only Training},
  author={Wei Li and Linchao Zhu and Longyin Wen and Yi Yang},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=Lt8bMlhiwx2}
}
This repository is heavily based on ClipCap. For training, we used data from the COCO dataset and Conceptual Captions.