- Release the environment settings
- Release the open-pose benchmark dataset McGill‡
- Release the open-pose datasets ModelNet40‡ and ModelNet10‡
- Release the evaluation code for our CLIP-based baseline
- Release the evaluation code for our Diffusion-based baseline
Datasets | Total Classes | Seen/Unseen Classes | Train/Valid/Test Samples | Download |
---|---|---|---|---|
ModelNet40‡ | 40 | 30/- | 5852/1560/- | Google Drive |
ModelNet10‡ | 10 | -/10 | -/-/908 | Google Drive |
McGill‡ | 19 | -/14 | -/-/115 | Google Drive |
Both of our baselines (CLIP-based and Diffusion-based) can be run on a single RTX 3090 or RTX 4090 GPU.
conda env create -f op3dzsl.yaml
conda activate op3dzsl
pip install git+https://github.com/openai/CLIP.git
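Before downloading any checkpoints, it can help to confirm that the environment resolves. The sketch below defines a hypothetical helper, `check_env`, which is not part of the repository. It assumes the `op3dzsl` environment is active and that `op3dzsl.yaml` installs PyTorch (an assumption, since the file's contents are not shown here). It imports CLIP, lists the model names CLIP can download, reports whether CUDA is visible, and, if `nvidia-smi` is present, prints the GPU name and total memory.

```shell
#!/usr/bin/env bash
# check_env: hypothetical sanity check, not shipped with the repository.
# Assumes the op3dzsl conda env is active and provides torch (assumption).
check_env() {
  # Returns a non-zero status if either clip or torch cannot be imported.
  python -c "import clip, torch; print('CLIP models:', clip.available_models()); print('CUDA available:', torch.cuda.is_available())" || return 1
  # Print GPU name and total memory if the NVIDIA driver tools are installed.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
  else
    echo "nvidia-smi not found; skipping GPU query" >&2
  fi
}
```

Run `check_env` after `conda activate op3dzsl`. A non-zero exit status means one of the imports failed.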
Download the pretrained Stable Diffusion model from Google Drive or the official website. Rename the checkpoint to "model.ckpt" and place it in the directory "models/ldm/stable-diffusion-v1/".
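The rename-and-move step can be scripted. This is a minimal sketch built around a hypothetical helper, `place_checkpoint`, which is not part of the repository. It assumes you run it from the repository root and pass the path of whatever checkpoint file you downloaded; that source filename is up to you and is not specified here.

```shell
#!/usr/bin/env bash
# place_checkpoint: hypothetical helper, not shipped with the repository.
# Moves a downloaded checkpoint to models/ldm/stable-diffusion-v1/model.ckpt,
# relative to the current directory (run it from the repository root).
place_checkpoint() {
  local src="$1"
  local dir="models/ldm/stable-diffusion-v1"
  # Fail early with a clear message if the downloaded file is missing.
  if [ ! -f "$src" ]; then
    echo "checkpoint not found: $src" >&2
    return 1
  fi
  mkdir -p "$dir"
  # Rename to the exact filename the Diffusion-based baseline expects.
  # Note: mv silently overwrites an existing model.ckpt.
  mv "$src" "$dir/model.ckpt"
}
```

For example, `place_checkpoint ~/Downloads/your-checkpoint.ckpt`, where the path is a placeholder for your own download.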
To evaluate our CLIP-based baseline:
python baseline_eval/clip_eval.py
To evaluate our Diffusion-based baseline:
python baseline_eval/diffusion_eval.py
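To run both evaluations in one go, a small wrapper can stop before the Diffusion-based run if the checkpoint is not in place. The sketch below defines a hypothetical `run_baselines` function, not part of the repository. It assumes it is called from the repository root. It also assumes that the CLIP-based baseline does not need the diffusion checkpoint. `PYTHON` is an illustrative override that defaults to `python`, as in the commands above.

```shell
#!/usr/bin/env bash
# run_baselines: hypothetical wrapper, not shipped with the repository.
# Call from the repository root; PYTHON defaults to "python" as in the README.
run_baselines() {
  local py="${PYTHON:-python}"
  local ckpt="models/ldm/stable-diffusion-v1/model.ckpt"
  # Assumed: the CLIP-based baseline does not need the diffusion checkpoint.
  "$py" baseline_eval/clip_eval.py || return 1
  # Stop before the Diffusion-based baseline if the checkpoint is missing.
  if [ ! -f "$ckpt" ]; then
    echo "missing $ckpt; download the pretrained model first" >&2
    return 1
  fi
  "$py" baseline_eval/diffusion_eval.py
}
```

Checking for the checkpoint before the second run surfaces a missing file immediately. Otherwise the failure would only appear once the Diffusion-based script tries to load it.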
If you find this work useful in your research, please cite:
@article{zhao2023diff,
title={Open-Pose 3D Zero-Shot Learning: Benchmark and Challenges},
author={Zhao, Weiguang and Yang, Guanyu and Zhang, Rui and Jiang, Chenru and Yang, Chaolong and Yan, Yuyao and Hussain, Amir and Huang, Kaizhu},
journal={arXiv preprint arXiv:2312.07039},
year={2023}
}
If you use our open-pose datasets, please also cite the works they were derived from: ModelNet40 and McGill.
@inproceedings{ModelNet,
title={{3D} {ShapeNets}: A deep representation for volumetric shapes},
author={Wu, Zhirong and Song, Shuran and Khosla, Aditya and Yu, Fisher and Zhang, Linguang and Tang, Xiaoou and Xiao, Jianxiong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1912--1920},
year={2015}
}
@article{McGill,
title={Retrieving articulated 3-D models using medial surfaces},
author={Siddiqi, Kaleem and Zhang, Juan and Macrini, Diego and Shokoufandeh, Ali and Bouix, Sylvain and Dickinson, Sven},
journal={Machine Vision and Applications},
volume={19},
pages={261--275},
year={2008}
}
This project would not have been possible without several excellent open-source codebases, including TZSL, PointCLIP, PointCLIPv2, ReConCLIP, CLIP2Point, ULIP, DiffCLIP, and Stable-Diffusion.