This repository presents the virtual try-on dataset proposed in:
D. Morelli, M. Fincato, M. Cornia, F. Landi, F. Cesari, R. Cucchiara
Dress Code: High-Resolution Multi-Category Virtual Try-On
Under Review
[Paper] [Dataset Request Form]
By making any use of the Dress Code Dataset, you accept and agree to comply with the terms and conditions reported here.
We collected a new dataset for image-based virtual try-on composed of image pairs coming from different catalogs of YOOX NET-A-PORTER Group.
The dataset contains more than 50k high-resolution model-garment image pairs, divided into three categories (i.e. dresses, upper-body clothes, lower-body clothes).
- 53792 garments
- 107584 images
- 3 categories
- upper body
- lower body
- dresses
- 1024 x 768 image resolution
- additional info
- keypoints
- label_maps
- skeletons
- DensePose
Along with each model-garment image pair, we also provide the keypoints, skeleton, label map, and DensePose annotations.
More info
For all image pairs of the dataset, we stored the joint coordinates of human poses. In particular, we used OpenPose [1] to extract 18 keypoints for each human body.
For each image, we provide a JSON file containing a dictionary with a `keypoints` key. Its value is a list of 18 elements, one per joint of the human body. Each element is a list of 4 values, where the first two indicate the coordinates on the x and y axis, respectively.
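The annotation format above can be parsed with a few lines of Python. The dictionary below is a toy stand-in for one of the per-image JSON files (the exact file naming is not shown here), keeping only the x/y coordinates of each joint:

```python
import json

# Toy stand-in for a per-image keypoint annotation file: a dict with a
# "keypoints" key holding 18 joints, each a list of 4 values whose first
# two entries are the x and y coordinates.
sample = {
    "keypoints": [[100.0 + i, 200.0 + i, 0.9, 0.0] for i in range(18)]
}
payload = json.dumps(sample)  # stands in for the .json file on disk

data = json.loads(payload)
joints = data["keypoints"]
assert len(joints) == 18

# Keep only the (x, y) coordinates of each joint.
xy = [(j[0], j[1]) for j in joints]
```

When working with the real files, replace `json.loads(payload)` with `json.load(open(path))` on the corresponding annotation path.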
Skeletons are RGB images obtained by connecting the keypoints with lines.
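The rendering step can be sketched as follows. Note that the limb pairs below are an assumption (a subset of the usual OpenPose 18-keypoint connectivity), not the dataset's exact list, and a plain 2D grid stands in for an RGB canvas:

```python
# Assumed subset of OpenPose-style joint connectivity (hypothetical).
LIMBS = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7)]

def draw_line(canvas, p0, p1):
    """Mark pixels along the straight segment p0 -> p1 (naive interpolation)."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for t in range(steps + 1):
        x = round(x0 + (x1 - x0) * t / steps)
        y = round(y0 + (y1 - y0) * t / steps)
        canvas[y][x] = 255

# Toy (x, y) keypoints; real ones come from the per-image JSON files.
keypoints = [(10 + 2 * i, 5 + 3 * i) for i in range(18)]
canvas = [[0] * 64 for _ in range(64)]  # blank single-channel canvas
for a, b in LIMBS:
    draw_line(canvas, keypoints[a], keypoints[b])
```

A real implementation would draw anti-aliased colored lines on a 1024x768 RGB image (e.g. with Pillow's `ImageDraw.line`), but the structure is the same: iterate over limb pairs and connect the two joint coordinates.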
We employed a human parser to assign each pixel of the image to a specific category thus obtaining a segmentation mask for each target model. Specifically, we used the SCHP model [2] trained on the ATR dataset, a large single person human parsing dataset focused on fashion images with 18 classes.
The resulting images consist of a single channel filled with the category label values. Categories are mapped as follows:
| Label | Category |
|---|---|
| 0 | background |
| 1 | hat |
| 2 | hair |
| 3 | sunglasses |
| 4 | upper_clothes |
| 5 | skirt |
| 6 | pants |
| 7 | dress |
| 8 | belt |
| 9 | left_shoe |
| 10 | right_shoe |
| 11 | head |
| 12 | left_leg |
| 13 | right_leg |
| 14 | left_arm |
| 15 | right_arm |
| 16 | bag |
| 17 | scarf |
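A common use of these label maps is extracting a binary mask for a single category. A minimal sketch, using the class indices listed above and a tiny nested-list raster in place of a real 1024x768 single-channel image:

```python
# Class indices from the label-map documentation above.
LABELS = {
    "background": 0, "hat": 1, "hair": 2, "sunglasses": 3,
    "upper_clothes": 4, "skirt": 5, "pants": 6, "dress": 7,
    "belt": 8, "left_shoe": 9, "right_shoe": 10, "head": 11,
    "left_leg": 12, "right_leg": 13, "left_arm": 14,
    "right_arm": 15, "bag": 16, "scarf": 17,
}

def category_mask(label_map, category):
    """Return a 0/1 mask selecting the pixels of the given category."""
    target = LABELS[category]
    return [[1 if px == target else 0 for px in row] for row in label_map]

# Toy 3x3 label map (real maps are 1024x768, one channel).
toy = [[0, 4, 4], [7, 4, 0], [0, 0, 7]]
mask = category_mask(toy, "upper_clothes")
```

With NumPy the same operation is a one-liner (`label_map == LABELS["upper_clothes"]`) on the array loaded from the label-map image.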
We also extracted dense label and UV mapping from all the model images using DensePose [3].
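DensePose results are commonly stored as three-channel "IUV" images, where the first channel holds the body-part index I and the remaining two hold the U/V surface coordinates. Whether the dataset uses exactly this layout is an assumption; the sketch below only shows how such a raster would be split into its planes:

```python
def split_iuv(iuv):
    """Split an HxWx3 IUV raster (nested lists of 3-tuples) into I, U, V planes."""
    i_plane = [[px[0] for px in row] for row in iuv]  # body-part index
    u_plane = [[px[1] for px in row] for row in iuv]  # U surface coordinate
    v_plane = [[px[2] for px in row] for row in iuv]  # V surface coordinate
    return i_plane, u_plane, v_plane

# Toy 2x2 IUV raster; a real one would be loaded from the DensePose image.
toy = [[(2, 10, 20), (0, 0, 0)], [(5, 100, 200), (2, 30, 40)]]
I, U, V = split_iuv(toy)
```

Pixels with part index 0 (the second pixel of the first row above) correspond to background, so the I plane doubles as a foreground mask.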
Quantitative comparison on Dress Code (SSIM: higher is better; FID and KID: lower is better):

| Name | SSIM | FID | KID |
|---|---|---|---|
| CP-VTON [4] | 0.803 | 35.16 | 2.245 |
| CP-VTON+ [5] | 0.902 | 25.19 | 1.586 |
| CP-VTON' [4] | 0.874 | 18.99 | 1.117 |
| PFAFN [6] | 0.902 | 14.38 | 0.743 |
| VITON-GT [7] | 0.899 | 13.80 | 0.711 |
| WUTON [8] | 0.902 | 13.28 | 0.771 |
| ACGPN [9] | 0.868 | 13.79 | 0.818 |
| OURS | 0.906 | 11.40 | 0.570 |
[1] Cao, et al. "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields." IEEE TPAMI, 2019.
[2] Li, et al. "Self-Correction for Human Parsing." arXiv, 2019.
[3] Güler, et al. "Densepose: Dense human pose estimation in the wild." CVPR, 2018.
[4] Wang, et al. "Toward Characteristic-Preserving Image-based Virtual Try-On Network." ECCV, 2018.
[5] Minar, et al. "CP-VTON+: Clothing Shape and Texture Preserving Image-Based Virtual Try-On." CVPR Workshops, 2020.
[6] Ge, et al. "Parser-Free Virtual Try-On via Distilling Appearance Flows." CVPR, 2021.
[7] Fincato, et al. "VITON-GT: An Image-based Virtual Try-On Model with Geometric Transformations." ICPR, 2020.
[8] Issenhuth, et al. "Do Not Mask What You Do Not Need to Mask: a Parser-Free Virtual Try-On." ECCV, 2020.
[9] Yang, et al. "Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content." CVPR, 2020.
If you have any questions about our dataset, please use the public issues section of this GitHub repo. Alternatively, drop us an e-mail at davide.morelli [at] unimore.it or marcella.cornia [at] unimore.it.