Landscapes HQ dataset

Note: images are sorted by their likelihood, which is why images with smaller indices are much noisier. We will release a filtered version soon.

We collected 90,000 high-resolution landscape images from Unsplash (64,669 images) and Flickr (25,331 images).

| Path | Size | Number of files | Format | Description |
|---|---|---|---|---|
| Landscapes HQ | 283G | 90,000 | PNG | The root directory with all the files |
| ├ LHQ | 155G | 90,000 | PNG | The complete dataset. Split into 4 zip archives. |
| ├ LHQ1024 | 107G | 90,000 | PNG | LHQ images, resized to min-side=1024 and center-cropped to 1024x1024. Split into 3 zip archives. |
| ├ LHQ1024_jpg | 12G | 90,000 | JPG | LHQ1024 converted to JPG format with quality=95 (with Pillow)* |
| ├ LHQ256 | 8.7G | 90,000 | PNG | LHQ1024 resized to 256x256 with Lanczos interpolation |
| └ metadata.json | 27M | 1 | JSON | Dataset metadata (author names, licenses, descriptions, etc.) |

*Saving JPG images in Pillow with quality=95 (the default is 75) produces images that are almost indistinguishable from the PNG originals, both visually and in terms of FID.
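The preprocessing steps listed in the table can be sketched with Pillow as follows. This is a minimal sketch under assumptions, not the script used to build LHQ: the helper names are hypothetical, the table does not say which resampling filter was used for the min-side=1024 resize (Lanczos is assumed here), and rounding of the longer side may differ by a pixel from the original pipeline. The crop geometry is plain arithmetic, so it runs without Pillow; Pillow is only imported inside the functions that touch pixels.

```python
from pathlib import Path

JPEG_QUALITY = 95  # value from the footnote above; Pillow's default is 75
LHQ256_SIDE = 256


def resize_then_center_crop_box(width: int, height: int, side: int = 1024):
    """Return the resized (w, h) whose shorter side equals `side`, plus the centered crop box."""
    if width <= height:
        new_w, new_h = side, max(side, round(height * side / width))
    else:
        new_w, new_h = max(side, round(width * side / height)), side
    left = (new_w - side) // 2
    top = (new_h - side) // 2
    return (new_w, new_h), (left, top, left + side, top + side)


def to_lhq1024(src: Path, dst: Path) -> None:
    """LHQ -> LHQ1024 (PNG). Lanczos is an assumption; the table names no filter for this step."""
    from PIL import Image  # Pillow, imported lazily so the geometry helper runs without it

    with Image.open(src) as img:
        rgb = img.convert("RGB")  # convert() returns a loaded copy, safe after close
    size, box = resize_then_center_crop_box(rgb.width, rgb.height)
    rgb.resize(size, Image.LANCZOS).crop(box).save(dst, "PNG")


def to_lhq1024_jpg(src: Path, dst: Path) -> None:
    """LHQ1024 (PNG) -> LHQ1024_jpg, saved with quality=95."""
    from PIL import Image

    with Image.open(src) as img:
        img.convert("RGB").save(dst, "JPEG", quality=JPEG_QUALITY)


def to_lhq256(src: Path, dst: Path) -> None:
    """LHQ1024 (PNG) -> LHQ256 with Lanczos interpolation, as stated in the table."""
    from PIL import Image

    with Image.open(src) as img:
        img.resize((LHQ256_SIDE, LHQ256_SIDE), Image.LANCZOS).save(dst, "PNG")
```

For a 4000x3000 landscape photo, `resize_then_center_crop_box(4000, 3000)` resizes to 1365x1024 and crops the box `(170, 0, 1194, 1024)`, i.e. the central 1024x1024 region.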

25 random images from LHQ

Downloading files:

python download_lhq.py [DATASET_NAME]
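If disk space is limited, the sizes in the table above can be used to choose a variant before downloading. A minimal sketch under two assumptions that are not stated in this README: that the table's "G" means GiB, and that `[DATASET_NAME]` accepts the directory names from the table (check `download_lhq.py` for the names it actually accepts). The helper names and the default safety margin are hypothetical.

```python
import shutil
import sys

# Approximate sizes copied from the table above (unit assumed to be GiB).
VARIANT_SIZES_GIB = {
    "LHQ": 155.0,
    "LHQ1024": 107.0,
    "LHQ1024_jpg": 12.0,
    "LHQ256": 8.7,
}


def variants_that_fit(free_gib: float, margin_gib: float = 5.0) -> list[str]:
    """Variants whose size plus an arbitrary safety margin fits in free_gib, smallest first."""
    fitting = [name for name, size in VARIANT_SIZES_GIB.items() if size + margin_gib <= free_gib]
    return sorted(fitting, key=VARIANT_SIZES_GIB.__getitem__)


def download_command(name: str) -> list[str]:
    """Build the command shown above; assumes `name` is a valid DATASET_NAME."""
    return [sys.executable, "download_lhq.py", name]


if __name__ == "__main__":
    free = shutil.disk_usage(".").free / 2**30  # free space on the current drive, in GiB
    options = variants_that_fit(free)
    print(f"{free:.1f} GiB free; variants that fit: {options}")
    if options:
        print("Suggested:", " ".join(download_command(options[-1])))
```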

License

https://creativecommons.org/publicdomain/zero/1.0/ http://www.usa.gov/copyright.shtml

The individual images from the LHQ dataset have one of the following licenses:

To see which license applies to a given image, please refer to the corresponding metadata.
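Per-image licenses can be tallied from `metadata.json` along these lines. The schema of that file is not described in this README, so this sketch assumes (hypothetically) a JSON object mapping each image name to a record with a `"license"` field; the function names and `license_key` are illustrative and should be adjusted to the real structure.

```python
import json
from collections import Counter
from pathlib import Path


def load_metadata(path: Path) -> dict:
    """Load metadata.json; assumed to be {image_name: record} (hypothetical schema)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def license_counts(metadata: dict, license_key: str = "license") -> Counter:
    """Count images per license; records without the key are counted as '<missing>'."""
    return Counter(record.get(license_key, "<missing>") for record in metadata.values())


def images_with_license(metadata: dict, license_name: str, license_key: str = "license") -> list[str]:
    """Sorted names of the images whose record carries the given license."""
    return sorted(name for name, record in metadata.items() if record.get(license_key) == license_name)
```

For example, `license_counts(load_metadata(Path("metadata.json"))).most_common()` would list licenses by frequency, provided the assumed schema matches.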

The dataset itself is published under the Creative Commons Attribution 2.0 Generic (CC BY 2.0) License: https://creativecommons.org/licenses/by/2.0/. This means that you can use it however you like, but you should attribute the source (i.e., give a link to this repo or cite the paper).

Dataset collection

LHQ keywords Word Map

Images were obtained by downloading 450k images from Unsplash and Flickr using a set of 400 manually constructed search queries. The downloaded images were then filtered with a pretrained Mask R-CNN, to remove images that likely contained objects, and with Inception V3 statistics, to remove "too out-of-distribution" samples. For more information, see Section 4 of the paper: https://arxiv.org/abs/2104.06954
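The idea of discarding "too out-of-distribution" samples based on feature statistics can be illustrated loosely as below. This is not the paper's procedure (see Section 4 of the paper for that); it is a simplified stand-in that fits a diagonal Gaussian to precomputed feature vectors (for instance Inception V3 pooled features, whose extraction is not shown) and drops the least likely fraction. The function names and the `drop_fraction` parameter are hypothetical.

```python
import math
from statistics import fmean, pstdev


def fit_diag_gaussian(features: list[list[float]], eps: float = 1e-6):
    """Per-dimension mean and standard deviation of the feature vectors."""
    dims = list(zip(*features))
    means = [fmean(d) for d in dims]
    stds = [max(pstdev(d), eps) for d in dims]  # eps avoids division by zero
    return means, stds


def log_likelihood(x, means, stds) -> float:
    """Log-density of x under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for xi, m, s in zip(x, means, stds)
    )


def drop_least_likely(features: list[list[float]], drop_fraction: float = 0.1) -> list[int]:
    """Indices of the kept samples after dropping the drop_fraction with the lowest likelihood."""
    means, stds = fit_diag_gaussian(features)
    scores = [log_likelihood(x, means, stds) for x in features]
    order = sorted(range(len(features)), key=scores.__getitem__)  # least likely first
    n_drop = int(len(features) * drop_fraction)
    return sorted(order[n_drop:])
```

A diagonal Gaussian keeps the sketch dependency-free; a full-covariance model (as in FID-style statistics) would capture correlations between feature dimensions at the cost of a matrix inverse.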

BibTeX

@article{ALIS,
  title={Aligning Latent and Image Spaces to Connect the Unconnectable},
  author={Skorokhodov, Ivan and Sotnikov, Grigorii and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2104.06954},
  year={2021}
}