Landscapes HQ dataset

Note: images are sorted by their likelihood, which is why images with smaller indices are much noisier. We will release a filtered version soon.

We collected 90,000 high-resolution landscape images from Unsplash (64,669 images) and Flickr (25,331 images).

| Path | Size | Number of files | Format | Description |
| --- | --- | --- | --- | --- |
| Landscapes HQ | 283G | 90,000 | PNG | The root directory with all the files |
| ├ LHQ | 155G | 90,000 | PNG | The complete dataset. Split into 4 zip archives. |
| ├ LHQ1024 | 107G | 90,000 | PNG | LHQ images, resized to min-side=1024 and center-cropped to 1024x1024. Split into 3 zip archives. |
| ├ LHQ1024_jpg | 12G | 90,000 | JPG | LHQ1024 converted to JPG format with `quality=95` (with Pillow)* |
| ├ LHQ256 | 8.7G | 90,000 | PNG | LHQ1024 resized to 256x256 with Lanczos interpolation |
| └ metadata.json | 27M | 1 | JSON | Dataset metadata (author names, licenses, descriptions, etc.) |

*Saving JPG images with `quality=95` in Pillow (the default is 75) yields images that are almost indistinguishable from the PNG ones, both visually and in terms of FID.
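The derived variants in the table can be approximated with a few lines of Pillow. The sketch below is not the script used to build the dataset: the function names, the output file names, and the use of Lanczos for the 1024 resize are assumptions (the table only states Lanczos for LHQ256). It shows the min-side resize and center crop geometry, then the `quality=95` JPG export and the 256x256 downscale.

```python
from pathlib import Path


def center_crop_geometry(width, height, side=1024):
    """Return (resized_size, crop_box) for a min-side resize plus center crop."""
    shorter = min(width, height)
    # Scale so the shorter side becomes `side`; never round below `side`.
    new_w = max(side, round(width * side / shorter))
    new_h = max(side, round(height * side / shorter))
    left = (new_w - side) // 2
    top = (new_h - side) // 2
    return (new_w, new_h), (left, top, left + side, top + side)


def preprocess(src, out_dir, side=1024, small=256):
    """Write <stem>_1024.png, <stem>_1024.jpg and <stem>_256.png (hypothetical names)."""
    # Pillow is imported here so the geometry helper above has no dependencies.
    from PIL import Image

    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with Image.open(src) as img:
        img = img.convert("RGB")
        size, box = center_crop_geometry(*img.size, side=side)
        # Assumption: Lanczos is also used for the 1024 resize.
        square = img.resize(size, Image.LANCZOS).crop(box)
    square.save(out_dir / f"{src.stem}_{side}.png")
    # quality=95 instead of Pillow's default of 75, as in LHQ1024_jpg.
    square.save(out_dir / f"{src.stem}_{side}.jpg", quality=95)
    square.resize((small, small), Image.LANCZOS).save(out_dir / f"{src.stem}_{small}.png")
```

For a 3000x2000 source image, `center_crop_geometry(3000, 2000)` resizes to 1536x1024 and crops the box `(256, 0, 1280, 1024)`, i.e. it trims 256 pixels from each side of the longer dimension.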

Downloading files:

python download_lhq.py [DATASET_NAME]

License

The individual images in the LHQ dataset have one of the following licenses:

Unsplash License
Creative Commons Attribution License
Creative Commons Attribution-NonCommercial License
Creative Commons Public Domain Mark
Creative Commons Public Domain Dedication (CC0): https://creativecommons.org/publicdomain/zero/1.0/
United States Government Work: http://www.usa.gov/copyright.shtml

To see which license applies to a given image, check the corresponding entry in the metadata.
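As a rough illustration of such a lookup, the sketch below tallies licenses across metadata records. The layout of `metadata.json` is not documented here, so everything about the schema is an assumption: the code accepts either a list of records or a dict of records, and the `"license"` key is a hypothetical field name that you would need to replace with the real one.

```python
import json
from collections import Counter


def load_records(path="metadata.json"):
    """Load metadata as a list of per-image records (schema is assumed)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Accept both {"id": {...}, ...} and [{...}, ...] layouts.
    return list(data.values()) if isinstance(data, dict) else list(data)


def license_counts(records, license_key="license"):
    """Count how many records carry each license string."""
    # "license" is a hypothetical key; records without it are counted as "unknown".
    return Counter(rec.get(license_key, "unknown") for rec in records)
```

Calling `license_counts(load_records())` would then give a per-license tally, which is a quick way to check whether a subset of the images fits a particular use (for example, excluding NonCommercial images).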

The dataset itself is published under the Creative Commons Attribution 2.0 Generic (CC BY 2.0) License: https://creativecommons.org/licenses/by/2.0/. This means that you can use it however you like, provided you attribute the source (i.e., link to this repo or cite the paper).

Dataset collection

We downloaded 450k images from Unsplash and Flickr using a set of 400 manually constructed search queries, then filtered them in two steps: a pretrained Mask R-CNN was used to remove images that likely contained objects, and Inception V3 statistics were used to remove "too out-of-distribution" samples. For more information, see Section 4 of the paper: https://arxiv.org/abs/2104.06954
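The two filtering steps can be pictured with the schematic sketch below. It is not the authors' pipeline: it assumes the detector and the Inception-based scoring have already been run, and the `Candidate` fields, the `object_thresh` value, and the `keep` parameter are all hypothetical. The final ascending sort mirrors the note that images with smaller indices are the noisier, lower-likelihood ones.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    max_object_score: float  # hypothetical: highest detector confidence for any object
    log_likelihood: float    # hypothetical: score derived from Inception V3 statistics


def filter_candidates(cands, object_thresh=0.5, keep=90_000):
    """Drop images with likely objects, keep the `keep` most likely of the rest."""
    # Step 1: remove images where the detector is confident it saw an object.
    clean = [c for c in cands if c.max_object_score < object_thresh]
    # Step 2: keep the highest-likelihood images, discarding the outliers.
    kept = sorted(clean, key=lambda c: c.log_likelihood, reverse=True)[:keep]
    # Return in ascending likelihood, so the noisiest kept images come first.
    return sorted(kept, key=lambda c: c.log_likelihood)
```

The thresholds actually used, and the exact way likelihood is computed from Inception V3 statistics, are described in the paper rather than here.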

BibTeX

@article{ALIS,
  title={Aligning Latent and Image Spaces to Connect the Unconnectable},
  author={Skorokhodov, Ivan and Sotnikov, Grigorii and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2104.06954},
  year={2021}
}
