
commaVQ is a dataset of compressed driving video


Source Video        Compressed Video       Future Prediction
source_video.mp4    compressed_video.mp4   generated.mp4

A world model is a model that predicts the next state of the world given previously observed states and actions.

World models are essential to training all kinds of intelligent agents, especially self-driving models.

commaVQ contains:

  • encoder/decoder models used to heavily compress driving scenes
  • a world model trained on 3,000,000 minutes of driving videos
  • a dataset of 100,000 minutes of compressed driving videos

Task

Lossless compression challenge: make me smaller! ($500 prize)

Losslessly compress 5,000 minutes of driving video "tokens". Go to ./compression/ to get started.

Prize: $500 for the highest compression rate on 5,000 minutes of driving video (~915 MB). The challenge ends July 1st, 2024, 11:59pm AoE.

Submit a single zip file containing the compressed data and a Python script that decompresses it into its original form. Top solutions will be listed on comma's official leaderboard.

Implementation    Compression rate
lzma              1.6

Overview

A VQ-VAE [1,2] was used to heavily compress each video frame into 128 "tokens" of 10 bits each. Each entry of the dataset is a "segment" of compressed driving video, i.e. one minute of frames at 20 FPS. Each file has shape 1200x8x16 and is saved as int16.
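For instance, a single downloaded segment can be sanity-checked against this format as follows (a minimal sketch; 'segment.npy' is a placeholder path for any file from the dataset):

import numpy as np
tokens = np.load('segment.npy') # placeholder path to one downloaded segment
assert tokens.shape == (1200, 8, 16) # 1 min at 20 FPS -> 1200 frames of 8x16 tokens
assert tokens.dtype == np.int16 # stored as int16, but values only need 10 bits
assert 0 <= tokens.min() and tokens.max() < 1024 # 2**10 codebook entries
frame0 = tokens[0] # the 128 tokens encoding the first frame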

A world model [3] was trained to predict the next token given a context of past tokens. This world model is a Generative Pre-trained Transformer (GPT) [4] trained on 3,000,000 minutes of driving videos following a similar recipe to [5].
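The real model and sampling loop live in ./notebooks/gpt.ipynb; as a conceptual sketch only, "imagining" the next frame reduces to sampling its 128 tokens autoregressively. Here next_token_logits is a random stand-in for the actual GPT, not its API:

import numpy as np
rng = np.random.default_rng(0)
VOCAB = 1024 # 10-bit token codebook
TOKENS_PER_FRAME = 128 # 8x16 grid, flattened

def next_token_logits(context):
    # stand-in for the real GPT: returns random logits over the codebook
    return rng.normal(size=VOCAB)

def imagine_frame(context):
    # sample the next frame one token at a time, feeding each back into the context
    context = list(context)
    for _ in range(TOKENS_PER_FRAME):
        logits = next_token_logits(context)
        probs = np.exp(logits - logits.max()) # softmax over the codebook
        probs /= probs.sum()
        context.append(rng.choice(VOCAB, p=probs))
    return np.array(context[-TOKENS_PER_FRAME:], dtype=np.int16).reshape(8, 16)

frame = imagine_frame(rng.integers(0, VOCAB, size=2 * TOKENS_PER_FRAME)) # two fake context frames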

Examples

See ./notebooks/encode.ipynb and ./notebooks/decode.ipynb for an example of how to visualize the dataset, using a segment of driving video from comma's drive to Taco Bell.

See ./notebooks/gpt.ipynb for an example of how to use the world model to imagine future frames.

See ./compression/compress.py for an example of how to compress the tokens using lzma.
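As a rough sketch of how that baseline compression rate is measured (assuming a locally saved segment; the actual scoring setup is in ./compression/):

import lzma
import numpy as np
tokens = np.load('segment.npy') # placeholder path to one segment
raw = tokens.tobytes()
compressed = lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)
print(f'compression rate: {len(raw) / len(compressed):.2f}') # lzma reaches ~1.6 on the full set
restored = np.frombuffer(lzma.decompress(compressed), dtype=tokens.dtype).reshape(tokens.shape)
assert np.array_equal(restored, tokens) # the round trip must be lossless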

Download the dataset

  • Using the Hugging Face datasets library
import numpy as np
from datasets import load_dataset
num_proc = 40 # CPUs go brrrr
ds = load_dataset('commaai/commavq', num_proc=num_proc)
tokens = np.load(ds['0'][0]['path']) # first segment from the first data shard

References

[1] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).

[2] Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

[3] https://worldmodels.github.io/

[4] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

[5] Micheli, Vincent, Eloi Alonso, and François Fleuret. "Transformers are Sample-Efficient World Models." The Eleventh International Conference on Learning Representations. 2023.
