stanford_mask_vit

This dataset accompanies the paper MaskViT: Masked Visual Pre-Training for Video Prediction.

The raw data before preprocessing can be found here: https://drive.google.com/file/d/1olPNo1p-XIcLcwbifC0Z71pQijP7eY6h/view?usp=drive_link.

At a high level, this is "RoboNet-style" data: random interaction data collected by a robot arm interacting with a bin of objects, which are swapped out periodically. The objects are mostly soft stuffed toys and plastic objects. The robot arm is controlled by a random policy with the autograsp primitive enabled. The data was collected using the Visual Foresight codebase.

The dataset was collected using a single robot, in two parts of roughly equal size. The two parts have different camera configurations and slightly different object distributions.

The dataset contains 9109 train episodes and 91 validation episodes. Each episode contains 30 steps.
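
As a quick sanity check on the numbers above, the following minimal Python sketch derives the total episode count, the per-split step counts, and the validation fraction. It uses only the figures stated in this README; the helper name `split_summary` is hypothetical and is not part of any dataset tooling.

```python
# Counts taken from the dataset description in this README.
TRAIN_EPISODES = 9109
VAL_EPISODES = 91
STEPS_PER_EPISODE = 30


def split_summary(train=TRAIN_EPISODES, val=VAL_EPISODES, steps=STEPS_PER_EPISODE):
    """Hypothetical helper: derive episode and step totals from the stated counts."""
    total_episodes = train + val
    return {
        "total_episodes": total_episodes,
        "train_steps": train * steps,
        "val_steps": val * steps,
        "total_steps": total_episodes * steps,
        "val_fraction": val / total_episodes,
    }


summary = split_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these counts the validation split is roughly 1% of all episodes, which is worth keeping in mind when reporting validation metrics.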

Example trajectories: [three example images]