
Language-Conditioned Imitation Learning (Dataset)

This dataset accompanies the paper "Language-Conditioned Imitation Learning for Robot Manipulation Tasks", which was published as a spotlight paper at NeurIPS 2020. Using this dataset for policy training and evaluation requires the code provided in our official GitHub repository.

To keep it portable, we publish the dataset as a separate repository and provide a small set of helpers to explore the data. The dataset contains 44,890 collected samples, grouped into 22,445 picking tasks and 22,445 matching pouring tasks that directly follow their respective picking task.

[Figure: two example samples from the dataset]

The above image demonstrates two samples for the tasks of "Raise the red cup" and "Fill a little into the large green bowl". Each sample has an associated language instruction that describes what to do (i.e., picking or pouring), where to do it (referencing one of our 23 objects, i.e., 3 cups and 20 bowls), and, most importantly in the case of a pouring task, how to do it (describing whether to pour a little or a lot). Thus, in this dataset, language is not just used for goal conditioning but also to describe the motion itself needed to complete the task.

When using this dataset, we would appreciate the following citation:

@inproceedings{NEURIPS2020_9909794d,
 author = {Stepputtis, Simon and Campbell, Joseph and Phielipp, Mariano and Lee, Stefan and Baral, Chitta and Ben Amor, Heni},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
 pages = {13139--13150},
 publisher = {Curran Associates, Inc.},
 title = {Language-Conditioned Imitation Learning for Robot Manipulation Tasks},
 url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/9909794d52985cbc5d95c26e31125d1a-Paper.pdf},
 volume = {33},
 year = {2020}
}

Dataset Initialization

We provide our dataset as a compressed tar.gz archive that can be downloaded here from Google Drive (~7 GB download).

Place the downloaded file in the root directory of this repository and extract it as follows:

tar xvf data_raw.tar.gz

NOTE: The extracted data will take ~140 GB of storage space because it is stored as human-readable JSON.

Task Description

Our task is set up as a tabletop manipulation setting in which a single UR5 robot arm is tasked with picking one of three differently colored cups and then pouring a specified amount of the grasped cup's content into one of 20 bowls. Below, we describe each cup and bowl with respect to its visual features:

Cups:

ID  Type  Color  Size  Shape
1   cup   red    n/a   n/a
2   cup   green  n/a   n/a
3   cup   blue   n/a   n/a

Bowls:

ID  Type  Color   Size   Shape
1   bowl  yellow  small  round
2   bowl  red     small  round
3   bowl  green   small  round
4   bowl  blue    small  round
5   bowl  pink    small  round
6   bowl  yellow  large  round
7   bowl  red     large  round
8   bowl  green   large  round
9   bowl  blue    large  round
10  bowl  pink    large  round
11  bowl  yellow  small  square
12  bowl  red     small  square
13  bowl  green   small  square
14  bowl  blue    small  square
15  bowl  pink    small  square
16  bowl  yellow  large  square
17  bowl  red     large  square
18  bowl  green   large  square
19  bowl  blue    large  square
20  bowl  pink    large  square
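
If you need these attributes programmatically, the tables above can be turned into simple Python lookups. The following dictionaries are a hypothetical convenience helper, not part of the repository:

# Hypothetical lookup tables built from the object tables above (not part of the repository).
CUPS = {1: "red", 2: "green", 3: "blue"}

BOWL_COLORS = ["yellow", "red", "green", "blue", "pink"]
BOWLS = {
    bowl_id: {
        "color": BOWL_COLORS[(bowl_id - 1) % 5],
        "size": "small" if (bowl_id - 1) % 10 < 5 else "large",
        "shape": "round" if bowl_id <= 10 else "square",
    }
    for bowl_id in range(1, 21)
}

print(BOWLS[13])  # {'color': 'green', 'size': 'small', 'shape': 'square'}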

Dataset Structure

All data samples can be found in the "/raw" folder; however, we also provide a single example in the "/sample" folder for a brief overview. Each file is in standard JSON format and can be read by any standard JSON parser.

Each sample has a unique ID and is composed of a picking and a subsequent pouring action. In the data folder, there are two files with the same ID, suffixed with _1 and _2, where _1 contains the picking action and _2 the corresponding pouring action.
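
For example, the two phases of a demonstration could be paired as in the sketch below. The ./raw location follows the description above; the .json file extension is an assumption:

import glob
import os

# Sketch: pair the picking (_1) and pouring (_2) files of each demonstration.
# Assumes the extracted files live in ./raw and use a .json extension.
for pick_path in sorted(glob.glob("./raw/*_1.json")):
    pour_path = pick_path.replace("_1.json", "_2.json")
    if os.path.exists(pour_path):
        sample_id = os.path.basename(pick_path)[: -len("_1.json")]
        print(f"{sample_id}: picking={pick_path}  pouring={pour_path}")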

NOTE: The data in trajectory is not used to train our models! This data is only used to initially control the robot during data collection in lieu of kinesthetic teaching or teleoperation. Do not use this data for your own model training.

The fields in the JSON file are as follows (a short loading sketch follows this list):

  • amount: The amount poured, given in degrees of cup rotation: 110 degrees pours some of the contents and 180 degrees pours all of it. While the field is also present in the picking task, the value is only used in the pouring task.
  • target/id: The id of the target object. See the table in Task Description
  • target/type: Type of the target, either cup or bowl. See the table in Task Description
  • trajectory: The initially generated trajectory that is executed by the robot during data collection. This trajectory has been automatically generated to fulfill the desired task. Motions are generated as a set of waypoints, converted to joint configurations with an inverse kinematics solver, and finally linearly sampled to fill in the trajectory.
  • name: The name of this demonstration. Overall, there are two files with the same name: one for the picking action and one for the pouring action.
  • phase: Either 0 or 1, where 0 indicates the picking, and 1 indicates the pouring action. Note that the file names are using extensions _1 for picking and _2 for pouring.
  • image: An array containing the top-down image of the environment in uint8 format. The image is of size (320, 569, 3) in BGR format (for visualization with matplotlib, you will need to flip the last axis).
  • ints: Describes how many and which cups and bowls are in the environment. Index 0 holds the number of bowls, index 1 the number of cups, followed by the bowl and cup IDs used. See the table in Task Description for the ids.
  • floats: For each bowl and cup, there are three values. The first two describe the x/y position of the object in the robot coordinate frame, while the third value describes its rotation around the z-axis.
  • state/raw: Holds the raw robot state recorded during data collection when executing the trajectory given in trajectory. This data is used for training our models. The values are as follows:
    • 6x robot joint position (j1, j2, j3, j4, j5, j6)
    • 6x robot joint velocity (j1, j2, j3, j4, j5, j6) (Not Used)
    • 3x robot tool-center-point position (x, y, z) (Not Used)
    • 3x robot tool-center-point rotation (x, y, z) (Not Used)
    • 3x robot tool-center-point linear velocity (x, y, z) (Not Used)
    • 3x robot tool-center-point angular velocity (x, y, z) (Not Used)
    • 3x robot tool-center-point target position (x, y, z) (Not Used)
    • 3x robot tool-center-point target rotation (x, y, z) (Not Used)
    • 1x gripper position
    • 1x gripper joint velocity (Not Used)
  • state/dict: Same as state/raw, but as a parsed dictionary. See state/raw for descriptions
  • voice: The voice command used for this demonstration
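
A minimal sketch of loading one of these files and accessing some of the fields above is shown below. The exact file path of the bundled example and the nesting of the target and state fields (read here as nested dictionaries) are assumptions based on the field names:

import json
import numpy as np

# Minimal sketch: load one demonstration file and inspect a few of its fields.
# The path assumes the picking phase of the bundled example is stored as
# ./sample/0grm2RA9ZnK_1.json; adjust it to your setup.
with open("./sample/0grm2RA9ZnK_1.json", "r") as f:
    sample = json.load(f)

print(sample["voice"])                    # language instruction for this demonstration
print(sample["phase"], sample["amount"])  # 0 = picking, 1 = pouring; pouring angle in degrees
print(sample["target"]["id"], sample["target"]["type"])  # assumed nested dictionary

num_bowls, num_cups = sample["ints"][0], sample["ints"][1]
print(f"{num_bowls} bowls and {num_cups} cups in the scene")

# Raw robot state recorded during execution (used for training), one row per time step.
state = np.asarray(sample["state"]["raw"])  # assumed nested dictionary
print("state/raw shape:", state.shape)

# Top-down camera image, stored as BGR; flip the last axis to get RGB for matplotlib.
image = np.asarray(sample["image"], dtype=np.uint8).reshape(320, 569, 3)
rgb = image[:, :, ::-1]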

Exploring the Data

We provide a small tool to visualize the data. Without extracting the full dataset, you can use the single example provided in the /sample folder. To explore a sample, run the following command:

python explore_sample.py --sample ./sample/0grm2RA9ZnK

This will plot the robot trajectory (the state/raw data used for training), show the language instruction, and render the image used to initialize the task.
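
If you want to reproduce a similar visualization without the helper script, a rough, unofficial sketch using matplotlib could look like this (same path and field-nesting assumptions as the loading sketch above):

import json
import numpy as np
import matplotlib.pyplot as plt

# Rough, unofficial sketch resembling explore_sample.py: plot the recorded robot
# state and show the environment image for one demonstration file.
with open("./sample/0grm2RA9ZnK_1.json", "r") as f:
    sample = json.load(f)

state = np.asarray(sample["state"]["raw"])  # one row per time step
image = np.asarray(sample["image"], dtype=np.uint8).reshape(320, 569, 3)

fig, (ax_traj, ax_img) = plt.subplots(1, 2, figsize=(12, 4))
ax_traj.plot(state[:, :6])            # first six columns are the joint positions
ax_traj.set_title(sample["voice"])    # language instruction as the title
ax_traj.set_xlabel("time step")
ax_traj.set_ylabel("joint position")
ax_img.imshow(image[:, :, ::-1])      # BGR -> RGB for matplotlib
ax_img.axis("off")
plt.show()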
