
3D-GRES: Generalized 3D Referring Expression Segmentation

🔗 [arXiv] 📄 [PDF]

NEWS: 🔥 3D-GRES is accepted at ACM MM 2024 (Oral)! 🔥

Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji

Introduction

3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the versatility of the task. To overcome this limitation, we introduce Generalized 3D Referring Expression Segmentation (3D-GRES), which extends the capability to segment any number of instances based on natural language instructions. In addressing this broader task, we propose the Multi-Query Decoupled Interaction Network (MDIN), designed to break down multi-object segmentation tasks into simpler, individual segmentations. MDIN comprises two fundamental components: Text-driven Sparse Queries (TSQ) and Multi-object Decoupling Optimization (MDO). TSQ generates sparse point cloud features distributed over key targets as the initialization for queries. Meanwhile, MDO is tasked with assigning each target in multi-object scenarios to different queries while maintaining their semantic consistency. To adapt to this new task, we build a new dataset, namely Multi3DRes. Our comprehensive evaluations on this dataset demonstrate substantial enhancements over existing models, thus charting a new path for intricate multi-object 3D scene comprehension.
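
To give a rough intuition for TSQ, below is a minimal sketch of text-driven query initialization. It is an illustration under stated assumptions, not the repository's implementation: the function name, the cosine-similarity scoring, and the plain top-k selection are placeholders (see gres_model/ for the actual code).

    import torch
    import torch.nn.functional as F

    def text_driven_sparse_queries(point_feats, text_feat, num_queries):
        # point_feats: (N, C) per-point or per-superpoint features
        # text_feat:   (C,)   pooled sentence embedding
        # returns:     (num_queries, C) query initializations
        # Score every point against the sentence, then keep the top-k
        # responders so the queries start anchored on likely targets.
        scores = F.cosine_similarity(point_feats, text_feat.unsqueeze(0), dim=-1)
        topk = scores.topk(num_queries).indices
        return point_feats[topk]

    # Example: 1024 points with 256-dim features, 32 queries.
    queries = text_driven_sparse_queries(torch.randn(1024, 256), torch.randn(256), 32)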

Installation

Requirements

  • Python 3.7 or higher
  • PyTorch 1.12
  • CUDA 11.3 or higher

The following installation assumes python=3.8, pytorch=1.12.1, and cuda=11.3.

  • Create a conda virtual environment

    conda create -n 3d-gres python=3.8
    conda activate 3d-gres
    
  • Clone the repository

    git clone https://github.com/sosppxo/MDIN.git
    
  • Install the dependencies

    Install PyTorch 1.12.1.
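
    For example, PyTorch 1.12.1 built for CUDA 11.3 can be installed via conda (command taken from PyTorch's previous-versions page; adjust the torchvision/torchaudio pins to your environment):

    conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

    Then install the remaining dependencies: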

    pip install spconv-cu113
    conda install pytorch-scatter -c pyg
    pip install -r requirements.txt
    

    Install segmentator from this repo (we wrap segmentator in the ScanNet directory).

  • Setup: install mdin and pointgroup_ops.

    sudo apt-get install libsparsehash-dev
    python setup.py develop
    cd gres_model/lib/
    python setup.py develop
    

Data Preparation

ScanNet v2 dataset

Download the ScanNet v2 dataset.

Put the downloaded scans folder as follows.

MDIN
├── data
│   ├── scannetv2
│   │   ├── scans

Split and preprocess point cloud data

cd data/scannetv2
bash prepare_data.sh

The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look like below.

MDIN
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── train
│   │   ├── val

ScanRefer dataset

Download ScanRefer annotations following the instructions.

In the original ScanRefer annotations, ann_id values within each scene were assigned per object_id, resulting in duplicate ann_id values. We have modified the ScanRefer annotations so that each ann_id within a scene is unique; the revised annotations can be accessed here (a sketch of the renumbering is shown below).
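
For reference, the renumbering can be sketched as follows. File names and the exact ordering rule are illustrative; the released JSONs already contain the fix, so you do not need to run this.

    import json

    # Illustrative: give every annotation within a scene its own ann_id by
    # renumbering annotations in file order, instead of per-object_id numbering.
    with open("ScanRefer_filtered_train.json") as f:
        anns = json.load(f)

    next_id = {}
    for ann in anns:
        scene = ann["scene_id"]
        ann["ann_id"] = next_id.get(scene, 0)
        next_id[scene] = ann["ann_id"] + 1

    with open("ScanRefer_filtered_train_new.json", "w") as f:
        json.dump(anns, f)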

Put the downloaded ScanRefer folder as follows.

MDIN
├── data
│   ├── ScanRefer
│   │   ├── ScanRefer_filtered_train_new.json
│   │   ├── ScanRefer_filtered_val_new.json

Multi3DRefer dataset

Download the Multi3DRefer annotations.

Put the downloaded Multi3DRefer folder as follows.

MDIN
├── data
│   ├── Multi3DRefer
│   │   ├── multi3drefer_train.json
│   │   ├── multi3drefer_val.json

Pretrained Backbone

Download the SPFormer pretrained model (we only use its Sparse 3D U-Net backbone for training).

Move the pretrained model to backbones.

mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/
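
A quick sanity check that the backbone weights load (the state-dict layout here is an assumption and may differ in the actual checkpoint):

    import torch

    # Inspect the downloaded Sparse 3D U-Net weights without building the model.
    ckpt = torch.load("backbones/sp_unet_backbone.pth", map_location="cpu")
    state = ckpt.get("net", ckpt)  # some checkpoints nest weights under a key
    print(len(state), "tensors; sample keys:", list(state)[:3])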

Models

Download the pretrained models and move them to checkpoints.

Benchmark    Task     mIoU   Acc@0.25   Acc@0.5   Model
Multi3DRes   3D-GRES  47.5   66.9       44.7      Model
ScanRefer    3D-RES   48.3   58.0       53.1      Model
Nr3D         3D-RES   38.6   48.4       42.2      Model
Sr3D         3D-RES   46.4   56.6       51.3      Model

Training

For 3D-GRES:

bash scripts/train_3dgres.sh

For 3D-RES:

bash scripts/train_3dres.sh

Inference

For 3D-GRES:

bash scripts/test_3dgres.sh

For 3D-RES:

bash scripts/test_3dres.sh

Citation

If you find this work useful in your research, please cite:

@misc{wu20243dgresgeneralized3dreferring,
      title={3D-GRES: Generalized 3D Referring Expression Segmentation}, 
      author={Changli Wu and Yihang Liu and Jiayi Ji and Yiwei Ma and Haowei Wang and Gen Luo and Henghui Ding and Xiaoshuai Sun and Rongrong Ji},
      year={2024},
      eprint={2407.20664},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.20664}, 
}

Acknowledgement

Sincere thanks to the ReLA, M3DRef-CLIP, EDA, SceneGraphParser, SoftGroup, SSTNet, and SPFormer repos. This repo is built upon them.
