CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

arXiv


Abstract

In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that directly interacting 2D image features with global 3D PE could increase the difficulty of learning view transformation due to the variation of camera extrinsics. Thus we propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings under the local camera-view coordinate system instead of the global coordinate system, such that the 3D position embedding is free of encoding camera extrinsic parameters. Furthermore, we extend CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego motion to boost 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset.

Introduction

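CAPE proposes a CAmera view Position Embedding method. By normalizing positions to each camera's own view, it makes the correspondence between images and 3D space easier to learn than when global 3D position embeddings are used directly. The method achieved state-of-the-art results in the camera-only setting on the nuScenes dataset and was accepted to CVPR 2023.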
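The idea of view normalization can be illustrated with a small NumPy sketch. This is not the CAPE implementation: the helper names (`camera_to_global`, `frustum_points`, `apply_transform`), the toy intrinsics, and the two camera poses are all assumptions made for illustration, and the camera axis convention is simplified. The sketch only shows that 3D points built in the camera frame depend on intrinsics alone, while the same points in the global frame change with the extrinsics.

```python
import numpy as np

def camera_to_global(yaw, translation):
    """Hypothetical helper: 4x4 camera-to-global transform from a yaw angle and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    transform = np.eye(4)
    transform[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    transform[:3, 3] = translation
    return transform

def frustum_points(intrinsic, pixels, depths):
    """Lift pixels (N, 2) to camera-frame 3D points (N, D, 3), one point per depth value."""
    homogeneous = np.hstack([pixels, np.ones((len(pixels), 1))])  # (N, 3)
    rays = homogeneous @ np.linalg.inv(intrinsic).T               # rays with z = 1
    return rays[:, None, :] * depths[None, :, None]

def apply_transform(transform, points):
    """Apply a 4x4 rigid transform to points of shape (..., 3)."""
    flat = points.reshape(-1, 3)
    moved = flat @ transform[:3, :3].T + transform[:3, 3]
    return moved.reshape(points.shape)

# Toy intrinsics shared by two cameras (assumed values).
intrinsic = np.array([[800.0, 0.0, 400.0],
                      [0.0, 800.0, 160.0],
                      [0.0, 0.0, 1.0]])
pixels = np.array([[0.0, 0.0], [400.0, 160.0], [799.0, 319.0]])
depths = np.array([1.0, 10.0, 50.0])

# Two cameras that differ only in their extrinsics (assumed poses).
front = camera_to_global(0.0, [1.5, 0.0, 1.6])
back = camera_to_global(np.pi, [-1.0, 0.0, 1.6])

# Camera-view points: identical for both cameras, no extrinsics involved.
points_cam = frustum_points(intrinsic, pixels, depths)

# Global points: differ per camera because the extrinsics are baked in.
points_global_front = apply_transform(front, points_cam)
points_global_back = apply_transform(back, points_cam)

# On the query side, a global reference point is moved into each camera frame instead.
reference_global = np.array([[10.0, 2.0, 0.5]])
reference_in_front = apply_transform(np.linalg.inv(front), reference_global)
reference_in_back = apply_transform(np.linalg.inv(back), reference_global)

print(np.allclose(points_global_front, points_global_back))
print(reference_in_front, reference_in_back)
```

In this toy setup the image-side positions are the same array for every camera, and only the query-side reference point changes per view, which is the property the abstract describes as being free of encoding camera extrinsic parameters.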

Illustration of view normalization:

The overall pipeline of the algorithm is shown below:

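Training Configurations

We currently provide three training configurations, with results on the validation set of the open-source nuScenes dataset. See the CAPE training configurations for details.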

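Model Zoo

| Model | Backbone | Resolution | NDS | 3D mAP | Checkpoint | Config | Log |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CAPE | r50 | 1408x512 | 40.58 | 34.72 | model | config | - |
| CAPE-T | r50 | 704x256 | 44.22 | 31.78 | model | config | - |
| CAPE-T | v99 | 800x320 | 54.36 | 44.72 | model | config | - |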

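Visualization

Usage

Data Preparation

Download the nuScenes dataset and the annotation files provided by the authors.

Directory structure of the downloaded dataset: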

nuscenes
   ├── maps
   ├── samples
   ├── sweeps
   ├── v1.0-trainval
   ├── v1.0-test
   ...

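Symlink the nuScenes data to data/nuscenes, or change the dataset path in the configuration file. Then run the following command to generate the annotation files required by the PETR model.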

python tools/create_petr_nus_infos.py
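Before running the command above, the symlink and the expected directory layout can be checked with a short script. This is a minimal sketch, not part of the repository: `link_dataset` and `missing_entries` are hypothetical helpers, and the list of expected folders is taken from the directory tree shown above.

```python
from pathlib import Path

# Folder names copied from the dataset tree in this README.
EXPECTED_DIRS = ("maps", "samples", "sweeps", "v1.0-trainval", "v1.0-test")

def link_dataset(source, target="data/nuscenes"):
    """Create a symlink at `target` pointing to the downloaded dataset at `source`."""
    source = Path(source).resolve()
    target = Path(target)
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    target.parent.mkdir(parents=True, exist_ok=True)
    # Leave an existing link or directory untouched rather than overwriting it.
    if target.is_symlink() or target.exists():
        return target
    target.symlink_to(source, target_is_directory=True)
    return target

def missing_entries(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]

# Harmless check: lists every expected folder if data/nuscenes does not exist yet.
print(missing_entries("data/nuscenes"))
```

Calling `link_dataset("/path/to/nuscenes")` followed by `missing_entries("data/nuscenes")` should return an empty list when the layout matches the tree above.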

Dataset directory after generation:

nuscenes
   ├── maps
   ├── samples
   ├── sweeps
   ├── v1.0-trainval
   ├── v1.0-test
   ├── petr_nuscenes_annotation_train.pkl
   ├── petr_nuscenes_annotation_val.pkl
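To confirm that the generated files can be read, they can be loaded with Python's `pickle` module. The sketch below makes no assumption about the schema inside the files and only reports the top-level type and size. `summarize` and `inspect_annotation` are hypothetical helpers, and the `data/nuscenes` path assumes the symlink described earlier. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path

def summarize(obj, max_keys=10):
    """Describe a loaded object without assuming anything about its schema."""
    if isinstance(obj, dict):
        keys = list(obj.keys())[:max_keys]
        return f"dict with {len(obj)} keys, first keys: {keys}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} with {len(obj)} items"
    return type(obj).__name__

def inspect_annotation(path):
    """Load one annotation file and return a short summary string."""
    with Path(path).open("rb") as handle:
        data = pickle.load(handle)
    return summarize(data)

# Report on both generated files if they are present.
for name in ("petr_nuscenes_annotation_train.pkl", "petr_nuscenes_annotation_val.pkl"):
    candidate = Path("data/nuscenes") / name
    if candidate.is_file():
        print(name, "->", inspect_annotation(candidate))
    else:
        print(name, "not found")
```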

For convenience, we provide pre-generated annotation files:

| File name | Download link |
| --- | --- |
| petr_nuscenes_annotation_train.pkl | download |
| petr_nuscenes_annotation_val.pkl | download |

Training

TODO

Evaluation

Run the following command to evaluate the model:

python tools/evaluate.py --config configs/cape/capet_vovnet_800x320_24ep_wocbgs_load_dd3d_pretrain.yml --model /path/to/your/capet_vov99_800x320_epoch_24.pdparams

Citation

If you find this work helpful for your research, please consider citing:

@inproceedings{Xiong2023CAPE,
  title={CAPE: Camera View Position Embedding for Multi-View 3D Object Detection},
  author={Kaixin Xiong and Shi Gong and Xiaoqing Ye and Xiao Tan and Ji Wan and Errui Ding and Jingdong Wang and Xiang Bai},
  booktitle={Computer Vision and Pattern Recognition},
  year={2023}
}