Official repository of the CVPR 2024 paper "RMem: Restricted Memory Banks Improve Video Object Segmentation".


RMem: Restricted Memory Banks Improve Video Object Segmentation

Junbao Zhou, Ziqi Pang, Yu-Xiong Wang

University of Illinois Urbana-Champaign

Abstract

With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios, we revisit a simple but overlooked strategy: restricting the size of memory banks. This diverges from the prevalent practice of expanding memory banks to accommodate extensive historical information. Our specially designed "memory deciphering" study offers a pivotal insight underpinning such a strategy: expanding memory banks, while seemingly beneficial, actually increases the difficulty for VOS modules to decode relevant features due to the confusion from redundant information. By restricting memory banks to a limited number of essential frames, we achieve a notable improvement in VOS accuracy. This process balances the importance and freshness of frames to maintain an informative memory bank within a bounded capacity. Additionally, restricted memory banks reduce the training-inference discrepancy in memory lengths compared with continuous expansion. This fosters new opportunities in temporal reasoning and enables us to introduce the previously overlooked "temporal positional embedding." Finally, our insights are embodied in "RMem" ("R" for restricted), a simple yet effective VOS modification that excels at challenging VOS scenarios and establishes new state of the art for object state changes (VOST dataset) and long videos (the Long Videos dataset).

Method Overview

[Figure: method overview]

  • (a) RMem revisits restricting memory banks to enhance VOS, motivated by the insight from our pilot study.
  • (b) To maintain an informative memory bank, we balance both the relevance and freshness of frames when updating the latest features.
  • (c) Benefiting from smaller memory size gaps between training and inference, we introduce previously overlooked temporal positional embedding to encode the orders of frames explicitly, which enhances spatio-temporal reasoning.
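The idea in (b) and (c) can be illustrated with a minimal sketch of a bounded memory bank. This is not the code from this repository: the class names, the `relevance` score, the `freshness_weight` knob, and the exact scoring rule are all hypothetical, and keeping the first frame is an assumption made here for illustration. The sketch only shows the general pattern of evicting the frame with the worst combined relevance and freshness once a fixed capacity is exceeded, and of exposing each stored frame's order so that a temporal positional embedding could be looked up from it.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class MemoryFrame:
    frame_idx: int    # position of the frame in the video
    features: Any     # placeholder for the frame's memory features
    relevance: float  # hypothetical importance score in [0, 1]


@dataclass
class RestrictedMemoryBank:
    capacity: int                  # maximum number of stored frames
    freshness_weight: float = 0.5  # hypothetical relevance/freshness trade-off
    frames: List[MemoryFrame] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.capacity < 2:
            raise ValueError("capacity must be at least 2")

    def _keep_score(self, frame: MemoryFrame, current_idx: int) -> float:
        # Illustrative rule: older frames get a lower freshness term.
        age = current_idx - frame.frame_idx
        freshness = 1.0 / (1.0 + age)
        w = self.freshness_weight
        return (1.0 - w) * frame.relevance + w * freshness

    def add(self, frame: MemoryFrame) -> None:
        """Append the newest frame, then evict one frame if over capacity."""
        self.frames.append(frame)
        if len(self.frames) > self.capacity:
            # Assumption: the first frame and the newest frame are always kept.
            candidates = self.frames[1:-1]
            victim = min(
                candidates,
                key=lambda f: self._keep_score(f, frame.frame_idx),
            )
            self.frames = [f for f in self.frames if f is not victim]

    def temporal_positions(self) -> List[int]:
        # Order of each stored frame inside the bank; because the bank is
        # bounded, these indices never exceed capacity - 1.
        return list(range(len(self.frames)))
```

Because the bank never grows beyond `capacity`, the set of temporal positions seen at inference stays within the range seen during training, which is the property (c) relies on.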

Data preparation

Download the VOST dataset from vostdataset.org and organize the directory structure as follows:

├── aot_plus
│   ├── configs
│   ├── dataloaders
│   ├── datasets
│   │   └── VOST
│   │       ├── Annotations
│   │       ├── ImageSets
│   │       ├── JPEGImages
│   │       └── JPEGImages_10fps
│   ├── docker
│   ├── networks
│   ├── pretrain_models
│   └── tools
├── evaluation
└── README.md

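Hint: you can set this up with a symbolic link. Run the command from inside aot_plus so that the link is created at aot_plus/datasets/VOST, as in the tree above:

ln -s <your VOST directory> ./datasets/VOST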
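Before running anything, it can help to confirm that the linked dataset matches the tree above. The following is a small sketch, not part of this repository; the helper name `missing_vost_dirs` is made up for illustration, and it only checks that the four subdirectories listed in the tree exist.

```python
from pathlib import Path
from typing import List

# Subdirectories listed under datasets/VOST in the tree above.
EXPECTED_SUBDIRS = ("Annotations", "ImageSets", "JPEGImages", "JPEGImages_10fps")


def missing_vost_dirs(vost_root: str) -> List[str]:
    """Return the expected subdirectories that are absent under vost_root."""
    root = Path(vost_root)
    # Path.is_dir() follows symbolic links, so a linked dataset passes too.
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from inside aot_plus.
    missing = missing_vost_dirs("./datasets/VOST")
    print("missing:", missing if missing else "none")
```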

Checkpoint

| Method | $\mathcal{J}_{tr}$ | $\mathcal{J}$ | Checkpoint |
| --- | --- | --- | --- |
| R50 AOTL | 37.0 | 49.2 | download link |
| R50 DeAOTL | 37.6 | 51.0 | download link |
| R50 AOTL + RMem | 39.8 | 50.5 | download link |
| R50 DeAOTL + RMem | 40.4 | 51.8 | download link |
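To read the gain from RMem off the table, the per-metric differences can be computed directly. The snippet below is only a convenience sketch: the numbers are copied from the table above, and the `RESULTS` dictionary and `rmem_gain` helper are names introduced here for illustration.

```python
# (J_tr, J) scores copied from the checkpoint table above.
RESULTS = {
    "R50 AOTL": (37.0, 49.2),
    "R50 DeAOTL": (37.6, 51.0),
    "R50 AOTL + RMem": (39.8, 50.5),
    "R50 DeAOTL + RMem": (40.4, 51.8),
}


def rmem_gain(baseline: str) -> tuple:
    """Return (gain in J_tr, gain in J) of the RMem variant over its baseline."""
    base = RESULTS[baseline]
    with_rmem = RESULTS[baseline + " + RMem"]
    # Round to one decimal to match the precision reported in the table.
    return tuple(round(r - b, 1) for b, r in zip(base, with_rmem))


if __name__ == "__main__":
    for name in ("R50 AOTL", "R50 DeAOTL"):
        print(name, rmem_gain(name))
```

For the numbers above, this gives gains of 2.8 and 1.3 points for R50 AOTL, and 2.8 and 0.8 points for R50 DeAOTL.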

Download the checkpoints and put them in ./aot_plus/pretrain_models/

Evaluation

First, prepare the PyTorch environment. Follow the instructions on pytorch.org and choose the PyTorch version that is most compatible with your machine.

Then install the remaining dependencies:

conda install numpy matplotlib scipy scikit-learn tqdm pyyaml pandas
pip install opencv-python
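A quick way to confirm the environment is complete is to check that each package can be found by Python. This is a sketch and not part of the repository; the helper name `missing_modules` is made up here. Note that some import names differ from the install names: scikit-learn is imported as `sklearn`, pyyaml as `yaml`, and opencv-python as `cv2`.

```python
import importlib.util
from typing import Iterable, List

# Import names for PyTorch plus the packages installed above.
REQUIRED_MODULES = (
    "torch", "numpy", "matplotlib", "scipy", "sklearn",
    "tqdm", "yaml", "pandas", "cv2",
)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing modules:", missing if missing else "none")
```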

Now you can reproduce the results of our checkpoints:

cd ./aot_plus
./eval_vost.sh

To evaluate AOT, edit eval_vost.sh: set model to r50_aotl and set ckpt_path to aotplus_R50_AOTL_Temp_pe_Slot_4_ema_20000.pth.

Training

To train your own model, start from the AOT/DeAOT models (pretrained on DAVIS and YouTubeVOS) provided by the official AOT team. They are available from the MODEL_ZOO:

| Method | Checkpoint |
| --- | --- |
| R50 AOTL | download link |
| R50 DeAOTL | download link |

Then start training:

cd ./aot_plus
./train_vost.sh
