Recent Advances in Video Object Segmentation (VOS). VOS works before 2022 can be found in our review paper:
Deep Learning for Video Object Segmentation: A Review / paper / project page
🧸 We indicate different VOS types with coloured squares:
🟦 SVOS: Semi-Supervised VOS (also termed One-Shot VOS)
🟩 UVOS: Unsupervised VOS (also termed Zero-Shot VOS)
🟧 RVOS: Referring VOS (also termed Language-Guided VOS)
🟥 AVOS: Audio-Guided VOS (also termed Audio-Visual Video Segmentation)
⬜ XVOS: Other types of VOS
🧸 Please feel free to send us pull requests to add VOS works.
Links for a quick jump: ArXiv 2023, ACMMM 2023, ICCV 2023, CVPR 2023, IJCAI 2023, AAAI 2023, Journals 2023, Earlier ArXiv 2023, NeurIPS 2022, ECCV 2022, CVPR 2022, AAAI 2022, Journals 2022
🟧 RVOS
Nov
- paper / code - VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models (🔥 versatile model, supports RVOS)
🟩 UVOS
Nov
- paper / code - Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
⬜ XVOS
Nov
- paper / code - Sketch-based Video Object Segmentation: Benchmark and Analysis
⬜ XVOS
Nov
- paper / code - Learning the What and How of Annotation in Video Object Segmentation
🟦 SVOS
Oct
- paper / code - Putting the Object Back into Video Object Segmentation
🟦 SVOS
Oct
- paper / code - Sub-token ViT Embedding via Stochastic Resonance Transformers (supports SVOS)
🟦 SVOS
Sep
- paper / DATASET - PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
🟩 UVOS
Sep
- paper / code - Treating Motion as Option with Output Selection for Unsupervised Video Object Segmentation
🟥 AVOS
Sep
- paper / code - Rethinking Audiovisual Segmentation with Semantic Quantization and Decomposition
🟦 SVOS
Aug
- paper / code - Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation
🟧 RVOS
🟥 AVOS
Aug
- paper / code - EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation
🟧 RVOS
Aug
- paper / code - Learning Referring Video Object Segmentation from Weak Annotation
🟦 SVOS
Jul
- paper / code - Tracking Anything in High Quality
🟧 RVOS
Jul
- paper / code - Referring Video Object Segmentation with Inter-Frame Interaction and Cross-Modal Correlation
🟧 RVOS
Jul
- paper / code - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation
⬜ XVOS
Jul
- paper / code - Segment Anything Meets Point Tracking
🟧 RVOS
Jun
- paper / code - LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
🟩 UVOS
- paper / code - SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation
🟩 UVOS
- paper / code - Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation
🟦 SVOS
- paper / code - Exploring the Adversarial Robustness of Video Object Segmentation via One-shot Adversarial Attacks
🟥 AVOS
- paper / code - CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
🟥 AVOS
- paper / code - Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics
⬜ XVOS
- paper / code - Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
🟩 UVOS
- paper / code - Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations (self-supervised learning for UVOS)
🟩 UVOS
- paper / code - Isomer: Isomerous Transformer for Zero-Shot Video Object Segmentation
🟩 UVOS
- paper / code - Unsupervised Video Object Segmentation with Online Adversarial Self-Tuning
🟩 UVOS
🟧 RVOS
- paper / code - DEVA: Tracking Anything with Decoupled Video Segmentation (🔥 versatile model)
🟧 RVOS
- paper / code - Temporal Collection and Distribution for Referring Video Object Segmentation
🟧 RVOS
- paper / code - Robust Referring Video Object Segmentation with Cyclic Structural Consensus
🟧 RVOS
- paper / code - Spectrum-guided Multi-granularity Referring Video Object Segmentation
🟧 RVOS
- paper / code - OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
🟧 RVOS
- paper / code - Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples
🟧 RVOS
- paper / code - HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation
🟧 RVOS
- paper / DATASET - MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
🟦 SVOS
- paper / code - Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation (🔥 versatile model)
🟦 SVOS
- paper / code - XMem++: Production-level Video Segmentation From Few Annotated Frames
🟦 SVOS
- paper / code - Scalable Video Object Segmentation with Simplified Framework
🟦 SVOS
- paper / code - Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation
🟦 SVOS
- paper / code - SegGPT: Segmenting Everything In Context (🔥 versatile model)
🟦 SVOS
- paper / DATASET - LVOS: A Benchmark for Long-term Video Object Segmentation
🟦 SVOS
- paper / DATASET - MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
🟩 UVOS
- paper / code - MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation
🟦 SVOS
- paper / code - Boosting Video Object Segmentation via Space-time Correspondence Learning
🟦 SVOS
🟧 RVOS
- paper / code - Universal Instance Perception as Object Discovery and Retrieval (🔥 versatile model)
🟦 SVOS
- paper / code - TarViS: A Unified Approach for Target-Based Video Segmentation (🔥 versatile model)
🟦 SVOS
- paper / code - Two-shot Video Object Segmentation
🟦 SVOS
- paper / code - MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
🟦 SVOS
- paper / code - Look Before You Match: Instance Understanding Matters in Video Object Segmentation
⬜ XVOS
- paper / DATASET - Breaking the “Object” in Video Object Segmentation
🟥 AVOS
- paper / code - Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation
🟦 SVOS
- paper / DATASET - Video Object Segmentation in Panoptic Wild Scenes
🟦 SVOS
- paper / code - Learning to Learn Better for Video Object Segmentation
🟩 UVOS
- paper / code - TIP - Hierarchical Graph Pattern Understanding for Zero-Shot Video Object Segmentation
🟩 UVOS
- paper / code - TCSVT - Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering
🟩 UVOS
- paper / code - TIP - Hierarchical Co-Attention Propagation Network for Zero-Shot Video Object Segmentation
🟧 RVOS
- paper / code - TPAMI - VLT: Vision-Language Transformer and Query Generation for Referring Segmentation
🟧 RVOS
- paper / code - TPAMI - Local-Global Context Aware Transformer for Language-Guided Video Segmentation
🟩 UVOS
- paper / code - UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model
🟧 RVOS
🟥 AVOS
- paper / code - Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
🟧 RVOS
- paper / code - SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
⬜ XVOS
- paper / code - Segment and Track Anything
⬜ XVOS
- paper / code - Track Anything: Segment Anything Meets Videos
⬜ XVOS
- paper / code - Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation
🟦 SVOS
- paper / code - Decoupling Features in Hierarchical Propagation for Video Object Segmentation
⬜ XVOS
- paper / code - Self-supervised Amodal Video Object Segmentation
🟦 SVOS
- paper / code - XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
🟦 SVOS
- paper / code - BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation
🟦 SVOS
- paper / code - Learning Quality-aware Dynamic Memory for Video Object Segmentation
🟦 SVOS
- paper / code - Tackling Background Distraction in Video Object Segmentation
🟦 SVOS
- paper / code - Global Spectral Filter Memory Network for Video Object Segmentation
🟩 UVOS
- paper / code - Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation
🟧 RVOS
- paper / code - End-to-End Referring Video Object Segmentation With Multimodal Transformers
🟧 RVOS
- paper / code - Language As Queries for Referring Video Object Segmentation
🟧 RVOS
- paper / code - Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
🟧 RVOS
- paper / code - Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation
🟦 SVOS
- paper / code - Recurrent Dynamic Embedding for Video Object Segmentation
🟦 SVOS
- paper / code - Accelerating Video Object Segmentation With Compressed Video
🟦 SVOS
- paper / code - SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization
🟦 SVOS
- paper / code - Per-Clip Video Object Segmentation
🟥 AVOS
- paper / code - Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks
⬜ XVOS
- paper / DATASET - YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset
🟦 SVOS
- paper / code - Siamese Network with Interactive Transformer for Video Object Segmentation
🟦 SVOS
- paper / code - Reliable Propagation-Correction Modulation for Video Object Segmentation
🟧 RVOS
- paper / code - You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation
🟩 UVOS
- paper / code - Iteratively Selecting an Easy Reference Frame Makes Unsupervised Video Object Segmentation Easier
🟦 SVOS
- paper / code - TPAMI - Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels
🟦 SVOS
- paper / code - TIP - From Pixels to Semantics: Self-Supervised Video Object Segmentation With Multiperspective Feature Mining
🟦 SVOS
- paper / code - TIP - Delving Deeper Into Mask Utilization in Video Object Segmentation
🟦 SVOS
- paper / code - TIP - Adaptive Online Mutual Learning Bi-Decoders for Video Object Segmentation
End of the list. 🌱
VOS papers and datasets from before 2022 can be found below:
Deep Learning for Video Object Segmentation: A Review / paper / project page