Fudan University, Shanghai
Stars
This repository collects high-quality information sources in the AI technology field. It serves to keep information sources in sync, helping to avoid information gaps and echo chambers.
Effortless data labeling with AI support from Segment Anything and other awesome models.
FDU Sports Auto Reserve: automatic reservation of Fudan University sports venues.
In this project, I walk through a user-friendly tool I created to run SOTA video segmentation and auto-label data for object detection and tracking tasks.
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
ICCV'2023 | CTVIS: Consistent Training for Online Video Instance Segmentation
[ACM MM 2022] Modality-aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
[MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
A modified LLaVA framework for MOSS2 that turns MOSS2 into a multimodal model.
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[ACM MM-2024] RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
📚 A collection of papers about Referring Image Segmentation.
[T-PAMI-2024] Transformer-Based Visual Segmentation: A Survey
[CVPR-2024] Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
A curated list of audio-visual learning methods and datasets.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
Modern Computer Vision with PyTorch, published by Packt
ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI