🏖️
work
  • Fudan University
  • Shanghai

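This repository collects high-quality information sources in the AI and technology field. It helps keep information sources in sync, avoiding information gaps and filter bubbles.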

TypeScript 1 Updated Mar 9, 2024

Effortless data labeling with AI support from Segment Anything and other awesome models.

Python 3,785 437 Updated Sep 26, 2024

Automatic reservation of Fudan University sports venues (FDU Sports Auto Reserve)

Python 63 45 Updated Oct 18, 2023
Python 17 1 Updated Sep 3, 2024

In this project, I walk through a user-friendly tool that I created to run SOTA video segmentation and auto-label data for object detection and tracking tasks.

HTML 7 Updated Sep 4, 2024

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Python 503 43 Updated Sep 19, 2024

ICCV'2023 | CTVIS: Consistent Training for Online Video Instance Segmentation

Python 70 4 Updated Oct 15, 2023

[ACM MM 2022] Modality-aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

Python 30 6 Updated Jul 13, 2022

This project aims to share the technical principles behind large models and hands-on experience with them.

HTML 9,357 916 Updated Sep 22, 2024

Flickr30K Entities Dataset

MATLAB 162 26 Updated Dec 23, 2018

[MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation

Python 18 Updated Aug 20, 2024

LLM&VLM Tutorial

Python 1,336 947 Updated Sep 28, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.

Python 5,624 438 Updated Sep 19, 2024

A modified LLaVA framework for MOSS2 that makes MOSS2 a multimodal model.

Python 12 2 Updated Sep 19, 2024

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

555 31 Updated Aug 3, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,079 940 Updated Aug 21, 2024

[ACM MM-2024] RefMask3D: Language-Guided Transformer for 3D Referring Segmentation

Python 45 1 Updated Jul 29, 2024

📚 A collection of papers about Referring Image Segmentation.

602 56 Updated Aug 30, 2024

[T-PAMI-2024] Transformer-Based Visual Segmentation: A Survey

667 47 Updated Aug 25, 2024

[CVPR-2024] Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

Python 75 Updated Jul 24, 2024

A curated list of audio-visual learning methods and datasets.

221 17 Updated Sep 11, 2024

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python 543 57 Updated Jun 7, 2024

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"

Python 180 8 Updated Sep 3, 2024
25 Updated Jul 19, 2024

The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024

Python 21 1 Updated Jul 27, 2024

This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).

783 50 Updated Sep 27, 2024

Modern Computer Vision with PyTorch, published by Packt

Jupyter Notebook 702 317 Updated Jun 8, 2024

A Survey of Image Editing

219 9 Updated Jul 22, 2024

ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Python 88 2 Updated Jul 18, 2024