Stars
Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].
[ACM MM-2024] RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
A curated list of audio-visual learning methods and datasets.
[ECCV 2024] PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
Understand Human Behavior to Align True Needs
Official PyTorch Implementation for “DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video”
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
[CVPR 2024] The code for paper 'Towards Learning a Generalist Model for Embodied Navigation'
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
llama3 implementation one matrix multiplication at a time
Accessible large language models via k-bit quantization for PyTorch.
A 4-hour coding workshop to understand how LLMs are implemented and used
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
[ECCV 2024] VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
[CVPR 2024] iKUN: Speak to Trackers without Retraining
Multi-Granularity Language-Guided Multi-Object Tracking