- Paris-Saclay University | ENSTA Paris
- China
- xuanlong-yu.github.io
Stars
[NeurIPS 2021] You Only Look at One Sequence
[ICCV 2023 Oral] IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization
[ECCV 2024] Be-Your-Outpainter https://arxiv.org/abs/2403.13745
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently and pull request…
[ICCV'23 Main Track, WECIA'23 Oral] Official repository of paper titled "Self-regulating Prompts: Foundational Model Adaptation without Forgetting".
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
CVPR 2023: Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
PyTorch Implementation of ECCV 2024 OOD-CV Workshop SSB Challenge (Open-Set Recognition Track) - 1st Place
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
(Pattern Recognition) PyTorch implementation of “HTR-VT: Handwritten Text Recognition with Vision Transformer”
Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models l…
Images to inference with no labeling (use foundation models to train supervised models).
GPT4V-level open-source multi-modal model based on Llama3-8B
A state-of-the-art-level open visual language model | multimodal pre-trained model
An open source implementation of CLIP.
Real-time and accurate open-vocabulary end-to-end object detection
[CVPR 2023] Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
A curated list of papers, datasets and resources pertaining to open vocabulary object detection.
Code release for "Active Teacher for Semi-Supervised Object Detection", CVPR 2022
A curated list of papers & resources linked to open set recognition, out-of-distribution, open set domain adaptation and open world recognition
(TPAMI 2024) A Survey on Open Vocabulary Learning
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.