Tsinghua University
Shenzhen, China
Stars
Code and model for AAAI 2024: UMIE: Unified Multimodal Information Extraction with Instruction Tuning
Writing AI Conference Papers: A Handbook for Beginners
Visual Instruction Tuning for Qwen2 Base Model
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
shannany0606 / OPERA
Forked from shikiw/OPERA. [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)
A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
TALL: Temporal Activity Localization via Language Query
[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2.5, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
GLM-4 series: Open Multilingual Multimodal Chat LMs
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
National First Prize, Python Programming (University Group A), 15th Lanqiao Cup National Finals 2024 (problems + on-site code)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
A curated collection of open-source Chinese large language models, focusing on smaller-scale models that can be privately deployed at low training cost, covering base models, domain-specific fine-tuning and applications, datasets, and tutorials