Stars
[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation
Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.
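📈 The largest collection to date of industrial defect detection datasets and papers. Continuously summarizes important open-source datasets and key papers in the field of surface defect research.
Unsupervised training on positive samples; detects defects and segments images.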
Segment Anything in Defect Detection
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Dead simple FLUX LoRA training UI with LOW VRAM support
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
Tutorials for creating and using ONNX models
📻 Terminal/SSH/Telnet/serial port/RDP/VNC/SFTP client (Linux, Mac, Win)
Labeling tool with SAM (Segment Anything Model); supports SAM, SAM2, sam-hq, MobileSAM, EdgeSAM, etc. An interactive, semi-automatic image annotation tool.
OLMoE: Open Mixture-of-Experts Language Models
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
⚡️ HivisionIDPhotos: a lightweight and efficient AI tool and algorithm for creating ID photos.
High-resolution models for human tasks.
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama
Nanolog is an extremely performant nanosecond scale logging system for C++ that exposes a simple printf-like API.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
mimalloc is a compact general purpose allocator with excellent performance.
Open-source vector similarity search for Postgres
MixTeX: multimodal OCR for LaTeX, Chinese-English (ZhEn) text, and tables. It performs efficient CPU-based inference locally and offline on Windows.
🔊 Text-Prompted Generative Audio Model
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.