Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 4,892 399 Updated Oct 2, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,463 138 Updated Oct 4, 2024

High-resolution models for human tasks.

Python 4,133 218 Updated Oct 3, 2024

Text-to-video and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 7,862 732 Updated Oct 5, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,351 972 Updated Oct 5, 2024

Kolors Team

Python 3,666 242 Updated Sep 4, 2024

Code and data of We-Math

Python 122 8 Updated Sep 29, 2024

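Li Bai (李白), an outstanding poet of the Tang dynasty, holds an important place in the history of Chinese literature. In recent years, with the rapid development of digital technology and artificial intelligence, the ways in which traditional culture is popularized and promoted have also faced innovation and change. Although research on Li Bai's poetry in China and abroad is already quite thorough, its digital and intelligent popularization still falls short. This project therefore aims to build a Li Bai knowledge graph and combine it with a large model to train a specialized AI agent, promoting Li Bai's culture through a generative conversational application.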

Python 1,169 140 Updated Sep 1, 2024

Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,300 60 Updated Sep 25, 2024

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 52,439 5,537 Updated Oct 6, 2024

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"

Python 1,260 150 Updated Aug 28, 2024

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python 5,402 555 Updated Jul 3, 2024

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 9,262 1,275 Updated Sep 14, 2024

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

Python 2,202 277 Updated Jun 29, 2024

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript 47,176 6,703 Updated Oct 3, 2024

A generative speech model for daily dialogue.

Python 31,227 3,386 Updated Sep 21, 2024

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 4,767 393 Updated Oct 6, 2024

Official PyTorch implementation of ECCV 2024 Paper: ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback.

Python 385 16 Updated Sep 30, 2024

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]

Python 9,602 917 Updated Sep 26, 2024

[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Python 974 149 Updated Jul 18, 2024

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 2,027 135 Updated Sep 3, 2024

[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

Python 948 44 Updated Sep 27, 2024

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python 3,328 284 Updated Aug 15, 2024

More relighting!

Python 4,958 335 Updated Jun 27, 2024
Python 2,542 191 Updated Oct 4, 2024

Mixture-of-Experts for Large Vision-Language Models

Python 1,936 123 Updated May 15, 2024

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 2,887 208 Updated Sep 25, 2024

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,547 571 Updated Jul 2, 2024

Official implementation of FaceXFormer: A Unified Transformer for Facial Analysis

Python 187 19 Updated Apr 4, 2024

CAMixerSR: Only Details Need More “Attention” (CVPR 2024)

Python 212 11 Updated Jun 4, 2024