zhangyunming

huhu zhangyunming

13 followers · 8 following

Shenzhen

Stars

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 4,892 399 Updated Oct 2, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,463 138 Updated Oct 4, 2024

facebookresearch / sapiens

High-resolution models for human tasks.

Python 4,133 218 Updated Oct 3, 2024

THUDM / CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 7,862 732 Updated Oct 5, 2024

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,351 972 Updated Oct 5, 2024

Kwai-Kolors / Kolors

Kolors Team

Python 3,666 242 Updated Sep 4, 2024

We-Math / We-Math

Code and data of We-Math

Python 122 8 Updated Sep 29, 2024

BinNong / meet-libai

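Li Bai 👤 was an outstanding poet of the Tang dynasty, and his poems hold an important place in the history of Chinese literature. In recent years, with the rapid development of digital technology and artificial intelligence, the ways in which traditional culture is popularized and promoted have also been undergoing innovation and change. Although research on Li Bai's poetry in China and abroad is already quite thorough, its digital and intelligent popularization still falls short. This project therefore aims to build a Li Bai knowledge graph and, combined with large-model training, produce a specialized AI agent that promotes Li Bai's culture in the form of a generative conversational application.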

Python 1,169 140 Updated Sep 1, 2024

dvlab-research / ControlNeXt

Controllable video and image generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,300 60 Updated Sep 25, 2024

comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.

Python 52,439 5,537 Updated Oct 6, 2024

ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"

Python 1,260 150 Updated Aug 28, 2024

rany2 / edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python 5,402 555 Updated Jul 3, 2024

fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 9,262 1,275 Updated Sep 14, 2024

tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

Python 2,202 277 Updated Jun 29, 2024

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript 47,176 6,703 Updated Oct 3, 2024