Starred repositories
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
🥚 Transform PDF to JSON or Markdown with ease and speed 🐣
Noise suppression using deep filtering
Python text-to-speech library with built-in voice effects and support for multiple TTS engines
Ikaros-521 / AI-Vtuber
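Forked from sandboxdream/AI-Vtuber
AI Vtuber is a virtual streamer [Live2D/UE/xuniren] driven by [ChatterBot/ChatGPT/claude/langchain/chatglm/text-gen-webui/Wenda/Qwen/kimi/ollama]. It can interact with viewers in real time during live streams on [Bilibili/Douyin/Kuaishou/WeChat Channels/Pinduoduo/Douyu/YouTube/twitch/TikTok], or chat directly on a local machine…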
Real time interactive streaming digital human
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
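Real-time STT connected to the OpenAI API/Zhipu AI (streaming LLM) and GPT-SOVITS/Edge-TTS, calling the services across the network through a web page to achieve real-time conversation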
Table Recognition and Content Extraction in PDF Files
OpenCV-Python image processing tutorial
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
A Comprehensive Toolkit for High-Quality PDF Content Extraction
A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, web pages, and e-books in multiple formats.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction…
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Instant voice cloning by MIT and MyShell.
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
Platform to experiment with the AI Software Engineer. Terminal-based. NOTE: Very different from https://gptengineer.app
Multi-platform container image proxy service supporting registries such as Docker Hub, GitHub, Google, k8s, Quay, and Microsoft.