
Text-to-Music Generation with Rectified Flow Transformers

Python 1,448 110 Updated Sep 6, 2024

Cantonese orthography (粵語正字法)

13 6 Updated Jul 22, 2020

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,550 245 Updated Sep 14, 2024
Python 3 Updated Aug 8, 2024

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v etc.

Python 130 9 Updated Sep 13, 2024

SpeechGPT Series: Speech Large Language Models

Python 1,225 80 Updated Jul 22, 2024

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 146 10 Updated Jul 12, 2024
Python 13 3 Updated Sep 19, 2024

Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).

Python 61 9 Updated Jul 21, 2024

Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector

Python 414 47 Updated Aug 28, 2024

ChatTTS 2K Speaker Stability Score 🥇, Categorized by Gender and Age 👧, with Audio Preview 🔈

Python 462 23 Updated Jul 2, 2024
Python 28 2 Updated Sep 19, 2024

The roadmap of generative AI: use cases and applications

593 72 Updated Oct 1, 2023

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.

Python 4,421 554 Updated Aug 9, 2024

Bark Voice Cloning and Voice Cloning for Chinese Speech

Jupyter Notebook 2,726 392 Updated Aug 8, 2024

Suno AI's Bark model in C/C++ for fast text-to-speech

C++ 688 53 Updated Jul 17, 2024

Barkify: an unofficial training implementation of Bark TTS by suno-ai

Python 122 21 Updated May 31, 2023

A multi-lingual (97 languages) tool for automatic recognition and segmentation of mixed-language text content, intended for TTS.

Python 90 8 Updated Sep 7, 2024

Automatic speech annotator that processes speech with voice activity detection, overlapping speech detection, speaker diarization, and automatic speech recognition

Python 29 4 Updated Jun 14, 2024

Inference and training library for high-quality TTS models.

Python 4,227 417 Updated Aug 19, 2024
JavaScript 7 3 Updated Aug 5, 2024

Awesome speech/audio LLMs, representation learning, and codec models

587 26 Updated Sep 20, 2024

Experimental implementation for a sparse-dictionary based version of the VQ-VAE2 paper

Python 30 12 Updated Oct 27, 2023

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,478 735 Updated Jun 24, 2024

Foundational model for human-like, expressive TTS

Python 3,730 648 Updated Jul 30, 2024

Audiogen Codec

Python 116 11 Updated Jul 9, 2024

Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)

Python 127 11 Updated Sep 14, 2023

Instant voice cloning by MIT and MyShell.

Python 28,465 2,784 Updated Aug 21, 2024

VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization.

Shell 45 4 Updated May 14, 2024

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 582 42 Updated Sep 9, 2024