Stars
Text-to-Music Generation with Rectified Flow Transformers
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversational use.
A minimal codebase for fine-tuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v, etc.
SpeechGPT Series: Speech Large Language Models
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
ChatTTS: stability scores for 2,000 speaker voices 🥇, categorized by gender and age 👧, with online audio previews 🔈
The roadmap of generative AI: use cases and applications
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
Bark Voice Cloning and Voice Cloning for Chinese Speech
Suno AI's Bark model in C/C++ for fast text-to-speech
Barkify: an unofficial training implementation of Bark TTS by suno-ai
A multilingual (97-language) tool for automatic recognition and segmentation of mixed-language text content for TTS.
Automatic speech annotator that processes speech with voice activity detection, overlapping speech detection, speaker diarization, and automatic speech recognition
Inference and training library for high-quality TTS models.
Awesome speech/audio LLMs, representation learning, and codec models
Experimental implementation of a sparse-dictionary-based version of the VQ-VAE2 paper
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Foundational model for human-like, expressive TTS
Unified speech language model from the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
Instant voice cloning by MIT and MyShell.
VoicePAT is a modular and efficient toolkit for voice privacy research, with a main focus on speaker anonymization.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation