xzm2004260

xzm2004 xzm2004260

speech synthesis, TTS

58 followers · 714 following

Xiamen

mini-omni Public
Forked from gpt-omni/mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python MIT License Updated Sep 9, 2024
FluxMusic Public
Forked from feizc/FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

Python Other Updated Sep 6, 2024
SimpleSpeech Public
Forked from yangdongchao/SimpleSpeech

The open source code for SimpleSpeech series

Python 1 Updated Aug 19, 2024
SpeechGPT Public
Forked from 0nutation/SpeechGPT

SpeechGPT Series: Speech Large Language Models

Python Apache License 2.0 Updated Jul 22, 2024
speechflow Public
Forked from just-ai/speechflow

Python Apache License 2.0 Updated Jun 19, 2024
Prompt-Singer Public
Forked from cyanbx/Prompt-Singer

Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).

Python MIT License Updated Jun 19, 2024
log-wmse-audio-quality Public
Forked from nomonosound/log-wmse-audio-quality

logWMSE, an audio quality metric with support for digital silence target. Useful for evaluating audio source separation systems, even when there are many audio tracks or stems.

Python Apache License 2.0 Updated Jun 18, 2024
seed-tts-eval Public
Forked from BytedanceSpeech/seed-tts-eval

Python Updated Jun 6, 2024
ZS-TTS-Evaluation Public
Forked from Edresson/ZS-TTS-Evaluation

Python MIT License Updated Jun 5, 2024
LookOnceToHear Public
Forked from vb000/LookOnceToHear

A novel human-interaction method for real-time speech extraction on headphones.

Python Other Updated May 27, 2024
Bark-Voice-Cloning Public
Forked from KevinWang676/Bark-Voice-Cloning

Bark Voice Cloning and Voice Cloning for Chinese Speech

Jupyter Notebook MIT License Updated May 11, 2024
Automatic_Speech_Annotator Public
Forked from WangHelin1997/Automatic_Speech_Annotator

Automatic speech annotator that processes speech with voice activity detection, overlapping speech detection, speaker diarization, and automatic speech recognition

Python Updated May 11, 2024
Codec-SUPERB Public
Forked from voidful/Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Python 1 Updated May 6, 2024
SOFA Public
Forked from qiuqiao/SOFA

SOFA: Singing-Oriented Forced Aligner

Python MIT License Updated Apr 21, 2024
LangSegment Public
Forked from juntaosun/LangSegment

A powerful multilingual (97 languages) tool for automatic recognition and segmentation of mixed-language text content, intended for TTS.

Python Updated Apr 18, 2024
parler-tts Public
Forked from huggingface/parler-tts

Inference and training library for high-quality TTS models.

Python Apache License 2.0 Updated Apr 10, 2024
StableTTS Public
Forked from KdaiP/StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Python MIT License Updated Apr 2, 2024
DiJiang Public
Forked from YuchuanTian/DiJiang

The official implementation of "DiJiang: Efficient Large Language Models through Compact Kernelization"

Python Updated Apr 1, 2024
VoiceCraft Public
Forked from jasonppy/VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Python Other Updated Mar 21, 2024
audioseal Public
Forked from facebookresearch/audioseal

Localized watermarking for AI-generated speech audio, with SOTA robustness and a very fast detector

Python MIT License Updated Mar 20, 2024
audiowmark Public
Forked from swesterfeld/audiowmark

Audio Watermarking

C++ GNU General Public License v3.0 Updated Mar 6, 2024
metavoice-src Public
Forked from metavoiceio/metavoice-src

Foundational model for human-like, expressive TTS

Python Apache License 2.0 Updated Mar 6, 2024
python-jyutping Public
Forked from imdreamrunner/python-jyutping

A Python tool for converting Chinese characters to Jyutping (Cantonese romanization).

Python Updated Feb 26, 2024
supervoice-gpt Public
Forked from ex3ndr/supervoice-gpt

GPT-style network for phonemization with durations of text

Python Updated Feb 26, 2024
descript-audio-codec Public
Forked from descriptinc/descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python MIT License Updated Feb 21, 2024
agc Public
Forked from AudiogenAI/agc

Audiogen Codec

Python MIT License Updated Feb 20, 2024
IMS-Toucan Public
Forked from DigitalPhonetics/IMS-Toucan

Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.

Python Apache License 2.0 Updated Feb 18, 2024
megatts2 Public
Forked from LSimon95/megatts2

Unofficial implementation of Megatts2

Python MIT License Updated Feb 18, 2024
FunCodec Public
Forked from modelscope/FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.

Python 1 MIT License Updated Jan 25, 2024
Amphion Public
Forked from open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python MIT License Updated Jan 8, 2024
