-
mini-omni Public
Forked from gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Python MIT License Updated Sep 9, 2024
-
FluxMusic Public
Forked from feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
Python Other Updated Sep 6, 2024
-
SimpleSpeech Public
Forked from yangdongchao/SimpleSpeech
The open source code for SimpleSpeech series
-
SpeechGPT Public
Forked from 0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
Python Apache License 2.0 Updated Jul 22, 2024
-
Prompt-Singer Public
Forked from cyanbx/Prompt-Singer
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
Python MIT License Updated Jun 19, 2024
-
log-wmse-audio-quality Public
Forked from nomonosound/log-wmse-audio-quality
logWMSE, an audio quality metric with support for digital silence target. Useful for evaluating audio source separation systems, even when there are many audio tracks or stems.
Python Apache License 2.0 Updated Jun 18, 2024
-
ZS-TTS-Evaluation Public
Forked from Edresson/ZS-TTS-Evaluation
Python MIT License Updated Jun 5, 2024
-
LookOnceToHear Public
Forked from vb000/LookOnceToHear
A novel human-interaction method for real-time speech extraction on headphones.
Python Other Updated May 27, 2024
-
Bark-Voice-Cloning Public
Forked from KevinWang676/Bark-Voice-Cloning
Bark Voice Cloning and Voice Cloning for Chinese Speech
Jupyter Notebook MIT License Updated May 11, 2024
-
Automatic_Speech_Annotator Public
Forked from WangHelin1997/Automatic_Speech_Annotator
Automatic speech annotator that processes speech with voice activity detection, overlapping speech detection, speaker diarization, and automatic speech recognition
Python Updated May 11, 2024
-
Codec-SUPERB Public
Forked from voidful/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
-
SOFA Public
Forked from qiuqiao/SOFA
SOFA: Singing-Oriented Forced Aligner
Python MIT License Updated Apr 21, 2024
-
LangSegment Public
Forked from juntaosun/LangSegment
A multilingual (97 languages) tool for automatic recognition and segmentation of text content. A powerful automatic word segmentation tool for mixed multilingual (97 languages) text in TTS.
Python Updated Apr 18, 2024
-
parler-tts Public
Forked from huggingface/parler-tts
Inference and training library for high-quality TTS models.
Python Apache License 2.0 Updated Apr 10, 2024
-
StableTTS Public
Forked from KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Python MIT License Updated Apr 2, 2024
-
DiJiang Public
Forked from YuchuanTian/DiJiang
The official implementation of "DiJiang: Efficient Large Language Models through Compact Kernelization"
Python Updated Apr 1, 2024
-
VoiceCraft Public
Forked from jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Python Other Updated Mar 21, 2024
-
audioseal Public
Forked from facebookresearch/audioseal
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
Python MIT License Updated Mar 20, 2024
-
audiowmark Public
Forked from swesterfeld/audiowmark
Audio Watermarking
C++ GNU General Public License v3.0 Updated Mar 6, 2024
-
metavoice-src Public
Forked from metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
Python Apache License 2.0 Updated Mar 6, 2024
-
python-jyutping Public
Forked from imdreamrunner/python-jyutping
Python tool for converting Chinese characters to Jyutping.
Python Updated Feb 26, 2024
-
supervoice-gpt Public
Forked from ex3ndr/supervoice-gpt
GPT-style network for phonemization with durations of text
Python Updated Feb 26, 2024
-
descript-audio-codec Public
Forked from descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Python MIT License Updated Feb 21, 2024
-
IMS-Toucan Public
Forked from DigitalPhonetics/IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. The development objectives are simplicity, modularity, controllability, and multilinguality.
Python Apache License 2.0 Updated Feb 18, 2024
-
megatts2 Public
Forked from LSimon95/megatts2
Unofficial implementation of Megatts2
Python MIT License Updated Feb 18, 2024
-
FunCodec Public
Forked from modelscope/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
-
Amphion Public
Forked from open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Python MIT License Updated Jan 8, 2024