Update Doc: Added extra voice examples, updated documentation and related code example (suno-ai#290)
Showing 1 changed file with 32 additions and 17 deletions.
@@ -4,7 +4,7 @@
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/OnusFM.svg?style=social&label=@OnusFM)](https://twitter.com/OnusFM)
<a href="http://www.repostatus.org/#active"><img src="http://www.repostatus.org/badges/latest/active.svg" /></a>

-[Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) • [Suno Studio Waitlist](https://3os84zs17th.typeform.com/suno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)
+> 🔗 [Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) • [Suno Studio Waitlist](https://3os84zs17th.typeform.com/suno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)
[//]: <br> (vertical spaces around image)
<br>

@@ -18,6 +18,13 @@ Bark is a transformer-based text-to-audio model created by [Suno](https://suno.a
## ⚠ Disclaimer
Bark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. Use at your own risk, and please act responsibly.

+## 📖 Quick Index
+* [🚀 Updates](#-updates)
+* [💻 Installation](#-installation)
+* [🐍 Usage](#-usage-in-python)
+* [🌀 Live Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2)
+* [❓ FAQ](#-faq)

## 🎧 Demos

[![Open in Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/suno/bark)

@@ -66,8 +73,8 @@ write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)
# play text in notebook
Audio(audio_array, rate=SAMPLE_RATE)
```

-[pizza.webm](https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm)
+[pizza1.webm](https://user-images.githubusercontent.com/34592747/cfa98e54-721c-4b9c-b962-688e09db684f.webm)

</details>

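The hunk above saves the generated clip with `write_wav` and plays it with `Audio`. Where SciPy is not available, a minimal stdlib-only sketch can write a float waveform in [-1, 1] to a 16-bit PCM WAV file instead. Note that this is a swapped-in technique (integer PCM via Python's `wave` module), not a reimplementation of `write_wav`; the helper name `save_wav_pcm16` and its 24 kHz default are assumptions, so pass Bark's `SAMPLE_RATE` explicitly in real use.

```python
import struct
import wave
from typing import Sequence


def save_wav_pcm16(path: str, samples: Sequence[float], sample_rate: int = 24_000) -> None:
    """Write mono float samples in [-1.0, 1.0] to `path` as 16-bit PCM WAV.

    Hypothetical helper; the 24 kHz default is an assumption, so pass
    Bark's SAMPLE_RATE explicitly in real use.
    """
    # Clip each sample to [-1, 1], then scale into the signed 16-bit range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)  # little-endian int16 frames
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
```

For example, `save_wav_pcm16("bark_generation.wav", audio_array.tolist(), SAMPLE_RATE)` would write the array produced above; 16-bit PCM trades some precision for broad player compatibility.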
@@ -87,7 +94,18 @@ audio_array = generate_audio(text_prompt)
```
[suno_korean.webm](https://user-images.githubusercontent.com/32879321/235313033-dc4477b9-2da0-4b94-9c8b-a8c2d8f5bb5e.webm)

-*Note: since Bark recognizes languages automatically from input text, it is possible to use for example a german history prompt with english text. This usually leads to english audio with a german accent.*
+*Note: since Bark recognizes languages automatically from input text, it is possible to use, for example, a German history prompt with English text. This usually leads to English audio with a German accent.*
+```python
+text_prompt = """
+Der Dreißigjährige Krieg (1618-1648) war ein verheerender Konflikt, der Europa stark geprägt hat.
+This is a beginning of the history. If you want to hear more, please continue.
+"""
+audio_array = generate_audio(text_prompt)
+```
+[suno_german_accent.webm](https://user-images.githubusercontent.com/34592747/3f96ab3e-02ec-49cb-97a6-cf5af0b3524a.webm)

</details>

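The note above describes pairing a German voice preset with English text. A small hypothetical helper can build preset identifiers in the same form as `v2/en_speaker_1`, the preset that appears in a hunk header of this diff. The `v2/<lang>_speaker_<n>` pattern, the 0 to 9 speaker range, and the language set (only the en, de and es rows of the table are visible in this diff) are assumptions; check `bark/assets/prompts` for the presets that actually ship.

```python
# Language codes visible in the Supported Languages table in this diff.
# Assumption: extend this set to match the full table in the README.
KNOWN_LANGS = {"en", "de", "es"}


def preset_name(lang: str, speaker: int, version: str = "v2") -> str:
    """Build a Bark-style history_prompt name such as 'v2/de_speaker_0'.

    Hypothetical helper: the naming pattern and the 0-9 speaker range are
    assumptions inferred from the 'v2/en_speaker_1' example.
    """
    if lang not in KNOWN_LANGS:
        raise ValueError(f"unknown language code: {lang!r}")
    if not 0 <= speaker <= 9:
        raise ValueError(f"speaker index out of range: {speaker}")
    return f"{version}/{lang}_speaker_{speaker}"
```

With Bark installed, `generate_audio("This is English text.", history_prompt=preset_name("de", 0))` would follow the note's recipe, which the note says usually yields English audio with a German accent.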
@@ -109,11 +127,9 @@ audio_array = generate_audio(text_prompt)
<details open>
<summary><h3>🎤 Voice Presets</h3></summary>

-Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of speaker presets [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark/assets/prompts). The community also often shares presets in [Discord](https://discord.gg/J2B2vsjKuE).
+Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of supported voice presets [HERE](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark/assets/prompts). The community also often shares presets in [Discord](https://discord.gg/J2B2vsjKuE).

-Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.
-<br>
-<br>
+> Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.
```python
text_prompt = """

@@ -126,9 +142,9 @@ audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")
[sloth.webm](https://user-images.githubusercontent.com/5068315/230684883-a344c619-a560-4ff5-8b99-b4463a34487b.webm)
</details>

-### Generating Longer Audio
+### 📃 Generating Longer Audio

-By default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see this [example notebook](notebooks/long_form_generation.ipynb).
+By default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see 👉 **[Notebook](notebooks/long_form_generation.ipynb)** 👈

<details>
<summary>Click to toggle example long-form generations (from the example notebook)</summary>

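Because `generate_audio` works best on roughly 13 seconds of speech, one common approach to long-form generation is to split a script into sentences, synthesize each one with the same voice preset, and join the pieces with short silences. The sketch below assumes that approach; the regex sentence splitter, the 0.25 s gap, the 24 kHz rate, and the `synthesize` callable (a hypothetical stand-in for a call to `generate_audio`) are illustrative choices and not necessarily what the linked notebook does. Plain lists of floats stand in for the NumPy arrays Bark returns, so the sketch runs without the model.

```python
import re
from typing import Callable, List

SAMPLE_RATE = 24_000  # assumption: Bark's output rate; use bark.SAMPLE_RATE in practice


def split_sentences(script: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in parts if p]


def generate_long_form(
    script: str,
    synthesize: Callable[[str], List[float]],
    gap_seconds: float = 0.25,
    sample_rate: int = SAMPLE_RATE,
) -> List[float]:
    """Synthesize each sentence separately and join them with silence."""
    silence = [0.0] * int(gap_seconds * sample_rate)
    audio: List[float] = []
    for sentence in split_sentences(script):
        audio.extend(synthesize(sentence))  # one short generation per sentence
        audio.extend(silence)               # brief pause after each sentence
    return audio
```

With Bark installed, `synthesize` might be `lambda s: list(generate_audio(s, history_prompt="v2/en_speaker_1"))`; reusing one preset for every sentence helps keep the voice consistent across chunks.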
@@ -145,18 +161,17 @@ By default, `generate_audio` works well with around 13 seconds of spoken text. F

## 💻 Installation

-```
+*‼️ CAUTION ‼️ Do NOT use `pip install bark`. It installs a different package, which is not managed by Suno.*
+```bash
pip install git+https://github.com/suno-ai/bark.git
```

or

-```
+```bash
git clone https://github.com/suno-ai/bark
cd bark && pip install .
```
-*Note: Do NOT use 'pip install bark'. It installs a different package, which is not managed by Suno.*

## 🛠️ Hardware and Inference Speed

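The caution above warns that `pip install bark` pulls in an unrelated package. A rough, hypothetical check is to confirm that the importable `bark` module exposes a callable `generate_audio`, the function used throughout this README. This heuristic only tests for that attribute; it does not verify where the package came from and is not an official verification method.

```python
import importlib


def looks_like_suno_bark(module_name: str = "bark") -> bool:
    """Heuristic: True if `module_name` imports and exposes generate_audio.

    Hypothetical check; it does not verify the package's origin.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # nothing importable under that name
    return callable(getattr(module, "generate_audio", None))
```

Running `looks_like_suno_bark()` after installation should return `True` for the Git install above; `False` suggests that either no `bark` module or a different package is installed.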
@@ -172,7 +187,7 @@ If you don't have hardware available or if you want to play with bigger versions

## ⚙️ Details

-Bark is fully generative tex-to-audio model devolved for research and demo purposes. It follows a GPT style architecture similar to [AudioLM](https://arxiv.org/abs/2209.03143) and [Vall-E](https://arxiv.org/abs/2301.02111) and a quantized Audio representation from [EnCodec](https://github.com/facebookresearch/encodec). It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Different to previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech such as music lyrics, sound effects or other non-speech sounds.
+Bark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to [AudioLM](https://arxiv.org/abs/2209.03143) and [Vall-E](https://arxiv.org/abs/2301.02111) and uses a quantized audio representation from [EnCodec](https://github.com/facebookresearch/encodec). It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Unlike previous approaches, Bark converts the input text prompt directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech, such as music lyrics, sound effects or other non-speech sounds.

Below is a list of some known non-speech sounds, but we are finding more every day. Please let us know if you find patterns that work particularly well on [Discord](https://discord.gg/J2B2vsjKuE)!

@@ -190,7 +205,7 @@ Below is a list of some known non-speech sounds, but we are finding more every d
### Supported Languages

| Language | Status |
-| --- | --- |
+| --- | :---: |
| English (en) | ✅ |
| German (de) | ✅ |
| Spanish (es) | ✅ |

@@ -218,7 +233,7 @@ Requests for future language support [here](https://github.com/suno-ai/bark/disc

Bark is licensed under the MIT License.

-Please contact us at `[email protected]` to request access to a larger version of the model.
+Please contact us at 📧 [[email protected]](mailto:[email protected]) to request access to a larger version of the model.

## 📱 Community