Update Doc: Added extra voice examples, updated documentation and related code example (suno-ai#290)
Ding3LI authored May 16, 2023
1 parent 81d3a50 commit 2d9eded
Showing 1 changed file with 32 additions and 17 deletions.
49 changes: 32 additions & 17 deletions README.md
@@ -4,7 +4,7 @@
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/OnusFM.svg?style=social&label=@OnusFM)](https://twitter.com/OnusFM)
<a href="http://www.repostatus.org/#active"><img src="http://www.repostatus.org/badges/latest/active.svg" /></a>

[Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) • [Suno Studio Waitlist](https://3os84zs17th.typeform.com/suno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)
> 🔗 [Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) • [Suno Studio Waitlist](https://3os84zs17th.typeform.com/suno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)
[//]: <br> (vertical spaces around image)
<br>
@@ -18,6 +18,13 @@ Bark is a transformer-based text-to-audio model created by [Suno](https://suno.a
## ⚠ Disclaimer
Bark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. Use at your own risk, and please act responsibly.

## 📖 Quick Index
* [🚀 Updates](#-updates)
* [💻 Installation](#-installation)
* [🐍 Usage](#-usage-in-python)
* [🌀 Live Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2)
* [❓ FAQ](#-faq)

## 🎧 Demos

[![Open in Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/suno/bark)
@@ -66,8 +73,8 @@ write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)
# play text in notebook
Audio(audio_array, rate=SAMPLE_RATE)
```

[pizza.webm](https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm)
[pizza1.webm](https://user-images.githubusercontent.com/34592747/cfa98e54-721c-4b9c-b962-688e09db684f.webm)

</details>

@@ -87,7 +94,18 @@ audio_array = generate_audio(text_prompt)
```
[suno_korean.webm](https://user-images.githubusercontent.com/32879321/235313033-dc4477b9-2da0-4b94-9c8b-a8c2d8f5bb5e.webm)

*Note: since Bark recognizes languages automatically from input text, it is possible to use for example a german history prompt with english text. This usually leads to english audio with a german accent.*
*Note: since Bark recognizes languages automatically from input text, it is possible to use, for example, a German history prompt with English text. This usually leads to English audio with a German accent.*
```python
text_prompt = """
Der Dreißigjährige Krieg (1618-1648) war ein verheerender Konflikt, der Europa stark geprägt hat.
This is a beginning of the history. If you want to hear more, please continue.
"""
audio_array = generate_audio(text_prompt)
```
[suno_german_accent.webm](https://user-images.githubusercontent.com/34592747/3f96ab3e-02ec-49cb-97a6-cf5af0b3524a.webm)



</details>

@@ -109,11 +127,9 @@ audio_array = generate_audio(text_prompt)
<details open>
<summary><h3>🎤 Voice Presets</h3></summary>

Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of speaker presets [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark/assets/prompts). The community also often shares presets in [Discord](https://discord.gg/J2B2vsjKuE).
Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of supported voice presets [HERE](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark/assets/prompts). The community also often shares presets on [Discord](https://discord.gg/J2B2vsjKuE).

Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.
<br>
<br>
> Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.
```python
text_prompt = """
@@ -126,9 +142,9 @@ audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")
[sloth.webm](https://user-images.githubusercontent.com/5068315/230684883-a344c619-a560-4ff5-8b99-b4463a34487b.webm)
</details>
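Presets are passed to `generate_audio` through `history_prompt` as identifiers like `"v2/en_speaker_1"`. The hypothetical helper `preset_name` sketches how such identifiers could be assembled and checked before a call. The set of language codes and the 0 to 9 speaker range are assumptions based on the preset library, so verify them against `bark/assets/prompts` before relying on them.

```python
# Hypothetical helper: builds Bark-style preset identifiers such as "v2/en_speaker_1".
# ASSUMPTION: the language codes and the 0-9 speaker range below mirror the preset
# library; check bark/assets/prompts for the authoritative list.
ASSUMED_LANGUAGES = {
    "en", "de", "es", "fr", "hi", "it", "ja",
    "ko", "pl", "pt", "ru", "tr", "zh",
}


def preset_name(lang: str, speaker: int, version: str = "v2") -> str:
    """Return an identifier intended for the `history_prompt` argument."""
    if lang not in ASSUMED_LANGUAGES:
        raise ValueError(f"unknown language code: {lang!r}")
    if not 0 <= speaker <= 9:
        raise ValueError(f"speaker index out of range: {speaker}")
    return f"{version}/{lang}_speaker_{speaker}"


# Example (requires Bark): English text read with an assumed German preset,
# which the note on mixed languages suggests yields English audio with a German accent.
# audio_array = generate_audio("Hello, how are you today?", history_prompt=preset_name("de", 3))
```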

### Generating Longer Audio
### 📃 Generating Longer Audio

By default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see this [example notebook](notebooks/long_form_generation.ipynb).
By default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see 👉 **[Notebook](notebooks/long_form_generation.ipynb)** 👈
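As a rough illustration of the long-form idea, the sketch below generates audio one sentence at a time and joins the pieces with short pauses. It is a minimal sketch, not code taken from the notebook: `split_sentences` and `generate_long_form` are hypothetical names, the regex splitter is deliberately naive, the 0.25 s default pause is an assumption, and `generate` stands in for a text-to-audio call such as `generate_audio`.

```python
import re
from typing import Callable, List, Sequence


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def generate_long_form(
    text: str,
    generate: Callable[[str], Sequence[float]],
    sample_rate: int,
    pause_seconds: float = 0.25,  # ASSUMPTION: pause length between sentences
) -> List[float]:
    """Generate each sentence separately and join the results with silence."""
    silence = [0.0] * int(pause_seconds * sample_rate)
    samples: List[float] = []
    for i, sentence in enumerate(split_sentences(text)):
        if i > 0:
            samples.extend(silence)  # insert a pause before every sentence but the first
        samples.extend(generate(sentence))  # each short call stays within the ~13 s range
    return samples
```

With Bark installed, a call like `generate_long_form(script, generate_audio, SAMPLE_RATE)` should work, since `generate_audio` returns an array of samples; wrap the result in `numpy.array` before passing it to `write_wav`. A NumPy-based version would collect the pieces and call `numpy.concatenate` once, which is faster for long scripts.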

<details>
<summary>Click to toggle example long-form generations (from the example notebook)</summary>
@@ -145,18 +161,17 @@ By default, `generate_audio` works well with around 13 seconds of spoken text. F


## 💻 Installation

```
*‼️ CAUTION ‼️ Do NOT use `pip install bark`. It installs a different package, which is not managed by Suno.*
```bash
pip install git+https://github.com/suno-ai/bark.git
```

or

```
```bash
git clone https://github.com/suno-ai/bark
cd bark && pip install .
```
*Note: Do NOT use 'pip install bark'. It installs a different package, which is not managed by Suno.*


## 🛠️ Hardware and Inference Speed
@@ -172,7 +187,7 @@ If you don't have hardware available or if you want to play with bigger versions

## ⚙️ Details

Bark is fully generative tex-to-audio model devolved for research and demo purposes. It follows a GPT style architecture similar to [AudioLM](https://arxiv.org/abs/2209.03143) and [Vall-E](https://arxiv.org/abs/2301.02111) and a quantized Audio representation from [EnCodec](https://github.com/facebookresearch/encodec). It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Different to previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech such as music lyrics, sound effects or other non-speech sounds.
Bark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to [AudioLM](https://arxiv.org/abs/2209.03143) and [Vall-E](https://arxiv.org/abs/2301.02111) and uses a quantized audio representation from [EnCodec](https://github.com/facebookresearch/encodec). It is not a conventional TTS model and can deviate in unexpected ways from any given script. Unlike previous approaches, it converts the input text prompt directly to audio without an intermediate phoneme representation, so it can generalize to arbitrary instructions beyond speech, such as music lyrics, sound effects, or other non-speech sounds.

Below is a list of some known non-speech sounds, but we are finding more every day. If you find patterns that work particularly well, please let us know on [Discord](https://discord.gg/J2B2vsjKuE)!

@@ -190,7 +205,7 @@ Below is a list of some known non-speech sounds, but we are finding more every d
### Supported Languages

| Language | Status |
| --- | --- |
| --- | :---: |
| English (en) | ✅ |
| German (de) | ✅ |
| Spanish (es) | ✅ |
@@ -218,7 +233,7 @@ Requests for future language support [here](https://github.com/suno-ai/bark/disc

Bark is licensed under the MIT License.

Please contact us at `[email protected]` to request access to a larger version of the model.
Please contact us at 📧 [[email protected]](mailto:[email protected]) to request access to a larger version of the model.

## 📱 Community
