Update Doc: Added extra voice examples, updated documentation and related code example (suno-ai#290)
zs1973 · May 16, 2023 · commit 2d9eded
1 parent 81d3a50
Showing 1 changed file with 32 additions and 17 deletions.
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/OnusFM.svg?style=social&label=@OnusFM)](https://twitter.com/OnusFM)
 <a href="http://www.repostatus.org/#active"><img src="http://www.repostatus.org/badges/latest/active.svg" /></a>
 
-[Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) • [Suno Studio Waitlist](https://3os84zs17th.typeform.com/suno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)
+> 🔗 [Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2) • [Suno Studio Waitlist](https://3os84zs17th.typeform.com/suno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)
 
 [//]: <br> (vertical spaces around image)
 <br>
@@ -18,6 +18,13 @@ Bark is a transformer-based text-to-audio model created by [Suno](https://suno.a
 ## ⚠ Disclaimer
 Bark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. Use at your own risk, and please act responsibly.
 
+## 📖 Quick Index
+* [🚀 Updates](#-updates)
+* [💻 Installation](#-installation)
+* [🐍 Usage](#-usage-in-python)
+* [🌀 Live Examples](https://suno-ai.notion.site/Bark-Examples-5edae8b02a604b54a42244ba45ebc2e2)
+* [❓ FAQ](#-faq)
+
 ## 🎧 Demos  
 
 [![Open in Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/suno/bark)
@@ -66,8 +73,8 @@ write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)
 # play text in notebook
 Audio(audio_array, rate=SAMPLE_RATE)
 ```
-
-[pizza.webm](https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm)
+     
+[pizza1.webm](https://user-images.githubusercontent.com/34592747/cfa98e54-721c-4b9c-b962-688e09db684f.webm)
 
 </details>
 
@@ -87,7 +94,18 @@ audio_array = generate_audio(text_prompt)
 ```
 [suno_korean.webm](https://user-images.githubusercontent.com/32879321/235313033-dc4477b9-2da0-4b94-9c8b-a8c2d8f5bb5e.webm)
 
-*Note: since Bark recognizes languages automatically from input text, it is possible to use for example a german history prompt with english text. This usually leads to english audio with a german accent.*
+*Note: since Bark recognizes languages automatically from input text, it is possible to use, for example, a German history prompt with English text. This usually leads to English audio with a German accent.*
+```python
+text_prompt = """
+    Der Dreißigjährige Krieg (1618-1648) war ein verheerender Konflikt, der Europa stark geprägt hat.
+    This is a beginning of the history. If you want to hear more, please continue.
+"""
+audio_array = generate_audio(text_prompt)
+```
+[suno_german_accent.webm](https://user-images.githubusercontent.com/34592747/3f96ab3e-02ec-49cb-97a6-cf5af0b3524a.webm)
+
+
+     
 
 </details>
 
@@ -109,11 +127,9 @@ audio_array = generate_audio(text_prompt)
 <details open>
 <summary><h3>🎤 Voice Presets</h3></summary>
 
-Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of speaker presets [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark/assets/prompts). The community also often shares presets in [Discord](https://discord.gg/J2B2vsjKuE).
+Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of supported voice presets [HERE](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark/assets/prompts). The community also often shares presets in [Discord](https://discord.gg/J2B2vsjKuE).
 
-Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.
-<br>
-<br>
+> Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.
 
 ```python
 text_prompt = """
@@ -126,9 +142,9 @@ audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")
 [sloth.webm](https://user-images.githubusercontent.com/5068315/230684883-a344c619-a560-4ff5-8b99-b4463a34487b.webm)
 </details>
 
-### Generating Longer Audio
+### 📃 Generating Longer Audio
 
-By default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see this [example notebook](notebooks/long_form_generation.ipynb).
+By default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see 👉 **[Notebook](notebooks/long_form_generation.ipynb)** 👈
 
 <details>
 <summary>Click to toggle example long-form generations (from the example notebook)</summary>
@@ -145,18 +161,17 @@ By default, `generate_audio` works well with around 13 seconds of spoken text. F
 
 
 ## 💻 Installation
-
-```
+*‼️ CAUTION ‼️ Do NOT use `pip install bark`. It installs a different package, which is not managed by Suno.*
+```bash
 pip install git+https://github.com/suno-ai/bark.git
 ```
 
 or
 
-```
+```bash
 git clone https://github.com/suno-ai/bark
 cd bark && pip install . 
 ```
-*Note: Do NOT use 'pip install bark'. It installs a different package, which is not managed by Suno.*
 
 
 ## 🛠️ Hardware and Inference Speed
@@ -172,7 +187,7 @@ If you don't have hardware available or if you want to play with bigger versions
 
 ## ⚙️ Details
 
-Bark is fully generative tex-to-audio model devolved for research and demo purposes. It follows a GPT style architecture similar to [AudioLM](https://arxiv.org/abs/2209.03143) and [Vall-E](https://arxiv.org/abs/2301.02111) and a quantized Audio representation from [EnCodec](https://github.com/facebookresearch/encodec). It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Different to previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech such as music lyrics, sound effects or other non-speech sounds.
+Bark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to [AudioLM](https://arxiv.org/abs/2209.03143) and [Vall-E](https://arxiv.org/abs/2301.02111) and a quantized audio representation from [EnCodec](https://github.com/facebookresearch/encodec). It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Unlike previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech such as music lyrics, sound effects or other non-speech sounds.
 
 Below is a list of some known non-speech sounds, but we are finding more every day. Please let us know if you find patterns that work particularly well on [Discord](https://discord.gg/J2B2vsjKuE)!
 
@@ -190,7 +205,7 @@ Below is a list of some known non-speech sounds, but we are finding more every d
 ### Supported Languages
 
 | Language | Status |
-| --- | --- |
+| --- | :---: |
 | English (en) | ✅ |
 | German (de) | ✅ |
 | Spanish (es) | ✅ |
@@ -218,7 +233,7 @@ Requests for future language support [here](https://github.com/suno-ai/bark/disc
 
 Bark is licensed under the MIT License. 
 
-Please contact us at `[email protected]` to request access to a larger version of the model.  
+Please contact us at 📧 [[email protected]](mailto:[email protected]) to request access to a larger version of the model.  
 
 ## 📱 Community