added fp16 support for whisper inference on cuda
NavodPeiris committed Jun 4, 2024
1 parent 43fdbb6 commit d94bc22
Showing 7 changed files with 37 additions and 23 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -14,6 +14,7 @@

</p>


### Run your IDE as administrator

you will get the following error if administrator permission is not there:
@@ -88,7 +89,7 @@ transcript will also indicate the timeframe in seconds where each speaker speaks
```
from speechlib import Transcriptor
file = "obama1.wav" # your audio file
file = "obama_zach.wav" # your audio file
voices_folder = "voices" # voices folder containing voice samples for recognition
language = "en" # language code
log_folder = "logs" # log folder for storing transcripts
2 changes: 1 addition & 1 deletion examples/transcribe.py
@@ -1,6 +1,6 @@
from speechlib import Transcriptor

file = "obama1.wav" # your audio file
file = "obama_zach.wav" # your audio file
voices_folder = "voices" # voices folder containing voice samples for recognition
language = "en" # language code
log_folder = "logs" # log folder for storing transcripts
23 changes: 15 additions & 8 deletions library.md
@@ -72,7 +72,7 @@ transcript will also indicate the timeframe in seconds where each speaker speaks
```
from speechlib import Transcriptor
file = "obama1.wav" # your audio file
file = "obama_zach.wav" # your audio file
voices_folder = "voices" # voices folder containing voice samples for recognition
language = "en" # language code
log_folder = "logs" # log folder for storing transcripts
@@ -99,13 +99,20 @@ end: ending time of speech in seconds
text: transcribed text for speech during start and end
speaker: speaker of the text

-#### voices_folder structure:
-
-![voices_folder_structure](voices_folder_structure1.png)
-
-#### Transcription:
-
-![transcription](transcript.png)
+#### voices folder structure:
+```
+voices_folder
+|---> person1
+|       |---> sample1.wav
+|       |---> sample2.wav
+|       ...
+|
+|---> person2
+|       |---> sample1.wav
+|       |---> sample2.wav
+|       ...
+|--> ...
+```
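
For illustration only (this sketch is not part of the commit): a few lines that walk a folder laid out as above and list each speaker's reference samples. The `voices` path is a placeholder.

```
# Illustrative sketch (not part of this diff): enumerate speakers and their
# reference samples from a voices folder laid out as shown above.
from pathlib import Path

voices_folder = Path("voices")  # placeholder path

for person_dir in sorted(p for p in voices_folder.iterdir() if p.is_dir()):
    samples = sorted(person_dir.glob("*.wav"))
    print(f"{person_dir.name}: {len(samples)} sample(s)")
    for sample in samples:
        print(f"  {sample.name}")
```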

supported language codes:

17 changes: 9 additions & 8 deletions requirements.txt
@@ -1,8 +1,9 @@
-transformers
-torch
-torchaudio
-pydub
-pyannote.audio
-speechbrain
-accelerate
-faster-whisper
+transformers==4.36.2
+torch==2.1.2
+torchaudio==2.1.2
+pydub==0.25.1
+pyannote.audio==3.1.1
+speechbrain==0.5.16
+accelerate==0.26.1
+faster-whisper==0.10.1
+openai-whisper==20231117
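
The dependencies are now pinned to exact versions. As a rough, optional check (not part of the commit), an installed environment can be compared against these pins; the distribution names are assumed to match those in requirements.txt.

```
# Optional sketch: report installed versions versus the pins above.
from importlib.metadata import PackageNotFoundError, version

pins = {
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "torchaudio": "2.1.2",
    "pydub": "0.25.1",
    "pyannote.audio": "3.1.1",
    "speechbrain": "0.5.16",
    "accelerate": "0.26.1",
    "faster-whisper": "0.10.1",
    "openai-whisper": "20231117",
}

for name, pinned in pins.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        installed = "not installed"
    status = "ok" if installed == pinned else f"expected {pinned}"
    print(f"{name}: {installed} ({status})")
```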
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setup(
name="speechlib",
version="1.1.1",
version="1.1.2",
description="speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names. This library also contain audio preprocessor functions.",
packages=find_packages(),
long_description=long_description,
2 changes: 1 addition & 1 deletion setup_instruction.md
@@ -9,7 +9,7 @@ for publishing:
pip install twine

for installing locally for testing:
-pip install dist/speechlib-1.1.0-py3-none-any.whl
+pip install dist/speechlib-1.1.2-py3-none-any.whl

finally run:
twine upload dist/*
11 changes: 8 additions & 3 deletions speechlib/transcribe.py
@@ -32,9 +32,14 @@ def transcribe(file, language, model_size, whisper_type, quantization):
Exception("Language code not supported.\nThese are the supported languages:\n", model.supported_languages)
else:
try:
-model = whisper.load_model(model_size)
-result = model.transcribe(file, language=language)
-res = result["text"]
+if torch.cuda.is_available():
+    model = whisper.load_model(model_size, device="cuda")
+    result = model.transcribe(file, language=language, fp16=True)
+    res = result["text"]
+else:
+    model = whisper.load_model(model_size, device="cpu")
+    result = model.transcribe(file, language=language, fp16=False)
+    res = result["text"]

return res
except Exception as err:
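
The change selects device and precision at runtime: fp16 inference when a CUDA GPU is available, fp32 otherwise (openai-whisper falls back to fp32 with a warning if fp16 is requested on CPU). Below is a self-contained sketch of the same pattern, with the audio file and model size as placeholders.

```
# Sketch of the device/precision selection introduced above.
import torch
import whisper  # openai-whisper

file = "obama_zach.wav"  # placeholder audio file
model_size = "medium"    # placeholder model size

use_cuda = torch.cuda.is_available()
model = whisper.load_model(model_size, device="cuda" if use_cuda else "cpu")
# fp16 roughly halves GPU memory use; on CPU whisper falls back to fp32.
result = model.transcribe(file, language="en", fp16=use_cuda)
print(result["text"])
```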
