The whisperX API is a tool for enhancing and analyzing audio content. It provides a suite of services for processing audio and video files, including transcription, alignment, diarization, and merging the transcript with the diarization results.

Swagger UI is available at `/docs` for all the services. A dump of the OpenAPI definition is available in the folder `app/docs` as well; you can explore it directly in the Swagger Editor.

See the WhisperX documentation for details on whisperX functions.
- In `.env` you can define the default language via `DEFAULT_LANG`; if it is not defined, `en` is used (you can also set the language in the request).
- `.env` also contains the definition of the Whisper model via `WHISPER_MODEL` (you can also set the model in the request).
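The precedence between a per-request value and the `.env` default can be sketched as follows. This is a minimal illustration, not code from the app itself: `resolve_language` and `resolve_model` are hypothetical helper names, and the assumption that a request value overrides the `.env` default follows from the parenthetical notes above.

```python
import os

# Hypothetical helpers illustrating the precedence: request value first,
# then the environment variable from .env, then the hard-coded default.
def resolve_language(request_lang=None, env=os.environ):
    # DEFAULT_LANG from .env; "en" is used when nothing is defined
    return request_lang or env.get("DEFAULT_LANG", "en")

def resolve_model(request_model=None, env=os.environ):
    # WHISPER_MODEL from .env; no default size is documented, so None here
    return request_model or env.get("WHISPER_MODEL")

print(resolve_language(None, {}))                      # en
print(resolve_language("de", {"DEFAULT_LANG": "cs"}))  # de
```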
The status and result of each task are stored in a database using the SQLAlchemy ORM. The database connection is defined by the environment variable `DB_URL`; if no value is specified, `db.py` sets the default database to `sqlite:///records.db`.

See the SQLAlchemy Engine Configuration documentation for driver definitions if you want to connect to a database other than SQLite.

The structure of the database is described in DB Schema.
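The fallback behaviour can be sketched like this. It is a minimal illustration of what `db.py` does with `DB_URL`; `resolve_db_url` is a hypothetical name, not the actual function in the project:

```python
import os

# Hypothetical sketch of the DB_URL fallback performed in db.py:
# the environment variable wins, otherwise a local SQLite file is used.
def resolve_db_url(env=os.environ):
    return env.get("DB_URL", "sqlite:///records.db")

print(resolve_db_url({}))  # sqlite:///records.db
# Any SQLAlchemy-supported URL can be supplied instead, e.g. PostgreSQL:
print(resolve_db_url({"DB_URL": "postgresql+psycopg2://user:pw@host/db"}))
```

The resulting URL string is what SQLAlchemy consumes, which is why any driver listed in the Engine Configuration documentation can be plugged in via `DB_URL`.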
To get started with the API, follow these steps:
- Create a virtual environment
- Install PyTorch (see its documentation for more details)
- Install whisperX
pip install git+https://github.com/m-bain/whisperx.git
- Install the required dependencies:
pip install -r requirements.txt
- Create a `.env` file and define your Whisper model and your Hugging Face token:
HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
WHISPER_MODEL=<<WHISPER MODEL SIZE>>
- Run the FastAPI application:
uvicorn app.main:app --reload
The API will be accessible at http://127.0.0.1:8000.
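Once uvicorn is running, a quick way to verify the service is up is to request the Swagger UI path mentioned earlier. This small check is a standard-library sketch, not part of the project:

```python
import urllib.request
import urllib.error

def service_is_up(base_url="http://127.0.0.1:8000", timeout=2):
    """Return True if the FastAPI app answers on its Swagger UI path (/docs)."""
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused / timeout: the service is not reachable
        return False

print(service_is_up())
```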
- Create a `.env` file:
HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
WHISPER_MODEL=<<WHISPER MODEL SIZE>>
- Build the image using `docker-compose.yaml`:
#build and start the image using compose file
docker-compose up
Alternative approach:
#build image
docker build -t whisperx-service .
# Run Container
docker run -d --gpus all -p 8000:8000 --env-file .env whisperx-service
The API will be accessible at http://127.0.0.1:8000.
The models used by whisperX are stored in `root/.cache`; if you want to avoid downloading the models each time the container starts, you can store the cache in persistent storage. `docker-compose.yaml` defines a volume `whisperx-models-cache` to store this cache.
- faster-whisper cache:
root/.cache/huggingface/hub
- pyannote and other models cache:
root/.cache/torch
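The cache mount described above could look like the following fragment. This is a sketch of what `docker-compose.yaml` may contain, with the service name assumed rather than taken from the actual file, and assuming the container runs as root so the cache lands under `/root/.cache`:

```yaml
# Sketch of a compose file persisting the model cache between restarts.
services:
  whisperx-service:          # assumed service name
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      # covers both the faster-whisper (huggingface/hub) and
      # pyannote (torch) cache directories listed above
      - whisperx-models-cache:/root/.cache

volumes:
  whisperx-models-cache:     # named volume referenced in the text
```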