This directory contains tools for audio analysis and processing built on wav2letter.
To build the tools, pass `-DW2L_BUILD_TOOLS=ON` as a CMake flag when building wav2letter.
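For instance, a minimal build sketch (the build directory name and parallelism level are illustrative):

```
# From the wav2letter repository root; the build directory name is arbitrary.
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_BUILD_TOOLS=ON
make -j$(nproc)
```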
## VoiceActivityDetection-CTC.cpp
`VoiceActivityDetection-CTC` contains a simple pipeline that supports a CTC-trained acoustic model trained with wav2letter and an n-gram language model in the wav2letter binary format (see the decoder documentation for more).
Build the tool with `make VoiceActivityDetection-CTC`.
First, create an input list file containing the audio data. The list file should exactly follow the standard wav2letter list input format for training, but with the transcription column left empty. For instance:
```
// Example input file
[~/speech/data] head analyze.lst
train001 /tmp/000000000.flac 100.03
train002 /tmp/000000001.flac 360.57
train003 /tmp/000000002.flac 123.53
train004 /tmp/000000003.flac 999.99
...
```
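If the audio sits in a single directory, a list in this shape can be generated with a short shell loop. The sketch below is a hypothetical helper, not part of wav2letter: it assumes `soxi` (from SoX) is installed and that durations are recorded in seconds, which should be checked against your training lists.

```
# Hypothetical helper: build analyze.lst from a directory of FLAC files.
# soxi -D prints each file's duration in seconds; convert if your lists
# use a different unit. The transcription column is left empty on purpose.
i=0
for f in /tmp/audio/*.flac; do
  i=$((i+1))
  printf "train%03d %s %s\n" "$i" "$f" "$(soxi -D "$f")"
done > analyze.lst
```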
Run the binary:
```
[path to binary]/VoiceActivityDetection-CTC \
    -am [path to model] \
    -lm [path to language model] \
    -test [path to list file] \
    --lexicon [path to lexicon file] \
    --maxload -1 \
    --datadir= \
    --tokensdir [path to directory containing tokens file] \
    --tokens [tokens file name] \
    --outpath [output directory]
```
The script outputs four files per input sample, named by the sample ID, in the directory specified by `--outpath`:
- A `.vad` file containing chunk-level probabilities of non-speech based on the probability of silence. These are assigned for each chunk of output; for a model trained with a stride of 1, each chunk is a single 10 ms frame, while for a model with a stride of 8, each chunk spans 80 ms (a post-processing sketch follows this list).
- An `.sts` file containing the perplexity of the predicted sequence based on the specified input language model, in addition to the percentage of the audio containing speech based on the passed `--vadthreshold`.
- A `.tsc` file containing the most likely token-level transcription of the given audio based on the acoustic model output only.
- An `.fwt` file containing frame- or chunk-level token emissions based on the most likely token emitted for each sample.
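As a quick sanity check on a `.vad` file, something like the following flags chunks whose non-speech probability crosses a threshold. This is a sketch under assumptions: it treats the file as whitespace-separated probabilities, one value per chunk, which should be verified against the actual output layout.

```
# Hypothetical check: print the index and value of every chunk whose
# non-speech probability exceeds 0.99. The whitespace-separated layout
# of the .vad file is an assumption.
tr -s ' ' '\n' < [output directory]/train001.vad \
  | awk -v t=0.99 '$1 > t { print NR, $1 }'
```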
Below are models compatible with this audio analysis pipeline.
| File | Dataset | Dev Set | Criterion | Architecture | Lexicon | Tokens |
| --- | --- | --- | --- | --- | --- | --- |
| baseline_dev-other | LibriSpeech | dev-other | CTC | Archfile | Lexicon | Tokens |
## StreamingTDSModelConverter.cpp
Once a model is trained in wav2letter++ for streaming TDS models using the provided recipe (possibly customized to suit one's use case), it needs to be serialized into a format that the wav2letter@anywhere inference platform can load. `StreamingTDSModelConverter` can be used to do this. Note that the tool only supports models trained using streaming TDS + CTC style architectures, as described in the paper here.
Build the tool with `make streaming_tds_model_converter`.
To run the binary:
```
[path to binary]/streaming_tds_model_converter \
    -am [path to model] \
    --outdir [output directory]
```
The output directory will contain:
- `tokens.txt` - tokens file (with the blank symbol included)
- `acoustic_model.bin` - serialized acoustic model
- `feature_extractor.bin` - serialized feature extraction model, which performs log-mel feature extraction and local normalization
These files, along with a few other files required for decoding (such as a language model and lexicon), can be used to run inference on audio files. See the tutorial for more details.
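As a rough sketch of that downstream step, the example binaries from the wav2letter@anywhere tutorial can consume the converted files. The binary name and flags below are assumptions drawn from that tutorial and may differ between versions, so verify them against the tutorial itself:

```
# Hypothetical invocation; the binary name and flag names may vary by version.
simple_streaming_asr_example \
  --input_files_base_path [directory with the converted files, LM, and lexicon] \
  --input_audio_file [path to a 16 kHz wav file]
```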