Vad Elastic
Supported Sample Rates
| Engines | 8 kHz | 16 kHz | 32 kHz | 48 kHz |
|---|---|---|---|---|
| VadLite2 | ✔️ | ✔️ | ✔️ | ✔️ |
| Silero | ✔️ | ✔️ | | |
Parameters
| name | description | default |
|---|---|---|
| Engine Type | Underlying VAD Engine | VadLite2 |
| Silence Trigger Duration (ms) | The amount of silence, in milliseconds, that VAD must observe before deciding that the speech has actually ended. | 600 |
| Early Silence Trigger Duration (ms) | The amount of silence, in milliseconds, that VAD must observe before raising Early Speech Ended. The recommended value is 200 ms. | -1 (disabled) |
| Pre-Speech Buffer Length (ms) | After the start of speech is detected, VAD rewinds and includes a little more data from before the detected beginning, in case low-energy speech is present there. This duration is set by pre-speech-buffer-msec. | 300 |
| Post-Speech Buffer Length (ms) | After the end of speech is detected, VAD includes a little more data from after the detected end, in case low-energy speech is present there. This duration is set by post-speech-buffer-msec. | 300 |
| Sensitivity | The threshold for Speech Possibility, in the range 0.0-1.0 inclusive. The Speech Started event is raised when Speech Possibility > Sensitivity. | 0.6 |
| Max Speech Duration (ms) | If the speech does not end within this duration, VAD ends it. This is treated as a normal end of speech, and an appropriate speech-ended event is generated. Exceeding this timeout does not generate an error. | -1 |
| Max Speech Duration Graceful End (%) | The final part of Max Speech Duration, given as a percentage, during which the engine becomes more sensitive to silence. It is active only if Max Speech Duration is set. | 20 |
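As a rough illustration of how Max Speech Duration and Max Speech Duration Graceful End relate, the window in which the engine becomes more sensitive to silence can be worked out as below. This is only a sketch of the arithmetic: the function name `graceful_end_window` is hypothetical, and treating any non-positive Max Speech Duration (such as the default -1) as "not set" is an assumption.

```python
def graceful_end_window(max_speech_ms, graceful_end_percent=20):
    """Return (start_ms, end_ms) of the graceful-end window, or None.

    Assumption: a non-positive max_speech_ms (e.g. the default -1)
    means Max Speech Duration is not set, so the window is inactive.
    """
    if max_speech_ms <= 0:
        return None
    # The window is the last graceful_end_percent % of the maximum duration.
    window_ms = max_speech_ms * graceful_end_percent / 100
    return (max_speech_ms - window_ms, max_speech_ms)


# With a 10 s limit and the default 20 %, the window covers the last 2 s.
print(graceful_end_window(10000, 20))
# With the default Max Speech Duration (-1), there is no window.
print(graceful_end_window(-1, 20))
```

Under these assumptions, a 10000 ms limit with the default 20 % gives a window that starts 8000 ms into the speech.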
Inputs
Audio
Accepts audio from a single channel.
Events
none
Outputs
Audio
Silences are removed from the input audio, and the remaining data is sent to the output.
Events
| name | description |
|---|---|
| Speech Started | Raised once at the beginning of each detected speech fragment. |
| Early Speech Ended | Raised when VAD Engine detects Early Silence. Can be raised multiple times for each utterance. |
| Speech Ended | Raised once at the end of each detected speech fragment. |
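The order in which these events can occur may be easier to see in a small simulation. The sketch below is not the engine's implementation: it assumes the engine yields one Speech Possibility value per fixed-length frame, and the names `VadConfig` and `vad_events` are hypothetical. Pre-speech and post-speech buffers and Max Speech Duration are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class VadConfig:
    sensitivity: float = 0.6
    silence_trigger_ms: int = 600
    early_silence_trigger_ms: int = -1  # -1 disables Early Speech Ended


def vad_events(probabilities, frame_ms, config):
    """Return (time_ms, event_name) pairs for per-frame speech probabilities."""
    events = []
    in_speech = False
    silence_ms = 0
    early_raised = False
    for index, probability in enumerate(probabilities):
        frame_end_ms = (index + 1) * frame_ms
        is_speech = probability > config.sensitivity
        if not in_speech:
            if is_speech:
                # Speech Possibility exceeded Sensitivity: a fragment begins.
                in_speech = True
                silence_ms = 0
                early_raised = False
                events.append((index * frame_ms, "Speech Started"))
            continue
        if is_speech:
            # Speech resumed; a later pause may raise Early Speech Ended again.
            silence_ms = 0
            early_raised = False
            continue
        silence_ms += frame_ms
        if (config.early_silence_trigger_ms > 0 and not early_raised
                and silence_ms >= config.early_silence_trigger_ms):
            events.append((frame_end_ms, "Early Speech Ended"))
            early_raised = True
        if silence_ms >= config.silence_trigger_ms:
            events.append((frame_end_ms, "Speech Ended"))
            in_speech = False
    return events


config = VadConfig(early_silence_trigger_ms=200)
frames = [0.1, 0.9, 0.9, 0.2, 0.2, 0.9] + [0.1] * 6  # 100 ms frames
for time_ms, name in vad_events(frames, 100, config):
    print(time_ms, name)
```

In this run, a short pause in the middle of the utterance raises Early Speech Ended without ending the fragment, so the event appears twice before the single Speech Ended.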
Remarks
Project Structure
A simple project can be built as such:

Important Note
Make sure that both the Audio output and the Event output are connected to the receiving nodes. Otherwise, the following nodes will not know when the speech starts and ends.
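To illustrate why the events matter, the sketch below shows a receiving node that uses them to cut the silence-free output audio back into separate utterances. The message format, a list of `(kind, payload)` tuples, and the function name `split_utterances` are hypothetical and only serve the example; they are not the product's interface.

```python
def split_utterances(messages):
    """Group audio chunks into utterances using Speech Started / Speech Ended."""
    utterances = []
    current = None  # None means no utterance is currently open
    for kind, payload in messages:
        if kind == "event" and payload == "Speech Started":
            current = []
        elif kind == "event" and payload == "Speech Ended":
            if current is not None:
                utterances.append(b"".join(current))
            current = None
        elif kind == "audio" and current is not None:
            current.append(payload)
    return utterances


messages = [
    ("event", "Speech Started"), ("audio", b"ab"), ("audio", b"cd"),
    ("event", "Speech Ended"),
    ("event", "Speech Started"), ("audio", b"ef"),
    ("event", "Speech Ended"),
]
print(split_utterances(messages))
```

If only the audio were connected, the same receiver would see a continuous stream of chunks with nothing to mark where one utterance stops and the next begins.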
Supported Flow Types
Batch, Stream

