VAD Silero

Parameters

| name | description | default |
| --- | --- | --- |
| Sensitivity | Range 0.0 to 1.0 inclusive. Sets the threshold for Speech Possibility. The Speech Started event triggers when Speech Possibility > Sensitivity. | 0.9 |
| MaxSpeechDurationMsec | If speech has not ended after this duration, VAD ends it. This is treated as a normal end of speech, and the corresponding Speech Ended event is generated. Exceeding this timeout does not generate an error. | -1 |
| PreSpeechBufferMsec | After the start of speech is detected, VAD rewinds and includes a little more data from before the detected beginning, in case low-energy voice is present there. The length of this rewind is set by this parameter (pre-speech-buffer-msec). | 300 |
| PostSpeechBufferMsec | After the end of speech is detected, VAD includes a little more data from after the detected end, in case low-energy voice is present there. The length of this extension is set by this parameter (post-speech-buffer-msec). | 300 |
| SilenceTriggerMsec | The amount of silence, in milliseconds, that VAD must observe before deciding that speech has actually ended. | 400 |
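
The sketch below illustrates one way the threshold and timing parameters could interact. It is not the node's implementation. The function `detect_events`, the `VadParams` class, the fixed frame length, and the per-frame probability input are all hypothetical. It also assumes that a negative MaxSpeechDurationMsec disables the limit and that input ending mid-speech closes the open fragment; neither is stated in this page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class VadParams:
    """Hypothetical container mirroring the documented defaults."""
    sensitivity: float = 0.9
    max_speech_duration_msec: int = -1  # assumption: negative means "no limit"
    silence_trigger_msec: int = 400


def detect_events(probs: Iterable[float], frame_msec: int,
                  p: VadParams) -> List[Tuple[str, int]]:
    """Turn per-frame speech probabilities into (event, time_msec) pairs.

    Event times are the moments the decision is made, not the padded
    audio boundaries.
    """
    events: List[Tuple[str, int]] = []
    in_speech = False
    speech_start = 0
    silence_msec = 0
    t = 0  # start time of the current frame
    for prob in probs:
        t_end = t + frame_msec
        if not in_speech:
            # Speech starts when the probability exceeds the threshold.
            if prob > p.sensitivity:
                in_speech = True
                speech_start = t
                silence_msec = 0
                events.append(("Speech Started", t))
        else:
            # Count consecutive silence; any speech frame resets the count.
            silence_msec = 0 if prob > p.sensitivity else silence_msec + frame_msec
            too_long = (p.max_speech_duration_msec >= 0
                        and t_end - speech_start >= p.max_speech_duration_msec)
            if silence_msec >= p.silence_trigger_msec or too_long:
                in_speech = False
                events.append(("Speech Ended", t_end))
        t = t_end
    if in_speech:
        # Assumption: end of input closes an open fragment.
        events.append(("Speech Ended", t))
    return events
```

With 100 ms frames and the default parameters, four consecutive low-probability frames are needed to reach the 400 ms silence trigger, so a shorter pause inside an utterance does not split it into two fragments.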

Inputs

Audio

 Accepts audio from a single channel. Passing the audio through a VAD node before streaming to this node is recommended.

Events

 none

Outputs

Audio

  Silences are removed from the input audio, and the remaining data is sent to the output.
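
As a rough illustration of silence removal combined with the pre- and post-speech buffers, the following sketch keeps only the samples inside padded speech fragments. The function `keep_speech`, its arguments, and the rule that overlapping padded fragments are not emitted twice are hypothetical; the node's actual behavior at fragment overlaps is not described here.

```python
from typing import List, Sequence, Tuple


def keep_speech(samples: Sequence[int], sample_rate: int,
                segments: Sequence[Tuple[int, int]],
                pre_msec: int = 300, post_msec: int = 300) -> List[int]:
    """Return the samples that fall inside the padded speech fragments.

    `segments` holds hypothetical (detected_start_msec, detected_end_msec)
    pairs, sorted by start time.
    """
    kept: List[int] = []
    cursor = 0  # index of the first sample not yet emitted
    for start_msec, end_msec in segments:
        # Rewind by the pre-speech buffer, but never before emitted data
        # or before the start of the input.
        start = max(cursor, (start_msec - pre_msec) * sample_rate // 1000)
        # Extend by the post-speech buffer, clamped to the input length.
        end = min(len(samples), (end_msec + post_msec) * sample_rate // 1000)
        if end > start:
            kept.extend(samples[start:end])
            cursor = end
    return kept
```

For example, at a 16 kHz sample rate a fragment detected from 500 ms to 700 ms would, under these assumptions, be emitted as samples 3200 up to 16000 of the input, covering 200 ms to 1000 ms.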

Events

| name | description |
| --- | --- |
| Speech Started | Raised once at the beginning of each fragment of actual (non-silent) audio. |
| Speech Ended | Raised once at the end of each fragment of actual (non-silent) audio. |

Remarks

Supported Flow Types

Batch, Stream

Release History

v3.7.0
  • Now supports 16 kHz audio streams.
v3.6.0
  • Introduced the node.