Audio Splitting Nodes Comparison
Feature | VAD | AudioSegmenter |
---|---|---|
Streaming | Supported. | NOT supported |
Batch | Supported. Works stream-like. | Supported. Preferred. |
In the project structure, VAD and Audio Segmenter are connected to other nodes in the same way. However, if the project will run only in Batch mode with HTTP requests, Audio Segmenter performs significantly better.
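The comparison above can be expressed as a small selection helper. This is only a sketch: `FlowType`, `SUPPORTED`, and `choose_splitter` are hypothetical names invented for illustration and are not part of the product. The supported flow types are copied from the table.

```python
from enum import Enum


class FlowType(Enum):
    STREAM = "Stream"
    BATCH = "Batch"


# Supported flow types, as listed in the comparison table.
SUPPORTED = {
    "VAD": {FlowType.STREAM, FlowType.BATCH},
    "AudioSegmenter": {FlowType.BATCH},
}


def choose_splitter(required: set) -> str:
    """Pick a splitting node for the flow types a project must serve."""
    if not required:
        raise ValueError("at least one flow type is required")
    # Batch-only projects: Audio Segmenter is the preferred node.
    if required == {FlowType.BATCH}:
        return "AudioSegmenter"
    # Anything involving streaming needs VAD.
    if required <= SUPPORTED["VAD"]:
        return "VAD"
    raise ValueError(f"no node supports {required}")


print(choose_splitter({FlowType.BATCH}))
print(choose_splitter({FlowType.STREAM, FlowType.BATCH}))
```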
Audio Segmenter
Splits the whole incoming audio data into several audio fragments at once. Each fragment contains at least some audible data. Works on batch audio.
Parameters:
none
Inputs
Audio:
Accepts audio from a single channel.
Events:
none
Outputs
Audio:
The audio fragments are sent to the output.
Events:
name | description |
---|---|
Start of Segment | Raised once at the beginning of each audio fragment. |
End of Segment | Raised once at the end of each audio fragment. |
Remark: This node writes the audio output of each segment between that segment's start-of-segment and end-of-segment events. For example, for audio data with 3 audio segments, the flow is as follows (the order is well defined):
segment-1: send "Start of Segment" event
segment-1: write the audio data of this segment to output
segment-1: send "End of Segment" event
segment-2: send "Start of Segment" event
segment-2: write the audio data of this segment to output
segment-2: send "End of Segment" event
segment-3: send "Start of Segment" event
segment-3: write the audio data of this segment to output
segment-3: send "End of Segment" event
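Because the order is well defined, a receiving node can reassemble each fragment by buffering audio between the two events. The sketch below assumes the events and audio writes arrive as one ordered sequence of messages. `Event`, `AudioChunk`, and `collect_segments` are hypothetical names used only for illustration; they are not the product's API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class Event:
    name: str  # "Start of Segment" or "End of Segment"


@dataclass
class AudioChunk:
    data: bytes


Message = Union[Event, AudioChunk]


def collect_segments(messages: Iterable[Message]) -> Iterator[bytes]:
    """Yield the audio of each segment, relying on the documented ordering."""
    buffer = None  # None means we are currently outside a segment
    for msg in messages:
        if isinstance(msg, Event) and msg.name == "Start of Segment":
            if buffer is not None:
                raise ValueError("Start of Segment inside an open segment")
            buffer = bytearray()
        elif isinstance(msg, Event) and msg.name == "End of Segment":
            if buffer is None:
                raise ValueError("End of Segment without a start")
            yield bytes(buffer)
            buffer = None
        elif isinstance(msg, AudioChunk):
            if buffer is None:
                raise ValueError("audio written outside a segment")
            buffer.extend(msg.data)
    if buffer is not None:
        raise ValueError("input ended inside a segment")


# Usage: two segments, the second written in two chunks.
flow = [
    Event("Start of Segment"), AudioChunk(b"\x01\x02"), Event("End of Segment"),
    Event("Start of Segment"), AudioChunk(b"\x03"), AudioChunk(b"\x04"),
    Event("End of Segment"),
]
segments = list(collect_segments(flow))
print(len(segments))
```

Raising on out-of-order messages makes a missing event connection visible immediately instead of silently merging fragments.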
Project Structure
A minimal project with Audio Segmenter can be built as such:
Make sure that both the Audio output and the Event output are connected to the receiving nodes. Otherwise, the following nodes will not know when the audio starts and ends.
Supported flow types: Batch
VAD
Performs voice activity detection. Works on streaming data. Filters out silences in the provided audio.
Parameters:
name | description | default |
---|---|---|
Sensitivity | The range is 0.0-1.0 inclusive, 1.0 being the most sensitive. At 1.0, even the faintest voices are picked up and taken into account when VAD decides which parts of the received data are actually speech. | 1.0 |
MaxSpeechDurationMsec | If the speech does not end within this duration, VAD ends it. This is treated as a normal end of speech, and an appropriate speech-ended event is generated. Exceeding this timeout does not generate an error. | -1 |
PreSpeechBufferMsec | After the start of speech is detected, VAD rewinds and takes a little more data from before the detected beginning, just in case a low energy voice happens to be there. This duration is determined by pre-speech-buffer-msec. | 300 |
PostSpeechBufferMsec | After the end of speech is detected VAD takes a little more data after the detected end just in case a low energy voice happens to be there. This duration is determined by post-speech-buffer-msec. | 300 |
SilenceTriggerMsec | The amount of silence, in milliseconds, that VAD must observe before deciding that the speech has actually ended. | 400 |
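To illustrate how the timing parameters interact, here is a minimal sketch that turns per-frame speech decisions into speech spans. It is not the node's actual algorithm. `VadParams` and `segment_frames` are hypothetical names, the per-frame decisions are taken as given (so Sensitivity is only range-checked, not modelled), and treating the default `-1` for MaxSpeechDurationMsec as "no limit" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VadParams:
    sensitivity: float = 1.0
    max_speech_duration_msec: int = -1  # assumption: non-positive means no limit
    pre_speech_buffer_msec: int = 300
    post_speech_buffer_msec: int = 300
    silence_trigger_msec: int = 400

    def __post_init__(self):
        if not 0.0 <= self.sensitivity <= 1.0:
            raise ValueError("Sensitivity must be within 0.0-1.0 inclusive")


def segment_frames(is_speech, frame_ms, p):
    """Return (start_ms, end_ms) spans for a list of per-frame booleans."""
    total_ms = len(is_speech) * frame_ms
    spans = []
    start = None          # time at which the current speech began
    last_voiced_end = 0   # end time of the most recent voiced frame

    def close(end):
        # Widen the span by the pre/post buffers, clamped to the audio bounds.
        spans.append((max(0, start - p.pre_speech_buffer_msec),
                      min(total_ms, end + p.post_speech_buffer_msec)))

    for i, voiced in enumerate(is_speech):
        t0, t1 = i * frame_ms, (i + 1) * frame_ms
        if voiced:
            if start is None:
                start = t0
            last_voiced_end = t1
        if start is None:
            continue
        limit = p.max_speech_duration_msec
        if t1 - last_voiced_end >= p.silence_trigger_msec:
            close(last_voiced_end)  # enough trailing silence: normal end
            start = None
        elif limit > 0 and t1 - start >= limit:
            close(t1)               # forced end, treated like a normal end
            start = None
    if start is not None:
        close(last_voiced_end)      # input ended while speech was open
    return spans


# Usage: 100 ms frames, 0.5 s silence, 1 s speech, 1 s silence.
frames = [False] * 5 + [True] * 10 + [False] * 10
print(segment_frames(frames, 100, VadParams()))
```

With the default buffers, neighbouring spans can overlap when two utterances are close together; a real consumer would need to decide how to handle that.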
Inputs
Audio:
Accepts audio from a single channel. Passing the audio through a VAD node before streaming to this node is recommended.
Events:
none
Outputs
Audio:
After removing the silences in the input audio the remaining data is sent to output.
Events:
name | description |
---|---|
Speech Started | Raised once at the beginning of each detected speech fragment. |
Speech Ended | Raised once at the end of each detected speech fragment. |
Project Structure
A simple project can be built as such:
Make sure that both the Audio output and the Event output are connected to the receiving nodes. Otherwise, the following nodes will not know when the speech starts and ends.
Supported flow types: Stream, Batch