Audio Splitters
  • 30 Apr 2025

Audio Splitting Nodes Comparison

| Feature | VAD | AudioSegmenter |
| --- | --- | --- |
| Streaming | Supported. | NOT supported. |
| Batch | Supported. Works stream-like. | Supported. Preferred. |

General Project Structure

VAD and Audio Segmenter are connected to the rest of a project in the same way. However, if the project runs only in Batch mode with HTTP requests, Audio Segmenter performs significantly better.

Important Note

Make sure that both the Audio output and the Event output are connected to the receiving nodes. Otherwise, the following nodes will not know when the speech starts and ends.



Audio Segmenter

Splits the whole incoming audio data into several audio fragments at once. Each fragment contains at least some audible data. Works on batch audio.

Parameters

none

Inputs

Audio

Accepts audio from a single channel.

Events

none

Outputs

Audio

The audio fragments are sent to the output.

Events

| name | description |
| --- | --- |
| Start of Segment | Raised once at the beginning of each audio fragment. |
| End of Segment | Raised once at the end of each audio fragment. |

Remarks

This node wraps the audio output write actions for each segment between a start-of-segment event and an end-of-segment event. For example, for audio data with 3 audio segments, the flow is as follows (the order is well defined):

segment-1: send "Start of Segment" event
segment-1: write the audio data of this segment to output
segment-1: send "End of Segment" event
segment-2: send "Start of Segment" event
segment-2: write the audio data of this segment to output
segment-2: send "End of Segment" event
segment-3: send "Start of Segment" event
segment-3: write the audio data of this segment to output
segment-3: send "End of Segment" event
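
The ordering above means a receiving node can rebuild each segment by treating the two events as brackets around the audio writes. The following is a minimal sketch of such a consumer, not the product's API: the `("event", name)` and `("audio", bytes)` message tuples and the `collect_segments` function are hypothetical names chosen for illustration.

```python
from typing import Iterable, List, Tuple, Union

# Hypothetical message shape: ("event", "<event name>") or ("audio", b"<raw bytes>").
Message = Tuple[str, Union[str, bytes]]


def collect_segments(messages: Iterable[Message]) -> List[bytes]:
    """Group audio writes into segments using the bracketing events."""
    segments: List[bytes] = []
    current = None  # None means we are currently outside a segment

    for kind, payload in messages:
        if kind == "event" and payload == "Start of Segment":
            if current is not None:
                raise ValueError("Start of Segment received inside a segment")
            current = bytearray()
        elif kind == "event" and payload == "End of Segment":
            if current is None:
                raise ValueError("End of Segment received without a start")
            segments.append(bytes(current))
            current = None
        elif kind == "audio":
            if current is None:
                raise ValueError("audio received outside a segment")
            current.extend(payload)

    if current is not None:
        raise ValueError("input ended inside a segment")
    return segments


# Three segments, delivered in the order described above.
flow = [
    ("event", "Start of Segment"), ("audio", b"a"), ("event", "End of Segment"),
    ("event", "Start of Segment"), ("audio", b"bb"), ("event", "End of Segment"),
    ("event", "Start of Segment"), ("audio", b"ccc"), ("event", "End of Segment"),
]
segments = collect_segments(flow)
```

If only the Audio output were connected, the consumer would see a single run of audio writes with no way to tell where one segment ends and the next begins.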

Project Structure

A minimal project with Audio Segmenter can be built as follows:

image.png

Important Note

Make sure that both the Audio output and the Event output are connected to the receiving nodes. Otherwise, the following nodes will not know when the audio starts and ends.

Supported flow types

Batch

Release History

v1.0.0
  • Introduced Node.


VAD

Performs voice activity detection. Works on streaming data. Filters out silences in the provided audio.

Parameters

| name | description | default |
| --- | --- | --- |
| Sensitivity | The range is 0.0-1.0 inclusive, 1.0 being the most sensitive. If you use 1.0, even the quietest voices will be heard and taken into account when VAD decides which part of the received data is actually speech. | 1.0 |
| MaxSpeechDurationMsec | If the speech does not end after this duration, VAD ends it. This is treated as a normal end of speech, and an appropriate speech-ended event is generated. Exceeding this timeout does not generate an error. | -1 |
| PreSpeechBufferMsec | After the start of speech is detected, VAD rewinds and takes a little more data from before the detected beginning, in case a low-energy voice happens to be there. This duration is determined by pre-speech-buffer-msec. | 300 |
| PostSpeechBufferMsec | After the end of speech is detected, VAD takes a little more data after the detected end, in case a low-energy voice happens to be there. This duration is determined by post-speech-buffer-msec. | 300 |
| SilenceTriggerMsec | The amount of silence, in milliseconds, that VAD expects before deciding that the speech has actually ended. | 400 |
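
To make the effect of the pre- and post-speech buffers concrete, the sketch below computes which part of the input ends up in an output fragment for a given detected speech interval. It is an illustration under stated assumptions, not the node's implementation: the `VadParams` class, its `validate` method, and `padded_bounds` are hypothetical names, and the assumption that negative buffer values are rejected is not confirmed by the parameter descriptions above. Only the Sensitivity range and the defaults come from the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VadParams:
    # Defaults copied from the parameter table; field names are illustrative.
    sensitivity: float = 1.0
    max_speech_duration_msec: int = -1
    pre_speech_buffer_msec: int = 300
    post_speech_buffer_msec: int = 300
    silence_trigger_msec: int = 400

    def validate(self) -> None:
        """Check the documented Sensitivity range (other checks are assumptions)."""
        if not 0.0 <= self.sensitivity <= 1.0:
            raise ValueError("Sensitivity must be within 0.0-1.0 inclusive")
        # Assumption: buffer and trigger durations cannot be negative.
        for name in ("pre_speech_buffer_msec", "post_speech_buffer_msec",
                     "silence_trigger_msec"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must not be negative")


def padded_bounds(params: VadParams, speech_start_msec: int,
                  speech_end_msec: int, audio_len_msec: int) -> Tuple[int, int]:
    """Widen a detected speech interval by the pre/post buffers, clamped to the audio."""
    start = max(0, speech_start_msec - params.pre_speech_buffer_msec)
    end = min(audio_len_msec, speech_end_msec + params.post_speech_buffer_msec)
    return start, end


params = VadParams()
params.validate()
# Speech detected from 1000 ms to 2000 ms in a 5000 ms recording.
bounds = padded_bounds(params, 1000, 2000, 5000)
```

With the default 300 ms buffers, speech detected between 1000 ms and 2000 ms yields a fragment covering 700 ms to 2300 ms; near the edges of the recording the bounds are clamped to the available audio.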

Inputs

Audio

Accepts audio from a single channel. Passing the audio through a VAD node before streaming to this node is recommended.

Events

none

Outputs

Audio

After the silences in the input audio are removed, the remaining data is sent to the output.

Events

| name | description |
| --- | --- |
| Speech Started | Raised once at the beginning of each fragment of actual audio. |
| Speech Ended | Raised once at the end of each fragment of actual audio. |

Remarks

Project Structure

A simple project can be built as follows:

image.png

Important Note

Make sure that both the Audio output and the Event output are connected to the receiving nodes. Otherwise, the following nodes will not know when the speech starts and ends.

Supported flow types

Stream, Batch

Release History

v3.7.0
  • Added full flush support.
v3.3.0
  • Fixed a crash that happened when VAD is fed an unsupported sample rate.
  • Parameters are now validated before the session starts.
v1.0.0
  • Introduced Node.


Vad Silero

Parameters

| name | description | default |
| --- | --- | --- |
| Sensitivity | The range is 0.0-1.0 inclusive. Determines the threshold of Speech Possibility. The Speech Started event triggers when Speech Possibility > Sensitivity. | 0.9 |
| MaxSpeechDurationMsec | If the speech does not end after this duration, VAD ends it. This is treated as a normal end of speech, and an appropriate speech-ended event is generated. Exceeding this timeout does not generate an error. | -1 |
| PreSpeechBufferMsec | After the start of speech is detected, VAD rewinds and takes a little more data from before the detected beginning, in case a low-energy voice happens to be there. This duration is determined by pre-speech-buffer-msec. | 300 |
| PostSpeechBufferMsec | After the end of speech is detected, VAD takes a little more data after the detected end, in case a low-energy voice happens to be there. This duration is determined by post-speech-buffer-msec. | 300 |
| SilenceTriggerMsec | The amount of silence, in milliseconds, that VAD expects before deciding that the speech has actually ended. | 400 |
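
The interplay between Sensitivity and SilenceTriggerMsec can be sketched as a small state machine over per-frame speech probabilities. This is a simplified illustration, not the node's actual logic: the `detect_events` function, the 100 ms frame size, and the choice to timestamp the end event at the end of the last silent frame are assumptions, and the sketch ignores the pre/post buffers and MaxSpeechDurationMsec.

```python
from typing import List, Sequence, Tuple


def detect_events(probs: Sequence[float],
                  sensitivity: float = 0.9,
                  silence_trigger_msec: int = 400,
                  frame_msec: int = 100) -> List[Tuple[int, str]]:
    """Return (time_msec, event_name) pairs for a sequence of frame probabilities."""
    events: List[Tuple[int, str]] = []
    in_speech = False
    silence_msec = 0  # silence accumulated since the last speech frame

    for i, p in enumerate(probs):
        t = i * frame_msec
        if p > sensitivity:  # strict comparison, as in the Sensitivity description
            silence_msec = 0
            if not in_speech:
                in_speech = True
                events.append((t, "Speech Started"))
        elif in_speech:
            silence_msec += frame_msec
            if silence_msec >= silence_trigger_msec:
                in_speech = False
                silence_msec = 0
                # Assumption: report the end at the close of this silent frame.
                events.append((t + frame_msec, "Speech Ended"))
    return events


# Two speech frames followed by four silent frames (400 ms of silence).
events = detect_events([0.1, 0.95, 0.97, 0.2, 0.1, 0.1, 0.1])
```

A shorter pause, for example three silent frames followed by another speech frame, resets the silence counter and does not end the speech, which is why a larger SilenceTriggerMsec merges utterances separated by brief pauses.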

Inputs

Audio

Accepts audio from a single channel. Passing the audio through a VAD node before streaming to this node is recommended.

Events

none

Outputs

Audio

After the silences in the input audio are removed, the remaining data is sent to the output.

Events

| name | description |
| --- | --- |
| Speech Started | Raised once at the beginning of each fragment of actual audio. |
| Speech Ended | Raised once at the end of each fragment of actual audio. |

Remarks

Project Structure

All the project structures mentioned above can be utilized.

Supported flow types

Batch, Stream

Release History

v3.7.0
  • Now supports 16 kHz audio streams.
v3.6.0
  • Introduced Node.

