Synthesize Text via WebSocket

1. Purpose

This document explains how to use the TTS service over WebSocket. It covers opening the connection, sending text to be synthesized into audio, and sending the control messages that complete the synthesis flow.

2. WebSocket Connection

The service can be reached from any WebSocket client. For manual testing, a WebSocket request can be created in Postman.

WebSocket URL Format:

wss://<tts-service-host>/synthesizer

Example:

wss://<environment-specific-tts-host>/synthesizer

Note: Actual environment URLs should not be shared openly in documentation. Replace the placeholder host with the TTS WebSocket URL of the target environment.

3. Message Flow

After the WebSocket connection is established, the messages should be sent in the following order:

1. synthesize
2. add-tts-text
3. flush
4. stop
5. finalize-synthesis

Note: The separate flush message is required only when the add-tts-text message does not include the optional flush parameter.
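The ordering rule can be expressed as a small Python helper. This is only a sketch: the function name message_order is hypothetical and not part of the service. It models a single text input, and the inline_flush flag stands for an add-tts-text message that carries the optional flush parameter.

```python
def message_order(inline_flush: bool = False) -> list[str]:
    """Return the message-name values in the order they should be sent.

    Hypothetical helper; models one text input as in this document.
    """
    order = ["synthesize", "add-tts-text"]
    if not inline_flush:
        # A separate flush is needed only when add-tts-text has no inline flush.
        order.append("flush")
    order += ["stop", "finalize-synthesis"]
    return order


print(message_order())                   # separate flush message
print(message_order(inline_flush=True))  # flush carried inside add-tts-text
```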

4. Start Synthesis Message

The first message should be the synthesize message. This message initializes the synthesis process and includes the audio format, voice, sample rate, volume, speaking rate, and authorization information.

Request Format

{
  "message-name": "synthesize",
  "audio-format": "pcm",
  "voice-name": "<voice-name>",
  "sample-rate": "<sample-rate>",
  "volume": "<volume>",
  "rate": "<rate>",
  "Authorization": "<access-token>"
}

Example

{
  "message-name": "synthesize",
  "audio-format": "pcm",
  "voice-name": "Emily_Premium",
  "sample-rate": "24000",
  "volume": "1.0",
  "rate": "1.0",
  "Authorization": "<access-token>"
}

Parameter Descriptions

Parameter	Description	Example Value
`message-name`	Specifies the message type.	`synthesize`
`audio-format`	Specifies the output audio format.	`pcm`
`voice-name`	Specifies the TTS voice to be used.	`Emily_Premium`
`sample-rate`	Specifies the audio sample rate.	`24000`
`volume`	Specifies the output volume level.	`1.0`
`rate`	Specifies the speaking rate.	`1.0`
`Authorization`	Access token used for authorization.	`<access-token>`

Security note: The actual access token should not be shared openly in documentation, emails, tickets, or any shared environment.
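The sketch below builds the synthesize message and reads the token from an environment variable, so the token never appears in source code or documentation. The function name and the TTS_ACCESS_TOKEN variable are hypothetical. audio-format is fixed to pcm because that is the only value shown here. Numeric values are sent as strings to match the example above.

```python
import json
import os


def build_synthesize_message(voice_name: str, sample_rate: int, volume: float,
                             rate: float, access_token: str) -> str:
    """Serialize a synthesize message (hypothetical helper).

    Values are converted to strings to match the documented example.
    """
    if not access_token:
        raise ValueError("a valid access token is required")
    message = {
        "message-name": "synthesize",
        "audio-format": "pcm",  # only value shown in this document
        "voice-name": voice_name,
        "sample-rate": str(sample_rate),
        "volume": str(volume),
        "rate": str(rate),
        "Authorization": access_token,
    }
    return json.dumps(message)


def main() -> None:
    # Hypothetical variable name; keeps the token out of code and documents.
    token = os.environ.get("TTS_ACCESS_TOKEN")
    if not token:
        print("Set TTS_ACCESS_TOKEN before building the message.")
        return
    print(build_synthesize_message("Emily_Premium", 24000, 1.0, 1.0, token))


if __name__ == "__main__":
    main()
```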

5. Text Message

After the synthesis process is initialized, the text to be synthesized should be sent using the add-tts-text message.

Request Format

{
  "message-name": "add-tts-text",
  "text": "<text-to-synthesize>"
}

Example

{
  "message-name": "add-tts-text",
  "text": "Hello world. How are you today?"
}

Parameter Descriptions

Parameter	Description	Example Value
`message-name`	Specifies the message type.	`add-tts-text`
`text`	Contains the text to be synthesized.	`Hello world. How are you today?`
`flush`	Optional parameter. If set to `true`, `yes`, or `1`, the text message is flushed immediately without sending a separate `flush` message.	`true`

Inline Flush Usage

The add-tts-text message also supports an optional flush parameter. When this parameter is set to a truthy value such as true, yes, or 1, the text message is flushed immediately. In this case, a separate flush message is not required for that specific text input.

Example

{
  "message-name": "add-tts-text",
  "text": "Hello world. How are you today?",
  "flush": "true"
}

If the flush parameter is not provided, a separate flush message should be sent after the text message to trigger synthesis.
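A minimal Python sketch of building the text message with or without inline flush follows. The function names are hypothetical. The check compares against the exact values listed in this document (true, yes, 1). Whether the service accepts other spellings, such as TRUE, is not documented, so the sketch does not assume it.

```python
import json

# Values this document lists as truthy for the inline flush parameter.
DOCUMENTED_TRUTHY_FLUSH = {"true", "yes", "1"}


def build_add_tts_text(text: str, inline_flush: bool = False) -> str:
    """Serialize an add-tts-text message, optionally with inline flush."""
    message = {"message-name": "add-tts-text", "text": text}
    if inline_flush:
        # Sent as the string "true", matching the documented example.
        message["flush"] = "true"
    return json.dumps(message)


def needs_separate_flush(add_text_message: dict) -> bool:
    """Return True when no documented truthy flush value is present."""
    return str(add_text_message.get("flush", "")) not in DOCUMENTED_TRUTHY_FLUSH


print(build_add_tts_text("Hello world. How are you today?", inline_flush=True))
```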

6. Flush Message

After the text is sent, the flush message should be sent. This message indicates that the current text input is complete and should be processed by the service.

A separate flush message is required only when the add-tts-text message does not include the optional flush parameter.

Request Format

{
  "message-name": "flush"
}

7. Stop Message

The stop message is used to stop the synthesis process.

Request Format

{
  "message-name": "stop"
}

8. Finalize Synthesis Message

The finalize-synthesis message is used to finalize and close the synthesis flow.

Request Format

{
  "message-name": "finalize-synthesis"
}
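The flush, stop, and finalize-synthesis messages each carry only a message-name field, so a single helper can build all three. This is an illustrative sketch; build_control_message is a hypothetical name, not a service API.

```python
import json

# Control messages described in sections 6 to 8; each has only message-name.
CONTROL_MESSAGES = ("flush", "stop", "finalize-synthesis")


def build_control_message(name: str) -> str:
    """Serialize a flush, stop, or finalize-synthesis message."""
    if name not in CONTROL_MESSAGES:
        raise ValueError(f"not a control message: {name!r}")
    return json.dumps({"message-name": name})


for name in CONTROL_MESSAGES:
    print(build_control_message(name))
```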

9. Sample Message Sequence

The following examples show the message sequences to be sent after the WebSocket connection is established: Option 1 uses a separate flush message, and Option 2 uses the inline flush parameter.

Option 1: Text Message Followed by Separate Flush

1. Start Message

{
  "message-name": "synthesize",
  "audio-format": "pcm",
  "voice-name": "Emily_Premium",
  "sample-rate": "24000",
  "volume": "1.0",
  "rate": "1.0",
  "Authorization": "<access-token>"
}

2. Text Message

{
  "message-name": "add-tts-text",
  "text": "Hello world. How are you today?"
}

3. Flush Message

{
  "message-name": "flush"
}

4. Stop Message

{
  "message-name": "stop"
}

5. Finalize Message

{
  "message-name": "finalize-synthesis"
}

Option 2: Text Message with Inline Flush

1. Start Message

{
  "message-name": "synthesize",
  "audio-format": "pcm",
  "voice-name": "Emily_Premium",
  "sample-rate": "24000",
  "volume": "1.0",
  "rate": "1.0",
  "Authorization": "<access-token>"
}

2. Text Message with Inline Flush

{
  "message-name": "add-tts-text",
  "text": "Hello world. How are you today?",
  "flush": "true"
}

3. Stop Message

{
  "message-name": "stop"
}

4. Finalize Message

{
  "message-name": "finalize-synthesis"
}
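Both options can be generated from one Python function. This helps when preparing frames to paste into Postman or to send from another WebSocket client. It is a sketch under the assumption of a single text input. The function name is hypothetical, and "<access-token>" is left as a placeholder that must be replaced before sending.

```python
import json


def build_sequence(voice_name: str, sample_rate: str, volume: str, rate: str,
                   access_token: str, text: str, inline_flush: bool) -> list[str]:
    """Return JSON frames in sending order.

    inline_flush=False gives Option 1; inline_flush=True gives Option 2.
    """
    messages = [{
        "message-name": "synthesize",
        "audio-format": "pcm",
        "voice-name": voice_name,
        "sample-rate": sample_rate,
        "volume": volume,
        "rate": rate,
        "Authorization": access_token,
    }]
    text_message = {"message-name": "add-tts-text", "text": text}
    if inline_flush:
        text_message["flush"] = "true"
    messages.append(text_message)
    if not inline_flush:
        # Option 1: separate flush after the text message.
        messages.append({"message-name": "flush"})
    messages.append({"message-name": "stop"})
    messages.append({"message-name": "finalize-synthesis"})
    return [json.dumps(m, indent=2) for m in messages]


if __name__ == "__main__":
    for frame in build_sequence("Emily_Premium", "24000", "1.0", "1.0",
                                "<access-token>",
                                "Hello world. How are you today?",
                                inline_flush=False):
        print(frame)
        print()
```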

10. Testing via Postman

1. Create a new WebSocket request in Postman.
2. Enter the WebSocket URL using the following format:

wss://<tts-service-host>/synthesizer

3. Connect to the WebSocket endpoint.
4. Send the synthesize message first.
5. Send the text to be synthesized using the add-tts-text message.
6. Send the flush message to trigger processing of the provided text. This step can be skipped if the add-tts-text message includes the optional flush parameter.
7. Send the stop message if the synthesis process needs to be stopped.
8. Send the finalize-synthesis message to complete the synthesis flow.

11. Important Notes

Messages should only be sent after the WebSocket connection is successfully established.
The first message must be synthesize.
A valid access token must be provided in the Authorization field.
Access tokens should never be shared openly in documentation.
Environment-specific endpoint information should be replaced with placeholders such as <tts-service-host>.
The voice-name value must be one of the voices supported in the relevant environment.
The sample-rate, audio-format, volume, and rate values must be within the values supported by the service.
If the optional flush parameter is used in the add-tts-text message, a separate flush message is not required for that text input.
If the optional flush parameter is not used, the flush message should be sent separately after the text message.
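Several of these notes can be checked before anything is sent. The Python sketch below checks three of them: synthesize comes first, Authorization is set and is not the placeholder, and every add-tts-text without inline flush is followed by flush. It is a conservative reading of the examples in this document, not the service's own validation. check_sequence is a hypothetical name. Voice and numeric values are not checked because the supported values depend on the environment.

```python
DOCUMENTED_TRUTHY_FLUSH = {"true", "yes", "1"}


def check_sequence(messages: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    if not messages:
        return ["no messages to send"]
    problems = []
    first = messages[0]
    if first.get("message-name") != "synthesize":
        problems.append("the first message must be synthesize")
    elif first.get("Authorization") in (None, "", "<access-token>"):
        problems.append("a valid access token must be set in Authorization")
    for i, msg in enumerate(messages):
        if msg.get("message-name") != "add-tts-text":
            continue
        if str(msg.get("flush", "")) in DOCUMENTED_TRUTHY_FLUSH:
            continue  # inline flush: no separate flush message needed
        nxt = messages[i + 1] if i + 1 < len(messages) else {}
        if nxt.get("message-name") != "flush":
            problems.append(
                f"message {i} (add-tts-text) has no inline flush "
                "and is not followed by a flush message")
    return problems
```

For a sequence matching Option 1 or Option 2 with a real token, the function returns an empty list.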