WebSocket integration provides a bidirectional communication channel between the client and the Sestek SR system. It enables real-time, full-duplex communication, allowing both the client and server to send data asynchronously. The WebSocket protocol is designed to overcome the limitations of traditional HTTP by establishing a persistent connection that facilitates instant data exchange.
With WebSocket integration, the client can continuously stream audio data to the SR system, which can provide real-time transcriptions or responses. WebSocket integration is particularly useful for applications that require continuous speech recognition or interactive voice capabilities.
Availability: Available in both on-premises and cloud solutions.
A Simple WebSocket Node Application
This sample application is built with the W3CWebSocket client in Node.js, so most of it will also work in a browser.
Real-time audio recording is not implemented in this example to keep it simple; a pre-recorded audio file is used instead (sample audio here).
Before we start, we install the websocket package with either of these commands: yarn add websocket or npm install websocket.
Then we require() the necessary libraries and create a WebSocket client object:
var W3CWebSocket = require('websocket').w3cwebsocket;
var fs = require('fs');
var client = new W3CWebSocket("[server-address]");
Note: replace [server-address] with the address that has been provided to you. It should be of the form wss://some-server-address/recognizer.
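To avoid hard-coding the address, you could also read it from an environment variable; a small sketch (the variable name SR_SERVER_ADDRESS is an illustrative choice, not part of the product):
var serverAddress = process.env.SR_SERVER_ADDRESS; // e.g. wss://some-server-address/recognizer
var client = new W3CWebSocket(serverAddress);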
Log any problems for the sake of simplicity:
client.onerror = function() {
    console.log('Connection Error');
};
client.onclose = function() {
    console.log('Client Closed');
};
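In a real application you will likely want more detail than a plain log line. The close event carries a code and a reason that help with troubleshooting; a minimal sketch, assuming the standard W3C close-event fields:
client.onclose = function(event) {
    console.log('Client Closed, code: ' + event.code + ', reason: ' + event.reason);
};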
We start a recognition right after we establish the connection to the Sestek SR service.
client.onopen = function() {
    console.log('WebSocket Client Connected');
    function startRecognition() {
        if (client.readyState === client.OPEN) {
            client.send(JSON.stringify({
                "message-name": "recognize",
                "audio-format": "pcm",
                "sample-rate": 8000,
                "model-name": "SestekSystemTestModel"
            }));
        }
    }
    startRecognition();
};
The flow continues in an event-based fashion. Now we add handlers for incoming messages inside the onmessage event of the WebSocket client.
client.onmessage = function(e) {
All messages are JSON-formatted strings. The “message-name” property exists in all incoming and outgoing messages, so we branch on “message-name”:
if (typeof e.data === 'string') {
    var payload = JSON.parse(e.data);
    if (payload["message-name"] === "...")
    {
        //...
    }
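The rest of this example uses a chain of if/else blocks, which is easy to follow for a short sample. In a larger application you could instead map message names to handler functions; a minimal sketch of that alternative (the handler bodies are placeholders, not part of the API):
var handlers = {
    "recognize-response": function(payload) { /* send audio */ },
    "partial-result": function(payload) { /* show live text */ },
    "final-result": function(payload) { /* use the result, close the client */ }
};
var handler = handlers[payload["message-name"]];
if (handler) {
    handler(payload);
}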
If a message has been received from the server as a response to one of your requests, its name carries the “-response” suffix. We have sent a “recognize” message, so we expect a “recognize-response” message.
if (payload["message-name"] === "recognize-response")
{
    console.log('reading pcm');
    var audioData = fs.readFileSync('merhaba.pcm');
    // in a real-time recording use case we would send this audio data little by little
    console.log('sending all pcm at once');
    client.send(audioData);
    console.log('sent all pcm');
}
For simplicity, we have sent the entire content of the audio file at once here. In a real-time audio recording application this is the point you would customize: you would call client.send(audioData) many times with smaller pieces of audio data.
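A minimal sketch of such chunked sending, assuming roughly 200 ms chunks of 8 kHz 16-bit PCM (the chunk size is an illustrative choice, not a service requirement):
var stream = fs.createReadStream('merhaba.pcm', { highWaterMark: 3200 }); // ~200 ms of 8 kHz 16-bit PCM
stream.on('data', function(chunk) {
    if (client.readyState === client.OPEN) {
        client.send(chunk); // send the audio little by little
    }
});
stream.on('end', function() {
    console.log('sent all pcm in chunks');
});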
Let’s log the partial and milestone results.
else if (payload["message-name"] === "partial-result")
{
    console.log('received partial result : ' + payload["partial-result-text"]);
}
else if (payload["message-name"] === "milestone-result")
{
    console.log('received milestone result : ' + payload["milestone-result-text"]);
    client.send(JSON.stringify({"message-name": "finalize-recognition"}));
}
At this point you may want to check the API document for the difference between partial-result and milestone-result. “partial-result” is essentially the last few seconds of recognized text, which is prone to frequent change; “milestone-result” is the earlier part of the recognized text that the decoder has decided never to change again.
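One way to use the two together is to keep the milestone text as the stable part of a live transcript and append the latest partial result to it. A sketch of that idea, assuming each “milestone-result” carries only the newly fixed text (check the API document for whether it is reported cumulatively):
var milestoneText = ''; // text the decoder will not change again
var latestPartial = ''; // recent text that may still change

function onPartialResult(text) {
    latestPartial = text;
    console.log('live transcript: ' + milestoneText + latestPartial);
}

function onMilestoneResult(text) {
    milestoneText += text;
    latestPartial = '';
    console.log('live transcript: ' + milestoneText);
}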
Note that sending “finalize-recognition” in the “milestone-result” message handler is an over-simplification. Since this is a basic example and we are only recognizing a short audio file, we cut corners and finalize the recognition the moment we get a milestone-result.
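In a streaming application you would more likely finalize once there is no more audio to send, for example when the read stream from the chunked-sending sketch above ends. A sketch of that idea (the service is assumed to accept “finalize-recognition” at that point):
stream.on('end', function() {
    console.log('no more audio, finalizing recognition');
    client.send(JSON.stringify({ "message-name": "finalize-recognition" }));
});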
First we get the response to the “finalize-recognition” request. Note that this response does not contain the final result itself; preparing the “final-result” may take a little more time.
else if (payload["message-name"] === "finalize-recognition-response")
{
    if (payload["operation-result"] === "success")
    {
        console.log('finalize-recognition-response succeeded');
    }
    else
    {
        console.log('finalize-recognition-response failed : ' + payload["operation-result"]);
    }
}
If the “finalize-recognition-response” is successful, we can wait for the “final-result” event. This event contains the final result text that we want.
else if (payload["message-name"] === "final-result")
{
    console.log('received final result : ' + payload["text"]);
    client.close();
}
} // end of the typeof e.data === 'string' check
}; // end of client.onmessage