WebSocket integration provides a bidirectional communication channel between the client and the Sestek SR system. It enables real-time, full-duplex communication, allowing both the client and server to send data asynchronously. The WebSocket protocol is designed to overcome the limitations of traditional HTTP by establishing a persistent connection that facilitates instant data exchange.
With WebSocket integration, the client can continuously stream audio data to the SR system, which can provide real-time transcriptions or responses. WebSocket integration is particularly useful for applications that require continuous speech recognition or interactive voice capabilities.
Availability: Available in both on-premises and cloud solutions.
A Simple WebSocket Node Application
This sample application is built with the W3CWebSocket client in Node.js, so most of it will also work in a browser.
Real-time audio recording is not implemented in this example to keep it simple; a pre-recorded audio file is used instead (sample audio here).
Before we start, we install the websocket package with either of these commands: yarn add websocket or npm install websocket.
Then we require() the necessary libraries and create a WebSocket client object:
var W3CWebSocket = require('websocket').w3cwebsocket;
var fs = require('fs');
var client = new W3CWebSocket("[server-address]");
Note: replace [server-address] with the address that has been provided to you. It should be of the form wss://some-server-address/recognizer.
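To avoid hard-coding the address, you could also read it from an environment variable; a small sketch (the variable name SR_SERVER_ADDRESS is an illustrative choice, not part of the product):
var serverAddress = process.env.SR_SERVER_ADDRESS; // e.g. wss://some-server-address/recognizer
var client = new W3CWebSocket(serverAddress);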
Log any problems for the sake of simplicity:
client.onerror = function() {
    console.log('Connection Error');
};
client.onclose = function() {
    console.log('Client Closed');
};
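In a real application you will likely want more detail than a plain log line. The close event carries a code and a reason that help with troubleshooting; a minimal sketch, assuming the standard W3C close-event fields:
client.onclose = function(event) {
    console.log('Client Closed, code: ' + event.code + ', reason: ' + event.reason);
};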
We start a recognition right after we establish the connection to the Sestek SR service.
client.onopen = function() {
    console.log('WebSocket Client Connected');
    function startRecognition() {
        if (client.readyState === client.OPEN) {
            client.send(JSON.stringify({
                "message-name": "recognize",
                "audio-format": "pcm",
                "sample-rate": 8000,
                "model-name": "SestekSystemTestModel"
            }));
        }
    }
    startRecognition();
};
The flow continues in an event-based fashion. Now we add handlers for incoming messages inside the onmessage event of the WebSocket client.
client.onmessage = function(e) {
All messages are JSON-formatted strings. The “message-name” property exists in all incoming and outgoing messages, so we branch on “message-name”:
if (typeof e.data === 'string') {
    var payload = JSON.parse(e.data);
    if (payload["message-name"] === "...")
    {
        //...
    }
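The rest of this example uses a chain of if/else blocks, which is easy to follow for a short sample. In a larger application you could instead map message names to handler functions; a minimal sketch of that alternative (the handler bodies are placeholders, not part of the API):
var handlers = {
    "recognize-response": function(payload) { /* send audio */ },
    "partial-result": function(payload) { /* show live text */ },
    "final-result": function(payload) { /* use the result, close the client */ }
};
var handler = handlers[payload["message-name"]];
if (handler) {
    handler(payload);
}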
If a message has been received from the server as a response to one of your requests, its name carries the “-response” suffix. We have sent a “recognize” message, so we expect a “recognize-response” message.
if (payload["message-name"] === "recognize-response")
{
    console.log('reading pcm');
    var audioData = fs.readFileSync('merhaba.pcm');
    // in a real-time recording use case we would send this audio data little by little
    console.log('sending all pcm at once');
    client.send(audioData);
    console.log('sent all pcm');
}
For simplicity, we have sent the entire content of the audio file at once here. In a real-time audio recording application this is the point you would customize: you would call client.send(audioData) many times with smaller pieces of audio data.
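A minimal sketch of such chunked sending, assuming roughly 200 ms chunks of 8 kHz 16-bit PCM (the chunk size is an illustrative choice, not a service requirement):
var stream = fs.createReadStream('merhaba.pcm', { highWaterMark: 3200 }); // ~200 ms of 8 kHz 16-bit PCM
stream.on('data', function(chunk) {
    if (client.readyState === client.OPEN) {
        client.send(chunk); // send the audio little by little
    }
});
stream.on('end', function() {
    console.log('sent all pcm in chunks');
});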
Let’s log the partial and milestone results.
else if (payload["message-name"] === "partial-result")
{
    console.log('received partial result : ' + payload["partial-result-text"]);
}
else if (payload["message-name"] === "milestone-result")
{
    console.log('received milestone result : ' + payload["milestone-result-text"]);
    client.send(JSON.stringify({"message-name": "finalize-recognition"}));
}
At this point you may want to check the API document for the difference between partial-result and milestone-result. “partial-result” is essentially the last few seconds of recognized text, which is prone to frequent change; “milestone-result” is the earlier part of the recognized text that the decoder has decided never to change again.
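One way to use the two together is to keep the milestone text as the stable part of a live transcript and append the latest partial result to it. A sketch of that idea, assuming each “milestone-result” carries only the newly fixed text (check the API document for whether it is reported cumulatively):
var milestoneText = ''; // text the decoder will not change again
var latestPartial = ''; // recent text that may still change

function onPartialResult(text) {
    latestPartial = text;
    console.log('live transcript: ' + milestoneText + latestPartial);
}

function onMilestoneResult(text) {
    milestoneText += text;
    latestPartial = '';
    console.log('live transcript: ' + milestoneText);
}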
Note that sending “finalize-recognition” in the “milestone-result” message handler is an over-simplification. Since this is a basic example and we are only recognizing a short audio file, we cut corners and finalize the recognition the moment we get a milestone-result.
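In a streaming application you would more likely finalize once there is no more audio to send, for example when the read stream from the chunked-sending sketch above ends. A sketch of that idea (the service is assumed to accept “finalize-recognition” at that point):
stream.on('end', function() {
    console.log('no more audio, finalizing recognition');
    client.send(JSON.stringify({ "message-name": "finalize-recognition" }));
});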
First we get the response to the “finalize-recognition” request. Note that this response does not contain the final result itself; preparing the “final-result” may take a little more time.
else if (payload["message-name"] === "finalize-recognition-response")
{
    if (payload["operation-result"] === "success")
    {
        console.log('finalize-recognition-response succeeded');
    }
    else
    {
        console.log('finalize-recognition-response failed : ' + payload["operation-result"]);
    }
}
If the “finalize-recognition-response” is successful, we can wait for the “final-result” event. This event contains the final result text that we want.
else if (payload["message-name"] === "final-result")
{
    console.log('received final result : ' + payload["text"]);
    client.close();
}
} // end of the typeof e.data === 'string' check
}; // end of client.onmessage