Transcribe in Real-time
  • 29 May 2024

The WebSocket API performs speech recognition on streaming audio. While you continuously stream audio to the speech servers, the API sends back real-time updates of the transcribed text.

Servers

Public Server

  • URL: wss://srapi.knovvu.com
  • Protocol: wss
  • Usage: To activate the service, use an LDM token.

Operations

PUB / Operation

Operation ID: processReceivedMessage

  • Description: Sends a message to the API, for example to start a recognition on the server. The API accepts one of the following messages:
  1. Message recognize: Initiate a speech recognition request with the server. This message is used when the client wants to start a recognition process.

    • Payload:

      • message-name: (string, constant: "recognize")
      • audio-format: (string) Audio format, allowed value: "pcm" (16-bit signed samples).
      • model-name: (string) The name of the selected speech recognition model, e.g., "ENGLISH_GENERAL".
      • model-tenant: (string) The tenant name of the selected model (default: "Default").
      • model-version: (string) The version of the selected model (default: empty).
      • sample-rate: (integer) Sample rate of the model and audio data (e.g., 8000, 16000).
      • audio-splitter: (string) Type of audio splitting to be used, default: "realtime-vad" (allowed: "realtime-vad", "audio-segmenter").
      • vad-sensitivity: (integer) Voice Activity Detection (VAD) sensitivity (range: 1-10, default: 6).
      • vad-pre-speech-buffer-msec: (integer) Pre-speech buffer duration in milliseconds (default: 300).
      • vad-post-speech-buffer-msec: (integer) Post-speech buffer duration in milliseconds (default: 400).
      • vad-max-speech-duration-msec: (integer) Maximum speech duration in milliseconds (-1 means no limit, default: -1).
      • vad-silence-trigger-msec: (integer) Silence trigger duration in milliseconds (default: 400).
      • vad-graceful-silence-threshold-msec: (integer) Graceful silence threshold in milliseconds (default: 10000).
    • Example Payload:

      {
        "message-name": "recognize",
        "audio-format": "pcm",
        "sample-rate": 8000,
        "model-name": "ENGLISH_GENERAL"
      }
      
  2. Message stop-recognition: Request to stop an ongoing recognition. The server discards any unprocessed audio data and any recognition events it has not yet sent. Events that the server generated before it received the stop message may still arrive.

    • Payload:

      • message-name: (string, constant: "stop-recognition")
    • Example Payload:

      {
        "message-name": "stop-recognition"
      }
      
  3. Message finalize-recognition: Request to finalize the recognition process.

    • Payload:

      • message-name: (string, constant: "finalize-recognition")
    • Example Payload:

      {
        "message-name": "finalize-recognition"
      }
      

These message definitions streamline communication with the API for speech recognition operations, providing flexibility and control over the recognition process.
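
The three client messages above can be assembled with a few small helpers. The following is a minimal sketch, not an official client: the function names (build_recognize, build_control, pcm_chunks) are hypothetical, and the chunking helper assumes mono audio sent in fixed-size pieces, which this page does not specify. Only the field names, allowed values, and ranges are taken from the payload descriptions above.

```python
import json

ALLOWED_AUDIO_FORMATS = {"pcm"}
ALLOWED_SPLITTERS = {"realtime-vad", "audio-segmenter"}


def build_recognize(model_name, sample_rate, audio_format="pcm", **options):
    """Return the JSON text of a "recognize" message.

    Extra options use Python-style names (vad_sensitivity=7) and are
    converted to the hyphenated field names documented above.
    """
    if audio_format not in ALLOWED_AUDIO_FORMATS:
        raise ValueError(f"unsupported audio-format: {audio_format!r}")
    message = {
        "message-name": "recognize",
        "audio-format": audio_format,
        "sample-rate": sample_rate,
        "model-name": model_name,
    }
    for key, value in options.items():
        message[key.replace("_", "-")] = value
    # Validate the two fields whose allowed values are documented.
    if message.get("audio-splitter", "realtime-vad") not in ALLOWED_SPLITTERS:
        raise ValueError("audio-splitter must be realtime-vad or audio-segmenter")
    if not 1 <= message.get("vad-sensitivity", 6) <= 10:
        raise ValueError("vad-sensitivity must be between 1 and 10")
    return json.dumps(message)


def build_control(name):
    """Return the JSON text of a stop-recognition or finalize-recognition message."""
    if name not in ("stop-recognition", "finalize-recognition"):
        raise ValueError(f"unknown control message: {name!r}")
    return json.dumps({"message-name": name})


def pcm_chunks(audio, sample_rate, chunk_msec=100):
    """Split raw 16-bit PCM bytes into chunks of roughly chunk_msec each.

    Assumption: mono audio, so one sample is 2 bytes.
    """
    bytes_per_chunk = sample_rate * 2 * chunk_msec // 1000
    bytes_per_chunk -= bytes_per_chunk % 2  # never split a sample in half
    if bytes_per_chunk < 2:
        raise ValueError("chunk_msec is too small for this sample rate")
    for start in range(0, len(audio), bytes_per_chunk):
        yield audio[start:start + bytes_per_chunk]
```

For example, build_recognize("ENGLISH_GENERAL", 8000) produces the same fields as the example payload for the recognize message, and pcm_chunks(audio, 8000) yields 1600-byte pieces (100 ms of 8 kHz, 16-bit mono audio) that a client could send between the recognize and finalize-recognition messages.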

SUB / Operation

Operation ID: sendMessage

  • Description: Receives messages from the API. The server sends one of the following messages:
  1. recognize-response: This message serves as a response to the recognition request, indicating whether the recognition process has successfully started or not.

    • Payload:

      • message-name: (string, constant: "recognize-response")
      • operation-result: (string) Indicates success or provides an error message.
      • recognition-id: (string) The ID of the ongoing recognition process.
    • Example Payload:

      {
        "message-name": "recognize-response",
        "operation-result": "Success",
        "recognition-id": "12345"
      }
      
  2. stop-recognition-response: This message is a response to the stop-recognition request, indicating whether the current recognition process has been successfully stopped.

    • Payload:

      • message-name: (string, constant: "stop-recognition-response")
      • operation-result: (string) Indicates success or provides an error message.
      • recognition-id: (string) The ID of the stopped recognition process, if applicable.
    • Example Payload:

      {
        "message-name": "stop-recognition-response",
        "operation-result": "Success",
        "recognition-id": "12345"
      }
      
  3. finalize-recognition-response: This message serves as a response to the finalize-recognition request, indicating whether the finalization of the current recognition process has successfully started.

    • Payload:

      • message-name: (string, constant: "finalize-recognition-response")
      • operation-result: (string) Indicates success or provides an error message.
      • recognition-id: (string) The ID of the finalized recognition process, if applicable.
    • Example Payload:

      {
        "message-name": "finalize-recognition-response",
        "operation-result": "Success",
        "recognition-id": "12345"
      }
      
  4. partial-result: This message provides a partial result of the recognition process. Partial results contain recognition text produced since the last milestone or since the beginning if no milestone results are available yet. Note that partial results are not definitive and can change over time.

    • Payload:

      • message-name: (string, constant: "partial-result")
      • recognition-id: (string) The ID of the recognition process, if applicable.
      • text: (string) The partial result text.
    • Example Payload:

      {
        "message-name": "partial-result",
        "recognition-id": "12345",
        "text": "This is a partial result."
      }
      
  5. milestone-result: This message provides a milestone result of the recognition process. Milestone results accumulate: concatenate all milestone results from the start to obtain the full recognition result.

    • Payload:

      • message-name: (string, constant: "milestone-result")
      • recognition-id: (string) The ID of the recognition process, if applicable.
      • text: (string) The milestone result text.
    • Example Payload:

      {
        "message-name": "milestone-result",
        "recognition-id": "12345",
        "text": "This is a milestone result."
      }
      
  6. final-result: This message provides the final result of the recognition process.

    • Payload:

      • message-name: (string, constant: "final-result")
      • recognition-id: (string) The ID of the recognition process, if applicable.
      • operation-result: (string) Indicates success or provides an error message.
      • text: (string) The final result text.
      • confidence: (string) Confidence of the recognition: a floating-point value between 0.0 and 1.0, encoded as a string, where 1.0 is the highest confidence.
    • Example Payload:

      {
        "message-name": "final-result",
        "recognition-id": "12345",
        "operation-result": "Success",
        "text": "This is the final result.",
        "confidence": "0.95"
      }
      

These message definitions facilitate communication and tracking of recognition operations within the API.
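
One way a client might track the transcript from these server messages is sketched below. This is an illustration under stated assumptions, not the API's reference behavior: the Transcript class is hypothetical, milestone texts are joined with a single space, and "Success" is treated as the only successful operation-result because that is the value shown in the example payloads.

```python
import json


class Transcript:
    """Tracks recognized text across partial, milestone, and final results."""

    def __init__(self):
        self.milestones = []    # milestone texts, in arrival order
        self.partial = ""       # latest partial text since the last milestone
        self.final = None       # text of the final-result message, if received
        self.confidence = None  # confidence of the final result as a float

    def handle(self, raw):
        """Process one JSON text message from the server and return its name."""
        message = json.loads(raw)
        name = message.get("message-name", "")
        if name == "partial-result":
            # Partial results are not definitive, so each one replaces the last.
            self.partial = message.get("text", "")
        elif name == "milestone-result":
            self.milestones.append(message.get("text", ""))
            self.partial = ""
        elif name == "final-result":
            self.final = message.get("text", "")
            if message.get("confidence") is not None:
                # The confidence arrives as a string such as "0.95".
                self.confidence = float(message["confidence"])
            self.partial = ""
        elif name.endswith("-response"):
            # Assumption: any value other than "Success" is an error message.
            result = message.get("operation-result")
            if result != "Success":
                raise RuntimeError(f"{name} failed: {result}")
        return name

    def current_text(self):
        """Return all milestone text plus the latest partial text."""
        parts = self.milestones + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

A receive loop would pass each incoming text frame to handle() and redraw the display from current_text(), so partial text is overwritten as it changes while milestone text stays fixed.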

