---
title: "Transcribe in Real-time"
slug: "sr-transcribe-in-real-time"
description: "Stream audio and receive real-time transcription with Knovvu SR's WebSocket API, featuring live updates, VAD configuration, and customizable recognition settings."
updated: 2026-05-15T11:38:38Z
published: 2026-05-15T11:38:38Z
canonical: "docs.knovvu.com/sr-transcribe-in-real-time"
---

# Transcribe in Real-time

The Speech Recognition WebSocket API provides real-time speech-to-text transcription by allowing clients to stream audio continuously and receive recognition results during the session.

This API is typically used for real-time voice scenarios where audio is sent in small chunks over a WebSocket connection.

---

## 1. WebSocket Endpoint

The Speech Recognition WebSocket endpoint may differ depending on the customer environment, region, or deployment type.

Use the WebSocket host provided for your environment.

### Endpoint Format

```
wss://<sr-websocket-host>/recognizer?ModelName=<ModelName>
```

### Example

```
wss://srapi.knovvu.com/recognizer?ModelName=EnglishStream
```

In this example, `srapi.knovvu.com` represents the Speech Recognition WebSocket host for a specific environment.

The actual host may be different for other regions, private cloud environments, or on-premises deployments.

The `ModelName` query parameter is required. It is used during WebSocket routing to direct the request to the correct speech recognition model.

The value must match the model that will be used in the recognition request.

For example, if the client wants to use the `EnglishStream` model, the WebSocket connection URL should be:

```
wss://<sr-websocket-host>/recognizer?ModelName=EnglishStream
```

And the recognition payload should also include:

```
"model-name": "EnglishStream"
```
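Because the URL and the payload must name the same model, it helps to build the URL in code rather than by hand. The following is a minimal Python sketch using only the standard library. The helper names `build_recognizer_url` and `url_model_name` are hypothetical and are not part of any Knovvu SDK.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_recognizer_url(host: str, model_name: str) -> str:
    """Build the recognizer WebSocket URL with the required ModelName parameter."""
    if not model_name:
        raise ValueError("ModelName is required for WebSocket routing")
    # urlencode percent-escapes any characters that are not URL-safe.
    return f"wss://{host}/recognizer?{urlencode({'ModelName': model_name})}"


def url_model_name(url: str) -> Optional[str]:
    """Return the ModelName query parameter of a recognizer URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("ModelName")
    return values[0] if values else None
```

For example, `build_recognizer_url("srapi.knovvu.com", "EnglishStream")` returns `wss://srapi.knovvu.com/recognizer?ModelName=EnglishStream`, which is the example URL shown above.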

---

## 2. Authentication

To use the service, the client must authenticate with a valid LDM token.

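The token should be provided according to the authentication method agreed for the project or tenant. In the request examples on this page, the token is sent as the `Authorization` field of the `recognize` payload.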

### Example Placeholder

```
"Authorization": "<token>"
```

Replace `<token>` with the actual authorization token value.

---

## 3. Recognition Flow

A typical WebSocket recognition flow is:

1. Obtain a valid authorization token.
2. Connect to the WebSocket endpoint with the correct host and the required `ModelName` query parameter.
3. Send a `recognize` message to start recognition.
4. Stream raw audio chunks to the server.
5. Receive partial, milestone, or final recognition results.
6. Send `finalize-recognition` when all audio has been sent and a final result is expected, or `stop-recognition` to interrupt the session.
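The client side of steps 3, 4, and 6 can be sketched as follows. This is a minimal Python sketch, not an official client. It assumes that a WebSocket library has already opened the connection (steps 1 and 2) and exposes separate text-send and binary-send functions. These functions are passed in here as the hypothetical callables `send_text` and `send_binary`. The sketch also assumes that JSON control messages travel as text frames and audio travels as binary frames. Receiving results (step 5) would run concurrently and is not shown.

```python
import json
from typing import Callable, Iterable


def run_recognition(
    send_text: Callable[[str], None],
    send_binary: Callable[[bytes], None],
    recognize_payload: dict,
    audio_chunks: Iterable[bytes],
) -> None:
    """Start a session, stream audio, then request finalization."""
    # Step 3: start recognition with the JSON recognize message.
    send_text(json.dumps(recognize_payload))
    # Step 4: stream raw audio chunks as binary frames.
    for chunk in audio_chunks:
        send_binary(chunk)
    # Step 6: all audio has been sent, so ask the server for the final result.
    send_text(json.dumps({"message-name": "finalize-recognition"}))
```

Keeping the transport behind two callables makes the message order easy to test without a live server.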

---

## 4. Client-to-Server Messages

Client-to-server messages are sent from the client application to the Speech Recognition WebSocket API.

---

### 4.1 recognize

The `recognize` message starts a speech recognition session.

#### Payload Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `message-name` | string | Yes | Must be `recognize`. |
| `audio-format` | string | Yes | Audio format. Supported value: `pcm`. |
| `sample-rate` | integer | Yes | Sample rate of the audio, for example `8000` or `16000`. |
| `model-name` | string | Yes | Name of the speech recognition model. This should match the `ModelName` value in the WebSocket URL. |
| `model-tenant` | string | No | Tenant name of the selected model. If omitted, the default tenant may be used. |
| `model-version` | string | No | Version of the selected model. If omitted, the default version may be used. |
| `audio-splitter` | string | No | Audio splitting strategy. Common value: `realtime-vad`. |
| `vad-sensitivity` | integer | No | Voice Activity Detection sensitivity. Range: `1-10`. Default: `6`. |
| `vad-pre-speech-buffer-msec` | integer | No | Amount of audio kept before detected speech starts, in milliseconds. Default: `300`. |
| `vad-post-speech-buffer-msec` | integer | No | Amount of audio kept after detected speech ends, in milliseconds. Default: `400`. |
| `vad-max-speech-duration-msec` | integer | No | Maximum speech duration in milliseconds. `-1` means no limit. Default: `-1`. |
| `vad-silence-trigger-msec` | integer | No | Duration of silence, in milliseconds, that triggers speech end detection. Default: `400`. |
| `vad-graceful-silence-threshold-msec` | integer | No | Graceful silence threshold in milliseconds. Default: `10000`. |
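A payload builder can enforce the documented constraints before anything is sent. The following is a hypothetical Python sketch; the function and parameter names are illustrative. Optional fields are included only when they are set, so the server defaults listed above apply otherwise. The only value check is the documented `1-10` range for `vad-sensitivity`. The `16000` default sample rate is an arbitrary choice for the sketch.

```python
from typing import Dict, Optional

# VAD fields documented in the table above.
VAD_FIELDS = {
    "vad-sensitivity",
    "vad-pre-speech-buffer-msec",
    "vad-post-speech-buffer-msec",
    "vad-max-speech-duration-msec",
    "vad-silence-trigger-msec",
    "vad-graceful-silence-threshold-msec",
}


def build_recognize_payload(
    model_name: str,
    token: str,
    sample_rate: int = 16000,
    model_tenant: Optional[str] = None,
    model_version: Optional[str] = None,
    vad_options: Optional[Dict[str, int]] = None,
) -> dict:
    """Build a recognize payload (hypothetical helper, not an official SDK call)."""
    payload = {
        "message-name": "recognize",
        "audio-format": "pcm",  # the only documented audio format
        "sample-rate": sample_rate,
        "model-name": model_name,  # must equal ModelName in the WebSocket URL
        "audio-splitter": "realtime-vad",
        "Authorization": token,
    }
    # Omitted optional fields fall back to the server-side defaults.
    if model_tenant is not None:
        payload["model-tenant"] = model_tenant
    if model_version is not None:
        payload["model-version"] = model_version
    for field, value in (vad_options or {}).items():
        if field not in VAD_FIELDS:
            raise ValueError(f"unknown VAD field: {field}")
        if field == "vad-sensitivity" and not 1 <= value <= 10:
            raise ValueError("vad-sensitivity must be in the range 1-10")
        payload[field] = value
    return payload
```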

#### Example: recognize Request

WebSocket URL:

```
wss://<sr-websocket-host>/recognizer?ModelName=EnglishStream
```

Payload:

```
{
  "message-name": "recognize",
  "audio-format": "pcm",
  "sample-rate": 16000,
  "model-name": "EnglishStream",
  "model-version": "1",
  "audio-splitter": "realtime-vad",
  "Authorization": "<token>"
}
```

#### Example: recognize Request with VAD Parameters

```
{
  "message-name": "recognize",
  "audio-format": "pcm",
  "sample-rate": 16000,
  "model-name": "EnglishStream",
  "model-tenant": "Default",
  "model-version": "1",
  "audio-splitter": "realtime-vad",
  "vad-sensitivity": 6,
  "vad-pre-speech-buffer-msec": 300,
  "vad-post-speech-buffer-msec": 400,
  "vad-max-speech-duration-msec": -1,
  "vad-silence-trigger-msec": 400,
  "vad-graceful-silence-threshold-msec": 10000,
  "Authorization": "<token>"
}
```

#### Important Note About Model Routing

The model name must be provided in the WebSocket URL as a query parameter.

Correct:

```
wss://<sr-websocket-host>/recognizer?ModelName=EnglishStream
```

Incorrect:

```
wss://<sr-websocket-host>/recognizer
```

If the `ModelName` query parameter is missing or does not match an available model, the server may not route the WebSocket session to the correct model and may return an error such as:

```
{
  "operation-result": "Cannot find model Default_EnglishStream_1",
  "session-id": "93af793b5e564e4e"
}
```

To avoid this issue:

- Use the correct WebSocket host for your environment.
- Always include `ModelName=<ModelName>` in the WebSocket URL.
- Make sure the `ModelName` in the URL matches the `model-name` in the `recognize` payload.
- Make sure the selected model, tenant, and version are available for the customer environment.
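The second and third checks can be automated on the client before connecting. The following is a minimal Python sketch; the function name `check_model_routing` is hypothetical. It compares the URL's `ModelName` with the payload's `model-name` and returns a list of problems, where an empty list means the two agree. Checks that depend on the server, such as whether the model, tenant, and version exist, are outside its scope.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit


def check_model_routing(url: str, payload: dict) -> List[str]:
    """Report client-side model routing problems (hypothetical helper)."""
    problems = []
    # parse_qs drops blank values, so "ModelName=" is treated as missing.
    model_in_url = parse_qs(urlsplit(url).query).get("ModelName", [None])[0]
    model_in_payload = payload.get("model-name")
    if model_in_url is None:
        problems.append("WebSocket URL has no ModelName query parameter")
    elif model_in_url != model_in_payload:
        problems.append(
            f"URL ModelName {model_in_url!r} does not match "
            f"payload model-name {model_in_payload!r}"
        )
    return problems
```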

---

### 4.2 stop-recognition

The `stop-recognition` message stops an ongoing recognition session.

This message may discard unprocessed audio data or recognition events that have not yet been received by the client. Some events that were already generated by the server before the stop request may still be delivered.

#### Payload

```
{
  "message-name": "stop-recognition"
}
```

---

### 4.3 finalize-recognition

The `finalize-recognition` message asks the server to finalize the current recognition session and return the final result.

This is typically used when the client has finished sending audio and wants to complete the recognition process gracefully.

#### Payload

```
{
  "message-name": "finalize-recognition"
}
```

---

## 5. Audio Streaming

After sending the `recognize` message, the client can start sending audio chunks over the WebSocket connection.

The audio must match the configuration provided in the `recognize` payload.

For example, if the payload contains:

```
{
  "audio-format": "pcm",
  "sample-rate": 16000
}
```

Then the streamed audio should be:

- PCM audio
- 16-bit signed samples
- 16 kHz sample rate
- Sent in binary audio chunks
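With 16-bit samples, each sample takes 2 bytes. At 16 kHz, one second of audio is therefore 16000 × 2 = 32,000 bytes, and a 100 ms chunk is 3,200 bytes. At 8 kHz, a 100 ms chunk is 1,600 bytes. The following Python sketch converts and chunks audio under several assumptions that this page does not specify: mono audio, little-endian byte order, and a 100 ms chunk size. The helper names are hypothetical.

```python
import array
import sys
from typing import Iterator, List

BYTES_PER_SAMPLE = 2  # 16-bit signed PCM


def floats_to_pcm16(samples: List[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit signed PCM, little-endian (assumed)."""
    clipped = (max(-32768, min(32767, round(s * 32767))) for s in samples)
    pcm = array.array("h", clipped)
    if sys.byteorder == "big":
        pcm.byteswap()  # force little-endian output on big-endian hosts
    return pcm.tobytes()


def pcm_chunks(pcm: bytes, sample_rate: int, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split mono PCM16 audio into chunks of chunk_ms milliseconds each."""
    chunk_bytes = sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        # The last chunk may be shorter if the audio length is not a multiple.
        yield pcm[start:start + chunk_bytes]
```

For example, one second of 16 kHz silence, `bytes(32000)`, yields ten chunks of 3,200 bytes each.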

---

## 6. Server-to-Client Messages

Server-to-client messages are sent by the Speech Recognition WebSocket API to the client application.

---

### 6.1 recognize-response

Indicates whether the recognition session has started successfully.

#### Example

```
{
  "message-name": "recognize-response",
  "operation-result": "Success",
  "recognition-id": "12345"
}
```

---

### 6.2 partial-result

Provides an interim recognition result.

Partial results are not final and may change as more audio is processed.

#### Example

```
{
  "message-name": "partial-result",
  "recognition-id": "12345",
  "text": "This is a partial result."
}
```

---

### 6.3 milestone-result

Provides a stable recognition result for a completed speech segment.

Each milestone result covers one completed speech segment. To obtain the full recognized text, concatenate milestone results in the order they are received.

#### Example

```
{
  "message-name": "milestone-result",
  "recognition-id": "12345",
  "text": "This is a milestone result."
}
```
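A client typically keeps the stable milestone text separate from the current partial. The following is a hypothetical Python sketch. Partials replace each other, and milestones are appended in arrival order. Two behaviors are assumptions rather than documented behavior: a milestone clears the partial it supersedes, and segments are joined with a single space.

```python
import json
from typing import List


class Transcript:
    """Tracks stable milestone text and the current, still-changing partial."""

    def __init__(self) -> None:
        self.milestones: List[str] = []
        self.partial = ""

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        name = msg.get("message-name")
        if name == "partial-result":
            # Partial results may change, so each one replaces the previous one.
            self.partial = msg.get("text", "")
        elif name == "milestone-result":
            # Milestones are stable: keep them in the order received.
            self.milestones.append(msg.get("text", ""))
            self.partial = ""  # assumption: the milestone supersedes the partial

    def text(self) -> str:
        # Joining with a single space is an assumption about segment spacing.
        return " ".join(self.milestones)

    def display(self) -> str:
        return " ".join(part for part in (self.text(), self.partial) if part)
```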

---

### 6.4 final-result

Provides the final recognition result.

#### Example

```
{
  "message-name": "final-result",
  "recognition-id": "12345",
  "operation-result": "Success",
  "text": "This is the final result.",
  "confidence": "0.95"
}
```

---

### 6.5 stop-recognition-response

Indicates whether the recognition session was stopped successfully.

#### Example

```
{
  "message-name": "stop-recognition-response",
  "operation-result": "Success",
  "recognition-id": "12345"
}
```

---

### 6.6 finalize-recognition-response

Indicates whether finalization of the recognition session has started successfully.

#### Example

```
{
  "message-name": "finalize-recognition-response",
  "operation-result": "Success",
  "recognition-id": "12345"
}
```
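All three response messages report `operation-result`, and the model error shown earlier carries `operation-result` without a `message-name`. A client can therefore check that field on every incoming message. The following is a minimal Python sketch; the names `handle_server_message` and `RecognitionError` are hypothetical. It raises an error on any non-`Success` result and parses `final-result`. It converts `confidence` to a float because the example sends that value as a string.

```python
import json
from typing import Optional


class RecognitionError(RuntimeError):
    """Raised when the server reports a non-Success operation-result."""


def handle_server_message(raw: str) -> Optional[dict]:
    """Return the parsed final result, or None for any other message."""
    msg = json.loads(raw)
    result = msg.get("operation-result")
    # Check every message, including errors that have no message-name.
    if result is not None and result != "Success":
        raise RecognitionError(result)
    if msg.get("message-name") == "final-result":
        confidence = msg.get("confidence")
        return {
            "recognition-id": msg.get("recognition-id"),
            "text": msg.get("text", ""),
            "confidence": float(confidence) if confidence is not None else None,
        }
    return None
```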

---

## 7. Troubleshooting

### 7.1 Error: Cannot find model

#### Example Error

```
{
  "operation-result": "Cannot find model Default_EnglishStream_1",
  "session-id": "93af793b5e564e4e"
}
```

#### Possible Causes

1. The WebSocket URL does not include the required `ModelName` query parameter.
2. The `ModelName` value in the URL does not match the `model-name` value in the payload.
3. The selected WebSocket host does not belong to the customer’s assigned environment.
4. The selected model is not available for the tenant.
5. The selected model version is not available.
6. The tenant value is missing or incorrect.
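The identifier in the error message appears to combine tenant, model name, and version: `Default_EnglishStream_1` matches tenant `Default`, model `EnglishStream`, and version `1`. This page does not formally specify a `<tenant>_<model>_<version>` layout. Assuming that layout, the following hypothetical Python helper splits the identifier so that each part can be compared with the `model-tenant`, `model-name`, and `model-version` values that were sent.

```python
from typing import NamedTuple, Optional

PREFIX = "Cannot find model "


class ModelKey(NamedTuple):
    tenant: str
    model: str
    version: str


def parse_missing_model(operation_result: str) -> Optional[ModelKey]:
    """Split '<tenant>_<model>_<version>' from a 'Cannot find model' error (assumed layout)."""
    if not operation_result.startswith(PREFIX):
        return None
    key = operation_result[len(PREFIX):].strip()
    # Assumption: the tenant has no underscore; the model name may contain some.
    tenant, sep1, rest = key.partition("_")
    model, sep2, version = rest.rpartition("_")
    if not (sep1 and sep2 and tenant and model and version):
        return None
    return ModelKey(tenant, model, version)
```

If the parsed tenant or version differs from what you expected, that points to causes 4 to 6 rather than a routing problem. For example, `Default` appears even when `model-tenant` is omitted, which is consistent with the default tenant being used.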

#### Recommended Check

WebSocket URL:

```
wss://<sr-websocket-host>/recognizer?ModelName=EnglishStream
```

Payload:

```
{
  "message-name": "recognize",
  "audio-format": "pcm",
  "sample-rate": 16000,
  "model-name": "EnglishStream",
  "model-version": "1",
  "audio-splitter": "realtime-vad",
  "Authorization": "<token>"
}
```

---

## 8. Best Practices

- Use the correct WebSocket host for your environment.
- Always include `ModelName` in the WebSocket URL.
- Keep the URL `ModelName` and payload `model-name` consistent.
- Use a valid authorization token.
- Make sure the audio sample rate matches the `sample-rate` value in the payload.
- Send audio in the expected format, such as PCM 16-bit.
- Use `finalize-recognition` when all audio has been sent and a final result is expected.
- Use `stop-recognition` only when the recognition session should be interrupted or discarded.
