Transcribe a File
  • 31 Jan 2025

Article summary

This document outlines the usage of the Knovvu SR REST API, which provides two types of speech recognition services:

  • Speech Recognition with Grammar: Performs speech recognition using a grammar file.
  • Speech Dictation with Language Model: Uses a language model for speech dictation.

For more details on these services, please refer to the relevant documentation: Recognition Methods


Authentication

The Knovvu SR service requires a bearer token when sending recognition requests and when using license-required methods, so a token must be obtained before calling these methods.

The required information for token generation includes:

  • API Client ID
  • API Client Secret

These credentials are provided after the subscription is created.

Steps for Token Generation

Send a POST request to the Get Integration Token LDM endpoint to create a bearer token.

Request Example (cURL)

curl --location --request POST 'https://identity.ldm.knovvu.com/connect/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'client_id=[client_id]' \
--data-urlencode 'client_secret=[client_secret]' \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode 'scope=Ldm_Integration'

Response Example

{
  "access_token": "[token]",
  "expires_in": 31536000,
  "token_type": "Bearer",
  "scope": "Ldm_Integration"
}
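In practice the token call can be scripted. The sketch below (Python, standard library only; endpoint, field names, and response shape taken from the examples above) builds the form-encoded request body and extracts the bearer token from the JSON response:

```python
import json
from urllib.parse import urlencode

# Token endpoint from the cURL example above.
TOKEN_URL = "https://identity.ldm.knovvu.com/connect/token"

def build_token_request_body(client_id: str, client_secret: str) -> bytes:
    """Form-encoded body for the token request (grant_type and scope are fixed)."""
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        "scope": "Ldm_Integration",
    }).encode("utf-8")

def extract_bearer_token(response_text: str) -> str:
    """Build the Authorization header value from the JSON token response."""
    payload = json.loads(response_text)
    return f'{payload["token_type"]} {payload["access_token"]}'
```

The body is then POSTed with Content-Type: application/x-www-form-urlencoded, and the returned value goes into the Authorization header of subsequent requests.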

Using the API After Authentication

Once authentication is completed and a valid bearer token is obtained, you can start using the speech recognition functionalities of the API. There are two primary methods for recognizing speech, depending on your use case:

  • Speech Recognition with Grammar: Performs speech recognition using a predefined grammar file. This method is ideal for structured scenarios where the recognized speech needs to match a set of predefined words or phrases.
  • Speech Dictation with Language Model: Uses a language model to recognize free-form speech. This method is suitable for scenarios where users are expected to speak naturally without predefined constraints.

Speech Recognition with Grammar

The API enables speech recognition using a grammar file. The process involves the following steps:

1. Selecting a Grammar File

  • Multiple grammar files can be loaded onto the server.
  • For each recognition request, specify the grammar file to be used.

2. Sending the Audio File

  • Submit an audio file along with the selected grammar file to the service.

3. Receiving the Recognized Text

  • The service returns the recognized text based on the specified grammar file.

Additionally, the service includes a speech validation feature. This feature is used for short audio files to verify whether they contain the expected text. In this case, a grammar file does not need to be supplied, as the service automatically generates one for validation purposes.

Grammar Files

Grammars are used by speech recognizers and other grammar processors to define the words and patterns to be recognized. Developers can specify words and structures for speech recognition.

There are two types of grammars:

  • List Grammar: A simple grammar format developed by Sestek, similar to a word list.

For example, a file containing the following lines:

Apple
Banana

Note:
A valid grammar file in this format contains only the words or phrases to be recognized, one per line (the example above has two entries). This format is supported only for the Turkish language and its use is discouraged, although support continues for backward compatibility. For new projects, SRGS-formatted grammar files are recommended.


SRGS XML Grammar

SRGS XML is a W3C specification; the service supports the SRGS XML format together with SISR (Semantic Interpretation for Speech Recognition) and NLSML (Natural Language Semantics Markup Language).

A basic SRGS grammar, which defines a list of words to be recognized, looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<grammar mode="voice" tag-format="semantics/1.0" xml:lang="en-US" version="1.0" root="main">
    <rule id="main">
        <one-of>
            <item>Apple</item>
            <item>Banana</item>
        </one-of>
    </rule>
</grammar>
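A word-list grammar of this shape can also be generated programmatically. The sketch below (Python standard library; it mirrors the structure of the example above and is not an API helper) builds the same XML from a list of words:

```python
import xml.etree.ElementTree as ET

def build_srgs_grammar(words, lang="en-US"):
    """Build a minimal SRGS XML grammar: one <one-of> list of <item> words."""
    grammar = ET.Element("grammar", {
        "mode": "voice",
        "tag-format": "semantics/1.0",
        "xml:lang": lang,
        "version": "1.0",
        "root": "main",
    })
    rule = ET.SubElement(grammar, "rule", {"id": "main"})
    one_of = ET.SubElement(rule, "one-of")
    for word in words:
        ET.SubElement(one_of, "item").text = word
    return ET.tostring(grammar, encoding="unicode")
```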

Typical Usage Scenarios

The typical usage scenarios for the SR REST API are:

  • List Available Grammars (GET)
  • Create New Grammar (POST)
  • Make Speech Recognition (POST)

List Available Grammars

Actor: REST Service Client
Goal: Retrieve available grammars from the speech recognition service.

Step | REST Service Client Actions | REST Service Actions
1 | Send a GET request to the Grammars endpoint. | Returns information about all available grammars in JSON format.

Post Conditions

  • None

Business Rules

  • No license is required to call this service.

Create New Grammar

Actor: REST Service Client
Goal: Define a new grammar in the speech recognition service to be used in future recognition requests.

Pre-Conditions

  • The user must create a new grammar file using a text editor.
  • The grammar file must be valid.

Steps for Creating a New Grammar

Step | REST Service Client Actions | REST Service Actions
1 | POST a new grammar file to the Grammars endpoint. | Saves the grammar file content on the server.

Post Conditions

  • The newly uploaded grammar file can be used in speech recognition requests by specifying its name.

Business Rules

  • If a POST request uploads a grammar file with an existing name, it will overwrite the current grammar file.
  • No license is required to use this service.

Other Notes (Assumptions, Issues, Special Requirements)

  • Grammar file names should be in ANSI format (as they are used in HTTP header fields).
  • Grammar files should be encoded in UTF-8.

Make Speech Recognition

Actor: REST Service Client
Goal: Perform speech recognition using an audio file and a predefined grammar.

Pre-Conditions

  • Valid grammar name
    • The grammar must be selected from the available grammars.
  • Valid audio file
    • The audio file must be prerecorded by the user.

Steps for Speech Recognition

Step | REST Service Client Actions | REST Service Actions
1 | Send a POST request to the Request endpoint with the audio file and the grammar name to be used in speech recognition. | Returns the recognition result in JSON format.

Post Conditions

  • None

Business Rules

  • Common audio formats are mostly supported (e.g., .wav, .opus, .mp3).
  • If the audio file is uncompressed, it is recommended to compress it before sending to improve network usage.
  • Opus format is recommended for compression, as it has the least impact on recognition accuracy.

GET Grammars

  • URL: v1/speech/recognition/grammars
  • Method: GET

Summary
Retrieves information about available grammar files.

Description
The Grammars endpoint returns a list of available grammars that can be used in speech recognition requests.

Note: No license is required to call this service.


Response Fields

The response contains essential information about each grammar file.

Name | Description
Grammars | An array of objects containing grammar details such as name, ID, and content type.
Success | True: the request was successful. False: the request failed.
ErrorMessage | Error message in case of a failed request.
ErrorCode | Error code when the request fails (e.g., Internal Server Error).
MoreInfo | Additional information about the response.

Success Response Example

{
  "grammars": [
    {
      "id": 125,
      "name": "my42",
      "tenant": "default",
      "type": "application/srgs+xml"
    }
  ],
  "success": true,
  "errorMessage": null,
  "errorCode": null,
  "moreInfo": null
}

Error Response Example

{
  "success": false,
  "errorMessage": "Unexpected Error",
  "errorCode": "internal-service-error",
  "moreInfo": null
}
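A client typically uses this listing to look up a grammar's server-side id, which the GET Specific Grammar and DELETE Grammars endpoints require. A Python sketch, assuming the response shape shown above:

```python
import json

def find_grammar_id(list_response_text: str, name: str):
    """Return the id of the named grammar from a GET Grammars response, or None."""
    payload = json.loads(list_response_text)
    if not payload.get("success"):
        raise RuntimeError(payload.get("errorMessage") or "grammar listing failed")
    for grammar in payload.get("grammars") or []:
        if grammar["name"] == name:
            return grammar["id"]
    return None
```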

GET Specific Grammar

  • URL: v1/speech/recognition/grammars/{id}
  • Method: GET
  • Summary: Retrieves the content of a specific grammar.
  • Description: Allows users to download a grammar file from the SR server.

Note: No license is required to call this service.


DELETE Grammars

  • URL: v1/speech/recognition/grammars/{id}
  • Method: DELETE
  • Summary: Deletes a grammar file.
  • Description: Allows users to remove a grammar file from the SR server.

Note: No license is required to call this service.

POST Grammars

  • URL: v1/speech/recognition/grammars
  • Method: POST
  • Summary: Defines a new grammar.
  • Description: Allows users to send a new grammar definition to the Grammars endpoint.
    Once added, this grammar can be used in recognition requests.

Details

To send a new grammar definition:

  1. Set the Grammar Name in the request header.

    • Example: GrammarName: NewGrammarName
  2. Set the Tenant of the grammar (optional).

    • If the tenant parameter is not set, the default value will be used.
    • Example: Tenant: NewGrammarTenant
  3. Add the grammar file content as binary data to the request body.

  4. Specify the correct Content-Type in the request header when adding the grammar content.


Supported Content-Types for Grammars

  • SRGS XML:

    application/srgs+xml
    
  • Sestek Custom SRGS List:

    application/x-gslist
    

Test with cURL

To test the POST Grammars request using cURL, use the following command:

curl \
--header "GrammarName: NewSrgsXmlGrammar" \
--header "Tenant: NewGrammarTenant" \
--data-binary "@NewGrammar.grxml" \
-H "Content-Type: application/srgs+xml" \
-X POST "https://{sr-server}/v1/speech/recognition/grammars"

Note:

  • Replace {sr-server} with the actual server URL.
  • Ensure that NewGrammar.grxml is the correct path to your grammar file.

Example Request (SrgsXml Grammar)

Below is an example POST request to send an SrgsXml Grammar to the speech recognition service.

POST http://acme-pc:5000/v1/speech/recognition/grammars HTTP/1.1
Content-Disposition: File; fileName="NewSrgsXmlGrammar"; fileExtension=".grxml"
GrammarName: NewSrgsXmlGrammar
Accept: application/json, application/xml, text/json, text/x-json, text/javascript, text/xml
Accept-charset: utf-8
User-Agent: sestek-speech-recognition-rest--client
Content-Type: application/srgs+xml
Host: acme-pc:5000
Content-Length: 515
Accept-Encoding: gzip, deflate

Example SrgsXml Grammar File

<?xml version="1.0" encoding="UTF-8" ?>
<grammar mode="voice" tag-format="semantics/1.0" xml:lang="en-US" version="1.0" root="main">
    <rule id="main">
        <item>
            <one-of>
                <item>number one<tag> out = "number 1"; </tag></item>
                <item>number two<tag> out = "number 2"; </tag></item>
            </one-of>
        </item>
    </rule>
</grammar>

Note: No license is required to call this service.

Example Request (Sestek Custom Srgs List Grammar)

Below is an example POST request to send a Sestek Custom Srgs List Grammar to the speech recognition service.

POST http://acme-pc:5000/v1/speech/recognition/grammars HTTP/1.1
Content-Disposition: File; fileName="NewlistGrammar"; fileExtension=".txt"
GrammarName: NewlistGrammar
Accept: application/json, application/xml, text/json, text/x-json, text/javascript, text/xml
Accept-charset: utf-8
User-Agent: sestek-speech-recognition-rest--client
Content-Type: application/x-gslist
Host: acme-pc:5000
Content-Length: 32
Accept-Encoding: gzip, deflate

Example Sestek Custom Srgs List Grammar File

Mersin
Adana
Corum
Kastamonu

Request Fields

Name | Description
GrammarName | Specify a custom name for the provided grammar.
Tenant | Specify a tenant for the provided grammar.
Content-Type | Can take two valid values: SrgsXml = "application/srgs+xml"; SestekCustomSrgsList = "application/x-gslist".

Success Response Example

{
  "success": true,
  "id": 126,
  "errorMessage": null,
  "errorCode": null,
  "moreInfo": null
}

Error Response Example

{
  "success": false,
  "errorMessage": "Recognition-Parameters Are Not Defined At Header",
  "errorCode": "missing-parameter",
  "moreInfo": null
}

Response Fields

The response is returned in JSON format.

Name | Description
Success | True: the request succeeded. False: the request failed.
ErrorMessage | If the request fails, this field contains the failure message.
ErrorCode | If the request fails, this field contains the failure error code (e.g., Internal Service Error).
MoreInfo | Any additional information about the response.
id | A server-generated unique ID for the grammar. This ID is required when downloading or deleting the grammar from the server.

Note:

  • No license is required to call this service.
  • If a grammar file with the same name already exists, the new upload will override the existing grammar content.

POST Generate Grammar

  • URL: v1/speech/recognition/grammars/generator
  • Method: POST
  • Summary: Generates a new grammar.
  • Description: Converts a word list into an SRGS-XML grammar.

Note: No license is required to call this service.


Details

In the request header:

  • Content-Type must be text/plain.
  • You can specify the language of the grammar.
    • If the language parameter is not set, it defaults to "tr-TR".
    • Example: language: "en-US"
  • The request body must contain a word list, where each word is on a new line.

Request Example with cURL

curl -X POST "http://{sr-server}/v1/speech/recognition/grammars/generator" \
-H "Content-Type: text/plain" \
-H "language: tr-TR" \
-d "merhaba
nasılsın"

Request Fields

Name | Description
Language | Specify a language for the generated grammar.
Content-Type | Must be text/plain.

Success Response Example

<?xml version="1.0" encoding="UTF-8" ?>
<grammar mode="voice" tag-format="semantics/1.0" xml:lang="tr-TR" version="1.0" root="main">
    <rule id="main">
        <one-of>
            <item>merhaba<tag>out = "merhaba";</tag></item>
            <item>nasılsın<tag>out = "nasılsın";</tag></item>
        </one-of>
    </rule>
</grammar>

Error Response Example

{
  "errorCode": "cannot-generate-grammar",
  "errorMessage": "failed to generate grammar",
  "moreInfo": "Content-Type has not been specified",
  "success": false
}

POST Recognition Request

  • URL: v1/speech/recognition/request
  • Method: POST
  • Summary: Performs speech recognition.
  • Description:
    Allows users to send a speech recognition request with an audio file and grammar name.
    The service will return the recognized text from the speech.

How to Send a Request:

To perform a speech recognition request:

  1. Include the following mandatory HTTP header fields:

    • Frequency (e.g., 8000)
    • GrammarName (e.g., "SimpleTestGrammar")
  2. The following fields are optional:

    • Tenant (Defaults to "Default" if not set)
    • SendAudioDownloadLink (e.g., true)

Example parameters:

- Frequency: 8000
- GrammarName: SimpleTestGrammar
- SendAudioDownloadLink: true

If the Speech Recognition Service requires a license (e.g., when used in the cloud),
you must send a bearer token as Authorization in the request header.

Once the request headers are set, the audio file should be sent as binary data in the request body.
The Content-Type of the audio file must also be specified.


Supported Audio Mime Types:

  • audio/opus
  • audio/wav
  • audio/wave

Request Example with cURL:

curl --location --request POST 'https://sr.knovuapi.com/v1/speech/recognition/request' \
--header 'Content-Type: audio/wave' \
--header 'GrammarName: SampleGrammar' \
--header 'Tenant: Default' \
--header 'Authorization: Bearer [token]' \
--data-binary '@/C:/Program Files/Sestek/SR/data/hello-world.wav'

Example Request (For WAV File):

Below is an example POST request to send a WAV audio file for speech recognition.

POST https://acme-pc:5000/v1/speech/recognition/request HTTP/1.1
Content-Disposition: file; fileName="08-tr-recognition-audio"; fileExtension=".wav"

Frequency: 8000
GrammarName: NewGrammar
SendAudioDownloadLink: True
Authorization: Bearer [token]

Accept: application/json, application/xml, text/json, text/x-json, text/javascript, text/xml
Accept-charset: utf-8
User-Agent: sestek-speech-recognition-rest--client
Content-Type: audio/wav
Host: acme-pc:5000
Content-Length: 18046
Accept-Encoding: gzip, deflate

RIFF WAVEfmt ...

Example Request (For Opus File):

Below is an example POST request to send an Opus audio file for speech recognition.

POST https://acme-pc:5000/v1/speech/recognition/request HTTP/1.1
Content-Disposition: file; fileName="09-tr-recognition-audio"; fileExtension=".opus"

Frequency: 8000
GrammarName: NewGrammar
SendAudioDownloadLink: True
Authorization: Bearer [token]

Accept: application/json, application/xml, text/json, text/x-json, text/javascript, text/xml
Accept-charset: utf-8
User-Agent: sestek-speech-recognition-rest--client
Content-Type: audio/opus
Host: acme-pc:5000
Content-Length: 9918
Accept-Encoding: gzip, deflate

OggS ..

Request Fields:

Name | Description
Frequency | The frequency at which the operation will be performed. If not specified, the default value is 8000. This is not necessarily the frequency of the audio file sent; the sampling rate is determined from the audio format.
GrammarName | The name of the grammar used for this recognition.
Tenant | The tenant of the grammar used for this recognition.
SendAudioDownloadLink | If set to True, the server will host the audio file and provide a download link in the response.
Authorization: Bearer [token] | Required for cloud usage. If using this REST service via cloud, a token must be included in the request. API Client ID and API Client Secret parameters are needed to generate a token.
Content-Type | Specifies the type of audio file being sent (e.g., audio/[type]).

Success Response Example:

{
  "confidence": 0.99,
  "recognizedText": "Mersin",
  "semanticResult": "Mersin",
  "speechStartTimeMsec": 0,
  "speechEndTimeMsec": 1125,
  "audioLink": "http://acme-pc/...",
  "success": true,
  "errorMessage": null,
  "errorCode": null,
  "moreInfo": null
}

Error Response Example:

{
  "success": false,
  "errorMessage": "GrammarName has not been specified",
  "errorCode": "missing-parameter",
  "moreInfo": null
}

Response Fields:

Name | Description
RecognizedText | The plain-text recognition result.
SemanticResult | A machine-processable representation of the recognized text, which can be more detailed than a standard transcript.
SpeechStartTimeMsec | The time (in milliseconds) when speech started in the provided audio file.
SpeechEndTimeMsec | The time (in milliseconds) when speech ended in the provided audio file.
Confidence | The confidence score of the recognition result (range [0, 1]). A value of 1 indicates absolute confidence.
AudioLink | A link to the hosted audio file if SendAudioDownloadLink = true.
Success | True: the request succeeded. False: the request failed.
ErrorMessage | If the request fails, this field contains the failure message.
ErrorCode | If the request fails, this field contains the failure error code (e.g., Internal Service Error).
MoreInfo | Any additional information about the response.
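Handling the response then reduces to checking Success and reading the fields above. A small Python sketch against the sample responses shown earlier:

```python
import json

def summarize_recognition(response_text: str):
    """Return (recognizedText, confidence, speech duration in msec) or raise on failure."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError(f'{payload.get("errorCode")}: {payload.get("errorMessage")}')
    duration_msec = payload["speechEndTimeMsec"] - payload["speechStartTimeMsec"]
    return payload["recognizedText"], payload["confidence"], duration_msec
```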

Note:
Based on service configuration, a license may or may not be required to use this service.

POST Validation

  • URL: v1/speech/recognition/validation
  • Method: POST
  • Summary: Validates whether the provided audio contains a specific text.
  • Description:
    This endpoint is useful for cases where you have a relatively short audio file and need to determine if its content matches a given short text.

The Validation endpoint returns information about whether the provided audio matches the given text.


How to Send Request:

  • The validation request parameters should be sent as multipart/form-data.

End-Point Test with cURL:

Below is an example cURL request for testing the POST Validation endpoint.

curl \
--form validation-parameters='{
  "validationText": "Alaska",
  "language": "en-US",
  "sendDownloadLink": true,
  "Authorization": "Bearer [token]"
};type=application/json' \
--form 'upload=@Alaska.wav;type=audio/wav' \
-X POST "http://[server-url]/v1/speech/recognition/validation"

Notes:

  • Replace [server-url] with the real server URL.
  • Replace [token] with a valid authorization token.
  • The audio file (Alaska.wav) should be correctly formatted as audio/wav.

Example Request:

Below is an example POST request for speech validation.

POST http://acme-pc:11000/v1/speech/recognition/validation HTTP/1.1
Host: acme-pc:11000
User-Agent: curl/7.48.0
Accept: */*
Content-Length: 44567
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------decd0b63ed9a4bf6

------------------------decd0b63ed9a4bf6
Content-Disposition: form-data; name="validation-parameters"
Content-Type: application/json

{
  "validationText": "Alaska",
  "language": "en-US",
  "frequency": 8000,
  "mediaType": "Wav",
  "sendDownloadLink": true,
  "Authorization": "Bearer [token]"
}

------------------------decd0b63ed9a4bf6
Content-Disposition: form-data; name="upload"; filename="Alaska.wav"
Content-Type: audio/wav

RIFF WAVEfmt ...

Explanation:

  • Validation parameters are sent as JSON inside a multipart/form-data request.
  • Audio file (Alaska.wav) is attached as binary data.
  • Authorization token (Bearer [token]) is required for authentication.
  • The Content-Type specifies the file format (audio/wav).

Note: Replace [token] with a valid authorization token.
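Outside of cURL, the multipart body can be assembled by hand with the standard library alone (most HTTP client libraries can also do this for you). A sketch assuming the part names shown above ("validation-parameters" and "upload"):

```python
import json
import uuid

def build_validation_body(validation_params: dict, audio_bytes: bytes,
                          audio_filename: str = "audio.wav",
                          audio_type: str = "audio/wav"):
    """Assemble the multipart/form-data body for POST Validation.

    Returns (body_bytes, content_type_header_value); the boundary is random.
    """
    boundary = uuid.uuid4().hex
    prefix = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="validation-parameters"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(validation_params)}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="upload"; filename="{audio_filename}"\r\n'
        f"Content-Type: {audio_type}\r\n\r\n"
    )
    body = prefix.encode("utf-8") + audio_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"
```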

Request Fields:

Name | Description
validation-parameters | Form part name for the JSON validation parameters.
validationText | The text that will be checked to determine if the audio content matches.
language | The language of the text, such as tr-TR or en-US.
frequency | The frequency of the audio file.
sendDownloadLink | If set to true, a download link for the uploaded audio file will be provided. The default value is false.
Authorization | If a license is not required, this should be null. If required, provide the bearer token.
Audio binary data | The audio file to be validated, in Opus or WAV format (form part name "upload").

Example Response:

{
  "answer": "valid",
  "moreInfo": "RecognizedText : Alaska",
  "audioLink": "Not Available"
}

Response Fields:

Name | Description
Answer | The answer to the validation request: Valid or NotValid.
Valid | The speech in the audio matches the given control text.
NotValid | The speech in the audio does not match the given control text.
MoreInfo | Additional information about the response, such as error details.
AudioLink | A link to download the input audio file in WAV format.

Speech Dictation with Language Model

This service allows you to perform Speech Dictation using a Language Model.

A language model is a file used by a Speech Dictation Engine to recognize speech.
It contains a large set of words along with their probability of occurrence, making it suitable for dictation applications.

How Language Models Work

  • Language models constrain the search in the decoder by limiting the number of possible words that can be considered at any given time.
  • This results in faster execution and higher accuracy.

How to Use Speech Dictation

  1. Choose one of the available language models.
  2. Send your audio file to the service, specifying the selected language model.

Important Notes

  • You cannot define new language models with this service.
  • You can use predefined language models.

Typical Usage Scenario

The typical usage scenario follows the two steps below: list the available language models, then make a speech dictation request.

List Available Language Models

Actor: REST Service Client
Goal: Retrieve the list of available language models.

Pre-Conditions

  • None

Steps

Step | REST Service Client Actions | REST Service Actions
1 | Send a GET request to the Models endpoint. | Returns a list of available language models in JSON format.

Post Conditions

  • None

Business Rules

  • None

Other Notes

  • No license is required to call this service.

Make Speech Dictation

Actor: REST Service Client
Goal: Perform speech dictation (recognition).

Pre-Conditions

  • Valid model name
    • Must be selected from the Available Models List.
  • Valid audio file
    • Supported formats include wav, opus, mp3, etc.

Steps

Step | REST Service Client Actions | REST Service Actions
1 | Send a POST request to the Request endpoint with the audio file and the selected language model name. | Returns the dictation (recognition) result in JSON format.

Post Conditions

  • None

Business Rules

  • Common audio formats are supported (wav, opus, mp3, etc.).
  • If using uncompressed data, it is recommended to compress it before sending.
  • Opus format is preferred for compression, as it has the least impact on recognition accuracy.

Other Notes

  • None

GET Models

  • URL: v1/speech/dictation/models
  • Method: GET
  • Summary: Retrieves available dictation models information.

Description
The Models endpoint returns information about the available dictation models.
The response includes:

  • Model name
  • Model frequency
  • Total number of models

How to Send Request
Simply send a GET HTTP request to the service endpoint.


End-Point Test with cURL

curl -X GET "https://{sr-server}/v1/speech/dictation/models"

Note:

  • Replace {sr-server} with the real server URL.

Example Response

{
  "models": [
    {
      "name": "TURKISH_GENERAL",
      "tenant": "Default",
      "is_persistent": "true",
      "frequency": 8000,
      "version": 1
    },
    {
      "name": "ENGLISH_GENERAL",
      "tenant": "Default",
      "is_persistent": "true",
      "frequency": 8000,
      "version": 1
    },
    {
      "name": "BankingTurkish",
      "tenant": "Default",
      "is_persistent": "false",
      "frequency": 8000,
      "version": 1
    }
  ],
  "modelsCount": 3,
  "success": true,
  "errorMessage": null,
  "errorCode": null,
  "moreInfo": null
}

Explanation:

  • The response contains a list of available dictation models.
  • Each model includes:
    • Name (e.g., "TURKISH_GENERAL", "ENGLISH_GENERAL")
    • Tenant (Default tenant)
    • Persistence (Indicates if the model is persistent)
    • Frequency (e.g., 8000 Hz)
    • Version (Version number of the model)
  • The total number of models is provided as "modelsCount": 3.
  • Success is true if the request was successful.
  • Error fields (errorMessage, errorCode, moreInfo) are null if no errors occurred.
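A client typically parses this listing to select a model before making a dictation request. A Python sketch against the response shape above:

```python
import json

def pick_model(models_response_text: str, name: str):
    """Look up a dictation model by name; return (version, frequency) or raise KeyError."""
    payload = json.loads(models_response_text)
    if not payload.get("success"):
        raise RuntimeError(payload.get("errorMessage") or "model listing failed")
    for model in payload.get("models") or []:
        if model["name"] == name:
            return model["version"], model["frequency"]
    raise KeyError(name)
```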

Response Fields:

Name | Description
Models | An array of model information detailing the available dictation models.
Name | The name of the model.
Tenant | The tenant associated with the model.
Is Persistent | If true, the model cannot be deleted by an LMS update.
Frequency | The frequency of the model.
Version | The version of the model.
ModelsCount | The total number of models available in the service.
Success | True: the request succeeded. False: the request failed.

Error Fields:

Name | Description
ErrorMessage | If the request fails, this field contains the failure message.
ErrorCode | If the request fails, this field contains the failure error code (e.g., Internal Service Error).
MoreInfo | Any additional details about the response.

Note:

  • No license is required to call this service.

Get Specific Model

  • URL:
    v1/speech/dictation/models?ModelName={ModelName}&ModelVersion={ModelVersion}&Tenant={ModelTenant}
    
  • Method: GET
  • Summary: Retrieves the specified dictation model as a zip file.
  • Description:
    Allows users to download a specific model from the SR server.

End-Point Test with cURL:

curl -X GET "https://{sr-server}/v1/speech/dictation/models?ModelName={ModelName}&ModelVersion={ModelVersion}&Tenant={ModelTenant}"

Notes:

  • Replace {sr-server} with the real server URL.
  • Replace {ModelName}, {ModelVersion}, and {ModelTenant} with the appropriate model details.

Add Model

  • URL: v1/speech/dictation/models
  • Method: POST
  • Summary: Adds a model to the model list.
  • Description:
    Allows users to upload a new model to the SR server.

How to Send a Request:

To add a model, you must send the following parameters:

  • Mandatory Parameters:

    • ModelName
    • ModelVersion
  • Optional Parameters (default values are used if not provided):

    • Tenant (Default)
    • IsPersistent (false)

Additionally, the model content must be included as a zipped file in the request body.


End-Point Test with cURL:

curl \
--header "ModelName: TURKISH_GENERAL" \
--header "ModelVersion: 1" \
--header "IsPersistent: true" \
--data-binary "@TurkishGeneral.zip" \
-H "Content-Type: application/zip" \
-X POST "https://{server-url}/v1/speech/dictation/models"

Notes:

  • Replace {server-url} with the real server URL.
  • Ensure the model file (TurkishGeneral.zip) is in the correct format.
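The zip packaging and the header fields above can be combined in code. The Python sketch below zips model files in memory and assembles (without sending) the POST; the server name and file names are placeholders:

```python
import io
import zipfile
from urllib.request import Request

def build_model_upload(server: str, model_name: str, model_version: int,
                       files: dict, tenant: str = "Default",
                       persistent: bool = False) -> Request:
    """Zip the given {name: bytes} files in memory and build the Add Model POST."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, content in files.items():
            zf.writestr(arcname, content)
    req = Request(f"https://{server}/v1/speech/dictation/models",
                  data=buf.getvalue(), method="POST")
    req.add_header("ModelName", model_name)
    req.add_header("ModelVersion", str(model_version))
    req.add_header("Tenant", tenant)
    req.add_header("IsPersistent", "true" if persistent else "false")
    req.add_header("Content-Type", "application/zip")
    return req
```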

Delete Model

  • URL: v1/speech/dictation/models
  • Method: DELETE
  • Summary: Deletes a model from the model list.
  • Description:
    Allows users to remove a model from the SR server.

How to Send a Request:

To delete a model, send the following parameters:

  • Mandatory Parameters:

    • ModelName
    • ModelVersion
  • Optional Parameter (default value is used if not provided):

    • Tenant (Default)

End-Point Test with cURL:

curl \
--header "ModelName: TURKISH_GENERAL" \
--header "Tenant: ModelTenant" \
--header "ModelVersion: 1" \
-X DELETE "https://{server-url}/v1/speech/dictation/models"

Set Model Default Version

  • URL: /models/defaultversion
  • Method: POST
  • Summary: Sets the default version of a model.
  • Description:
    Allows users to change the default version of a model in the SR server.

How to Send a Request:

To set the default version of a model, send the following parameters:

  • Mandatory Parameters:

    • ModelName
    • ModelVersion
  • Optional Parameter (default value is used if not provided):

    • Tenant (Default)

End-Point Test with cURL:

Below is an example cURL request to set the default version of a model.

curl \
--header "ModelName: TURKISH_GENERAL" \
--header "Tenant: ModelTenant" \
--header "ModelVersion: 1" \
-X POST "https://{server-url}/models/defaultversion"

Notes:

  • Replace {server-url} with the real server URL.
  • Replace {ModelName}, {ModelVersion}, and {ModelTenant} with the appropriate model details.

POST Dictation Request

  • URL: v1/speech/dictation/request
  • Method: POST
  • Summary: Performs speech dictation.
  • Description:
    Allows users to send a speech dictation request to the Request endpoint.
    The service will return the dictation (recognition) result for the provided audio file.

How to Send Request:

To send a new dictation request, include the following recognition parameters:

  • Mandatory Parameter:

    • ModelName
  • Optional Parameters (default values used if not provided):

    • Tenant
    • ModelVersion
    • ProduceNBestList
    • NBestListLength
    • SendAudioDownloadLink

Example Parameters:

- ModelName: TURKISH_GENERAL_TEST
- Tenant: ModelTenant
- ModelVersion: 1
- ProduceNBestList: True
- NBestListLength: 5
- SendAudioDownloadLink: True

Authorization:

  • If the Speech Recognition Service requires a license (e.g., when used in cloud environments),
    the Bearer token must be included in the Authorization header.

Audio File Upload:

  • The audio file should be sent as binary data in the request body.
  • Content-Type must be set accordingly.
    Supported formats:
    • audio/opus
    • audio/wav
    • audio/wave

End-Point Test with cURL:

Below is an example cURL request to send a speech dictation request.

curl \
--header "ModelName: TURKISH_GENERAL" \
--header "Tenant: ModelTenant" \
--header "ModelVersion: 1" \
--header "Authorization: Bearer [token]" \
--data-binary "@AudioToDictate.wav" \
-H "Content-Type: audio/wav" \
-X POST "https://{server-url}/v1/speech/dictation/request"

Notes:

  • Replace {server-url} with the real server URL.
  • Replace [token] with a valid authorization token if required.
  • Ensure the audio file (AudioToDictate.wav) is in a supported format.

Request Header Fields:

Name | Description
ModelName | Name of the language model you want to use in your speech dictation (recognition).
Tenant | Tenant of the language model.
ModelVersion | Version of the language model you want to use.
ProduceNBestList | If true, produces multiple hypotheses as the dictation result.
NBestListLength | Maximum number of recognition hypotheses.
SendAudioDownloadLink | If true, provides a downloadable link for the audio file you send.
Content-Type | The MIME type of the audio file.

The response additionally contains the following fields:

Name | Description
Success | True: the request succeeded. False: the request failed.
ErrorMessage | If the request fails, this field contains the failure message.
ErrorCode | If the request fails, this field contains the failure error code (e.g., Internal Service Error).
MoreInfo | Any extra information about the response.

Example Response:

{
  "resultText": "sayın meslektaşım ",
  "confidence": 1,
  "speechStartTimeMsec": 0,
  "speechEndTimeMsec": 2437,
  "nbestlist": {
    "utterances": [
      {
        "nlsmlResult": "",
        "confidence": 99,
        "recognizedWords": [
          {
            "word": "sayın",
            "startTimeMsec": 410,
            "endTimeMsec": 1110,
            "confidence": 99,
            "wordType": 1,
            "speakerId": null
          },
          {
            "word": "meslektaşım",
            "startTimeMsec": 1140,
            "endTimeMsec": 1960,
            "confidence": 99,
            "wordType": 1,
            "speakerId": null
          }
        ]
      }
    ]
  },
  "audioLink": "http://acme-pc/...",
  "success": true,
  "errorMessage": null,
  "errorCode": null,
  "moreInfo": null
}

Explanation:

  • resultText: The final recognized text from the speech.
  • confidence: The overall confidence score of the recognition.
  • speechStartTimeMsec: The starting timestamp of the speech in milliseconds.
  • speechEndTimeMsec: The ending timestamp of the speech in milliseconds.
  • nbestlist: Contains multiple utterances (if ProduceNBestList is enabled).
    • Each utterance has:
      • nlsmlResult: (if applicable).
      • confidence: Confidence score for the utterance.
      • recognizedWords: Detailed information about each recognized word.
        • word: The recognized word.
        • startTimeMsec: The start time of the word in milliseconds.
        • endTimeMsec: The end time of the word in milliseconds.
        • confidence: Confidence score for the word.
        • wordType: Type of the word (1 indicates a recognized word).
        • speakerId: If speaker identification is enabled, this field will contain the speaker ID.
  • audioLink: A link to the recorded audio file.
  • success: true if the request was successful.
  • errorMessage: If an error occurs, this field contains the error message.
  • errorCode: If an error occurs, this field contains the error code.
  • moreInfo: Additional information about the response.
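
To illustrate how these fields fit together, here is a small Python sketch that walks the n-best list and collects the transcript and per-word timings. The helper name `extract_words` is our own; only the field names come from the response shown above.

```python
# Sketch: pulling the transcript and word timings out of a dictation response.
import json

def extract_words(response):
    """Return (result_text, [(word, start_ms, end_ms, confidence), ...])."""
    words = []
    nbest = response.get("nbestlist") or {}
    for utterance in nbest.get("utterances", []):
        for w in utterance.get("recognizedWords", []):
            words.append((w["word"], w["startTimeMsec"],
                          w["endTimeMsec"], w["confidence"]))
    return response["resultText"].strip(), words

# Abridged copy of the example response above.
example = json.loads("""
{"resultText": "sayın meslektaşım ",
 "confidence": 1,
 "nbestlist": {"utterances": [
   {"nlsmlResult": "", "confidence": 99, "recognizedWords": [
     {"word": "sayın", "startTimeMsec": 410, "endTimeMsec": 1110,
      "confidence": 99, "wordType": 1, "speakerId": null},
     {"word": "meslektaşım", "startTimeMsec": 1140, "endTimeMsec": 1960,
      "confidence": 99, "wordType": 1, "speakerId": null}]}]},
 "success": true}
""")

text, words = extract_words(example)
```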

Response Fields:

  • ResultText: The dictated (recognized) text.
  • Confidence: The confidence score of recognition (range: [0, 1]).
  • SpeechStartTimeMsec: The time (in milliseconds) where speech started in the audio file.
  • SpeechEndTimeMsec: The time (in milliseconds) where speech ended in the audio file.
  • Nbestlist: The result of multiple hypotheses as the dictation result.

Additional Response Fields:

  • AudioLink: A link to download the input audio in WAV format.
  • Success: True if the request succeeded; False if it failed.
  • ErrorMessage: If the request fails, this field contains the failure message.
  • ErrorCode: If the request fails, this field contains the failure error code (e.g., Internal Service Error).
  • MoreInfo: Additional details about the response.

Nbestlist Fields
Contains multiple recognition hypotheses.

  • NlsmlResult: The recognition result in NLSML format.
  • Utterances: The recognized utterances.
  • Confidence: The confidence score for the recognition hypothesis.
  • RecognizedWords: The list of recognized words.

RecognizedWords Fields
Contains detailed information about each recognized word.

  • Word: The recognized word from the speech input.
  • StartTimeMsec: The start time of the word in milliseconds.
  • EndTimeMsec: The end time of the word in milliseconds.
  • Confidence: The confidence value of the recognized word, as a percentage.
  • WordType: The type of the word (e.g., Normal, Filler, Suffix, Prefix).

Note:

  • Based on service configuration, a license may or may not be required for this service.

POST Dictation Request with Custom Words

  • URL: v1/speech/dictation/request
  • Method: POST
  • Summary: Make Speech Dictation with Custom Words.
  • Description:
    This endpoint allows users to send a speech dictation request with custom words added to the language model.
    The service will return the dictation (recognition) result for the provided speech (audio file).

How to Send Request:

To send a dictation request with custom words, the following recognition parameters must be included:

  • Mandatory Parameter:

    • ModelName
  • Optional Parameters (default values used if not provided):

    • Tenant
    • ModelVersion
    • ProduceNBestList
    • NBestListLength
    • SendAudioDownloadLink

Example Parameters

- ModelName: TURKISH_GENERAL_TEST
- Tenant: ModelTenant
- ModelVersion: 1
- ProduceNBestList: True
- NBestListLength: 5
- SendAudioDownloadLink: True

Authorization:

  • If the Speech Recognition Service requires a license (e.g., when used in cloud environments), the Bearer token must be included in the Authorization header.

Audio File & Custom Words Upload:

  • The audio file should be sent as binary data in the request body.
  • Custom words should also be sent as multipart form-data along with the audio file.

End-Point Test with cURL

Below is an example cURL request to send a speech dictation request with custom words.

curl \
--header "ModelName: TURKISH_GENERAL" \
--header "Tenant: ModelTenant" \
--header "ModelVersion: 1" \
--header "Authorization: Bearer [token]" \
--header "Content-Type: multipart/form-data" \
--form custom-list="tencere\nkapak" \
--form audio="@AudioToDictate.wav" \
-X POST "https://{server-url}/v1/speech/dictation/request"

Request Header Fields

  • ModelName: Name of the language model you want to use in your speech dictation (recognition).
  • Tenant: Tenant of the language model.
  • ModelVersion: Version of the language model you want to use in your speech dictation (recognition).
  • ProduceNBestList: If true, produces multiple hypotheses as the dictation result.
  • NBestListLength: Maximum number of recognition hypotheses.
  • SendAudioDownloadLink: If true, provides a downloadable link for the audio file you send.

Additional Fields

  • Token (license.*): Required if using cloud services. The API Client ID and API Client Secret provided upon service purchase are used to generate a token.
  • Content-Type: The MIME type of the request body.

Notes

  • Replace {server-url} with the real server URL.
  • Replace [token] with a valid authorization token if required.
  • Ensure the audio file (AudioToDictate.wav) is in a supported format.
  • The custom word list should be sent as multipart form-data.
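
As a rough illustration of what cURL's `--form` flags produce, the sketch below assembles an equivalent multipart/form-data body by hand using only Python's standard library. The field names `custom-list` and `audio` mirror the cURL example; the boundary string and the helper name are arbitrary choices, and the exact encoding the service expects should be verified against the cURL example above.

```python
# Sketch: building a multipart/form-data body for the custom-words request.
# The boundary value is arbitrary; the field names follow the cURL example.
import io

def build_multipart(custom_words, audio_bytes,
                    audio_name="AudioToDictate.wav",
                    boundary="knovvu-boundary"):
    """Return (body, content_type) for the custom-words dictation request."""
    buf = io.BytesIO()
    # Part 1: the custom word list, one word per line.
    buf.write((f"--{boundary}\r\n"
               'Content-Disposition: form-data; name="custom-list"\r\n'
               "\r\n").encode("utf-8"))
    buf.write("\n".join(custom_words).encode("utf-8") + b"\r\n")
    # Part 2: the audio file as a named file field.
    buf.write((f"--{boundary}\r\n"
               'Content-Disposition: form-data; name="audio"; '
               f'filename="{audio_name}"\r\n'
               "Content-Type: audio/wav\r\n"
               "\r\n").encode("utf-8"))
    buf.write(audio_bytes + b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode("utf-8"))
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

# Placeholder audio bytes; a real call would read the WAV file from disk.
body, content_type = build_multipart(["tencere", "kapak"], b"RIFF")
```

The returned `content_type` value (including the boundary parameter) must be sent as the request's Content-Type header so the server can split the parts.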

Example Response

{
  "resultText": "sayın meslektaşım ",
  "confidence": 1,
  "speechStartTimeMsec": 0,
  "speechEndTimeMsec": 2437,
  "nbestlist": {
    "utterances": [
      {
        "nlsmlResult": "",
        "confidence": 99,
        "recognizedWords": [
          {
            "word": "sayın",
            "startTimeMsec": 410,
            "endTimeMsec": 1110,
            "confidence": 99,
            "wordType": 1,
            "speakerId": null
          },
          {
            "word": "meslektaşım",
            "startTimeMsec": 1140,
            "endTimeMsec": 1960,
            "confidence": 99,
            "wordType": 1,
            "speakerId": null
          }
        ]
      }
    ]
  },
  "audioLink": "http://acme-pc/...",
  "success": true,
  "errorMessage": null,
  "errorCode": null,
  "moreInfo": null
}

Explanation

  • resultText: The final recognized text from the speech.
  • confidence: The overall confidence score of the recognition.
  • speechStartTimeMsec: The starting timestamp of the speech in milliseconds.
  • speechEndTimeMsec: The ending timestamp of the speech in milliseconds.
  • nbestlist: Contains multiple utterances (if ProduceNBestList is enabled).
    • Each utterance has:
      • nlsmlResult: (if applicable).
      • confidence: Confidence score for the utterance.
      • recognizedWords: Detailed information about each recognized word.
        • word: The recognized word.
        • startTimeMsec: The start time of the word in milliseconds.
        • endTimeMsec: The end time of the word in milliseconds.
        • confidence: Confidence score for the word.
        • wordType: Type of the word (1 indicates a recognized word).
        • speakerId: If speaker identification is enabled, this field will contain the speaker ID.
  • audioLink: A link to the recorded audio file.
  • success: true if the request was successful.
  • errorMessage: If an error occurs, this field contains the error message.
  • errorCode: If an error occurs, this field contains the error code.
  • moreInfo: Additional information about the response.

Response Fields

  • ResultText: The dictated (recognized) text.
  • Confidence: Confidence of recognition (range: [0, 1]).
  • SpeechStartTimeMsec: The starting time of the recognized speech in milliseconds.
  • SpeechEndTimeMsec: The ending time of the recognized speech in milliseconds.
  • Nbestlist: Contains multiple hypotheses for the recognition result.
  • AudioLink: A link where the input audio file can be downloaded in WAV format.
  • Success: True if the request succeeded; False if it failed.
  • ErrorMessage: A failure message when the request fails.
  • ErrorCode: Error code if the request fails (e.g., Internal Service Error).
  • MoreInfo: Any extra details about the response.

Nbestlist Fields

  • Utterances: List of recognized utterances.
  • NlsmlResult: The recognition result in NLSML format.
  • Confidence: Confidence level of the recognition hypothesis.
  • RecognizedWords: List of words recognized in the speech.

RecognizedWords Fields

  • Word: Recognized word from the speech input.
  • StartTimeMsec: The start time of the recognized word in milliseconds.
  • EndTimeMsec: The end time of the recognized word in milliseconds.
  • Confidence: The confidence level of the recognized word (as a percentage).
  • WordType: Classification of the word (e.g., Normal, Filler, Suffix, Prefix).
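
A common post-processing step is to drop low-confidence or non-normal words before using the transcript. The sketch below is a hypothetical filter, assuming (per the explanation earlier in this article) that wordType 1 marks a normally recognized word and that word confidence is a percentage; the 80% threshold and the sample filler entry are invented for illustration.

```python
# Sketch: filtering a RecognizedWords list by word-level confidence.
# Assumes wordType 1 = normally recognized word; confidence is 0-100.
def confident_words(recognized_words, min_confidence=80):
    """Keep words marked wordType 1 whose confidence meets the threshold."""
    return [w["word"] for w in recognized_words
            if w.get("wordType") == 1
            and w.get("confidence", 0) >= min_confidence]

# Hypothetical input: the first entry mirrors the example response; the
# second is an invented low-confidence entry with a non-normal wordType.
sample = [
    {"word": "sayın", "confidence": 99, "wordType": 1},
    {"word": "eee", "confidence": 40, "wordType": 2},
]
kept = confident_words(sample)
```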

Note:
Depending on the service configuration, a license may or may not be required to call this service.

TEST TOOLS

There are several tools that you can use to test the REST API.


Curl

Curl is a command-line tool for transferring data using various protocols. It is available on multiple platforms.

For more information, visit:
🔗 https://curl.haxx.se/

It can be used to interact with the Sestek Speech Recognition REST API.

Curl is available on many platforms, including Windows, Linux, and MacOS:
🔗 https://curl.haxx.se/download.html

For Windows installation, refer to:
🔗 https://support.zendesk.com/hc/en-us/articles/203691436-Installing-and-using-cURL#curl_win


Fiddler

Fiddler is a free web debugging proxy that works across multiple browsers, systems, and platforms.

🔗 http://www.telerik.com/fiddler


HTTPie

HTTPie is a cURL alternative that is particularly suited for JSON-based REST APIs.

🔗 https://github.com/jkbrzt/httpie

For installation (Windows, Mac OS X, Linux), you can use pip:
🔗 https://pip.pypa.io/en/latest/


Postman

Postman is a popular API testing tool available as both a Google Chrome Packaged App and a Google Chrome in-browser app.

🔗 https://www.getpostman.com/


Paw

Paw is a Mac application that simplifies interaction with REST services.

🔗 https://luckymarmot.com/paw


I'm Only Resting

"I'm Only Resting" is a feature-rich WinForms-based HTTP client.

🔗 http://www.swensensoftware.com/im-only-resting


APPENDICES

This section provides an overview of useful tools and references to help you interact with REST APIs effectively. For brief definitions of REST API concepts, see the following resources:

