Transcribe a File
  • 29 May 2024


Article summary

This article explains how to transcribe audio files using Knovvu Speech Recognition. Follow the steps below to get started.

Endpoint Details

  • URL: {{Address}}/v1/speech/dictation/request
  • Actor: REST Service Client
  • Goal: Perform speech dictation (recognition) on an uploaded audio file.
  • Pre-Conditions:
    • Valid model name: choose one from the Available Models list. Check Available Models.
    • Valid audio file: most common audio formats are supported (wav, opus, mp3, …).

Request

First, set the recognition parameters as request headers:

  • ModelName
  • Tenant
  • ModelVersion
  • ProduceNBestList
  • NbestListLength
  • SendAudioDownloadLink
INFO
  • ModelName is mandatory.
  • Tenant, ModelVersion, ProduceNBestList, NbestListLength, SendAudioDownloadLink are optional.

Example:

    "ModelName": "English",
    "Tenant": "Default",

After adding the recognition parameters and a bearer token to your request headers, add the audio file to the request body as binary data. Be sure to set a Content-Type header that matches the audio file format, such as:

  • Content-Type: audio/opus
  • Content-Type: audio/wav
  • Content-Type: audio/wave

By following these steps, you can ensure that your Speech Recognition Service request includes all the necessary components to process your audio file correctly.
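The steps above can be sketched with the Python standard library. This only builds the request object without sending it (sending would require a live endpoint); the host in ADDRESS is a placeholder for {{Address}}, and the byte string stands in for real audio data:

```python
# Sketch: build the dictation request with the standard library only.
# urllib.request.Request just constructs the request; urlopen would send it.
from urllib.request import Request

ADDRESS = "https://example-knovvu-host"  # placeholder for {{Address}}

def build_dictation_request(audio_bytes: bytes, token: str,
                            model_name: str = "English",
                            content_type: str = "audio/wav") -> Request:
    """Assemble a POST request carrying the audio file as binary body data."""
    return Request(
        url=f"{ADDRESS}/v1/speech/dictation/request",
        method="POST",
        data=audio_bytes,  # raw audio bytes go in the request body
        headers={
            "ModelName": model_name,
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )

req = build_dictation_request(b"\x00\x01", token="[token]")
# To send for real: urllib.request.urlopen(req)
```

HTTP header names are case-insensitive, so urllib's normalization of header capitalization should not affect how the service reads them.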

Endpoint Test with Curl

--header "ModelName: English"
--header "Tenant: ModelTenant"
--header "Authorization: Bearer [token]"  
--data-binary "@AudioToDictate.wav" 
-H "Content-Type:audio/wav" 
-X POST "https://{{Address}}/v1/speech/dictation/request"

Request Header Fields:

  • ModelName: Name of the language model you want to use in your speech dictation (recognition).
  • Tenant: Tenant of the language model.
  • ModelVersion: Version of the language model you want to use in your speech dictation (recognition). Check to see how to use versions.
  • ProduceNBestList: Make this field true if you want an N-best list of recognition hypotheses in the response.
  • NbestListLength: Maximum number of recognition hypotheses to return.
  • SendAudioDownloadLink: Make this field true if you want to get a link to download the audio file you send.
  • Content-Type: Type of the audio file.

Example Response:

{
    "audioLink": "{{Address}}/audio-logs/.wav",
    "confidence": 0.9599999785423279,
    "detectedAudioContent": "recognizable-speech",
    "errorCode": null,
    "errorMessage": null,
    "moreInfo": null,
    "resultText": "hello how can I help you",
    "success": true
}

Response Fields:

  • ResultText: Dictated (recognized) text.
  • Confidence: Confidence of the recognition result, in the range [0, 1].
  • SpeechStartTimeMsec: The time (in milliseconds) where speech starts in the audio file you sent.
  • SpeechEndTimeMsec: The time (in milliseconds) where speech ends in the audio file you sent.
  • NBestList: List of alternative recognition hypotheses for the dictation result.
  • AudioLink: Audio download link where you can download your input audio in wave format.
  • Success:
    • True: The request succeeded.
    • False: The request failed.
  • ErrorMessage: Failure message when the request failed.
  • ErrorCode: Failure error code when the request failed (for example, Internal Service Error).
  • MoreInfo: Any extra information about the response.
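As a minimal sketch of handling the response body, the example JSON above can be parsed and checked with the standard library. The function name is an assumption; the field names come from the example response:

```python
# Sketch: parse the dictation response and extract the recognized text.
import json

def parse_dictation_response(body: str) -> str:
    """Return resultText on success; raise with the error details otherwise."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(
            f"dictation failed: {payload.get('errorCode')}: "
            f"{payload.get('errorMessage')}"
        )
    return payload["resultText"]

# Sample body taken from the example response above (audioLink omitted).
example = """{
    "confidence": 0.9599999785423279,
    "detectedAudioContent": "recognizable-speech",
    "errorCode": null,
    "errorMessage": null,
    "moreInfo": null,
    "resultText": "hello how can I help you",
    "success": true
}"""
```

Checking the success flag before reading resultText avoids a KeyError on failed requests, since error responses carry errorCode and errorMessage instead.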
