Transcribe a File
  • 29 May 2024


Article summary

This article explains how to transcribe audio files using Knovvu Speech Recognition. Follow the steps below to get started.

Endpoint Details

  • URL: {{Address}}/v1/speech/dictation/request
  • Actor: REST Service Client
  • Goal: Perform speech dictation (recognition) on an uploaded audio file.
  • Pre-Conditions:
    • Valid model name: choose one from the Available Models list. Check Available Models.
    • Valid audio file: most common audio formats are supported (wav, opus, mp3, …).

Request

First, set the recognition parameters as request headers:

  • ModelName
  • Tenant
  • ModelVersion
  • ProduceNBestList
  • NbestListLength
  • SendAudioDownloadLink
INFO
  • ModelName is mandatory.
  • Tenant, ModelVersion, ProduceNBestList, NbestListLength, SendAudioDownloadLink are optional.

Example:

    "ModelName": "English",
    "Tenant": "Default",

After adding the recognition parameters and a bearer token to your request headers, add the audio file to the request body as binary data. Be sure to set a Content-Type header that matches the audio file format, such as:

  • Content-Type: audio/opus
  • Content-Type: audio/wav
  • Content-Type: audio/wave

By following these steps, you can ensure that your Speech Recognition Service request includes all the necessary components to process your audio file correctly.
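The steps above can be sketched with the Python standard library. This only builds the request object without sending it (sending would require a live endpoint); the host in ADDRESS is a placeholder for {{Address}}, and the byte string stands in for real audio data:

```python
# Sketch: build the dictation request with the standard library only.
# urllib.request.Request just constructs the request; urlopen would send it.
from urllib.request import Request

ADDRESS = "https://example-knovvu-host"  # placeholder for {{Address}}

def build_dictation_request(audio_bytes: bytes, token: str,
                            model_name: str = "English",
                            content_type: str = "audio/wav") -> Request:
    """Assemble a POST request carrying the audio file as binary body data."""
    return Request(
        url=f"{ADDRESS}/v1/speech/dictation/request",
        method="POST",
        data=audio_bytes,  # raw audio bytes go in the request body
        headers={
            "ModelName": model_name,
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )

req = build_dictation_request(b"\x00\x01", token="[token]")
# To send for real: urllib.request.urlopen(req)
```

HTTP header names are case-insensitive, so urllib's normalization of header capitalization should not affect how the service reads them.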

Endpoint Test with Curl

--header "ModelName: English"
--header "Tenant: ModelTenant"
--header "Authorization: Bearer [token]"  
--data-binary "@AudioToDictate.wav" 
-H "Content-Type:audio/wav" 
-X POST "https://{{Address}}/v1/speech/dictation/request"

Request Header Fields:

  • ModelName: Name of the language model you want to use in your speech dictation (recognition).
  • Tenant: Tenant of the language model.
  • ModelVersion: Version of the language model you want to use in your speech dictation (recognition). Check to see how to use versions.
  • ProduceNBestList: Make this field true if you want an N-best list of recognition hypotheses in the response.
  • NbestListLength: Maximum number of recognition hypotheses to return.
  • SendAudioDownloadLink: Make this field true if you want to get a link to download the audio file you send.
  • Content-Type: Type of the audio file.

Example Response:

{
    "audioLink": "{{Address}}/audio-logs/.wav",
    "confidence": 0.9599999785423279,
    "detectedAudioContent": "recognizable-speech",
    "errorCode": null,
    "errorMessage": null,
    "moreInfo": null,
    "resultText": "hello how can I help you",
    "success": true
}

Response Fields:

  • ResultText: Dictated (recognized) text.
  • Confidence: Confidence of the recognition result, in the range [0, 1].
  • SpeechStartTimeMsec: The time (in milliseconds) where speech starts in the audio file you sent.
  • SpeechEndTimeMsec: The time (in milliseconds) where speech ends in the audio file you sent.
  • NBestList: List of alternative recognition hypotheses for the dictation result.
  • AudioLink: Audio download link where you can download your input audio in wave format.
  • Success:
    • True: The request succeeded.
    • False: The request failed.
  • ErrorMessage: Failure message when the request failed.
  • ErrorCode: Failure error code when the request failed (for example, Internal Service Error).
  • MoreInfo: Any extra information about the response.
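As a minimal sketch of handling the response body, the example JSON above can be parsed and checked with the standard library. The function name is an assumption; the field names come from the example response:

```python
# Sketch: parse the dictation response and extract the recognized text.
import json

def parse_dictation_response(body: str) -> str:
    """Return resultText on success; raise with the error details otherwise."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(
            f"dictation failed: {payload.get('errorCode')}: "
            f"{payload.get('errorMessage')}"
        )
    return payload["resultText"]

# Sample body taken from the example response above (audioLink omitted).
example = """{
    "confidence": 0.9599999785423279,
    "detectedAudioContent": "recognizable-speech",
    "errorCode": null,
    "errorMessage": null,
    "moreInfo": null,
    "resultText": "hello how can I help you",
    "success": true
}"""
```

Checking the success flag before reading resultText avoids a KeyError on failed requests, since error responses carry errorCode and errorMessage instead.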
