Supported Languages and Models

Document Number: KN. GU.26.EN
Revision Number: Rev47
Revision Date: 16.11.2025

This document provides a unified overview of all languages supported in Knovvu Speech Recognition (SR).

Single Models

The list below represents all languages that can be transcribed through the Knovvu Speech Recognition API, regardless of the underlying model family.

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese, Chinese (Mandarin), Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Farsi, Filipino, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kashmiri, Kazakh, Khmer, Kirghiz, Korean, Kurdish (Kurmanji), Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Mandarin, Marathi, Mongolian, Nepali, Norwegian, Norwegian Nynorsk, Occitan, Oriya/Odia, Panjabi, Pashto, Persian, Polish, Portuguese, Pushto, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Uighur, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba, Yue Chinese.

Multilingual Models

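The models listed below are multilingual models, each trained to support multiple languages within a single model architecture. Most are Sestek’s own models. Asia-40 is based on Dolphin, and WhisperTurbo and WhisperTiny are based on Whisper; these are externally developed model families hosted in Knovvu SR (see Externally Developed, Knovvu-Hosted Models).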

Model Name Languages Supported
EnglishTurkish English, Turkish
AzerbaijaniRussian Azerbaijani, Russian
DutchFrench Dutch, French
ArabicEnglish Arabic, English
ArabicEnglishTurkish Arabic, English, Turkish
EnglishFrenchTurkish English, French, Turkish
EnglishSpanish English, Spanish
EnglishMalay English, Malay
LatvianRussian Latvian, Russian
Mena Arabic, English, French, Urdu
NorthAmerica English, French, Portuguese, Spanish
Europe English, French, Portuguese, Spanish, Dutch, German, Italian
Asia Mandarin, Tamil, Malay, English
Asia-40 See the Dolphin Models language list below
FourLanguagesMulti English, Arabic, French, Spanish
SixLanguagesMulti English, Turkish, Arabic, Russian, French, Spanish
Large Arabic, Danish, Dutch, English, Finnish, French, German, Hindi, Italian, Latvian, Mandarin, Norwegian, Portuguese, Russian, Spanish, Swedish, Tagalog, Turkish, Urdu
WhisperTurbo See the Whisper Models language list below
WhisperTiny See the Whisper Models language list below

Language-to-Model Family Mapping

The table below shows which model family provides support for each language listed in Knovvu Speech Recognition. While Knovvu SR exposes all languages through a unified API, individual languages may be supported by different underlying model architectures.

Language Sestek Whisper Dolphin
Afrikaans
Albanian
Amharic
Arabic
Armenian
Assamese
Azerbaijani
Bashkir
Basque
Belarusian
Bengali
Bosnian
Breton
Bulgarian
Burmese
Catalan
Chinese
Chinese (Mandarin) ✔ (Mandarin)
Croatian
Czech
Danish
Dutch
English
Estonian
Faroese
Farsi
Filipino
Finnish
Flemish
French
Galician
Georgian
German
Greek
Gujarati
Haitian Creole
Hawaiian
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Italian
Japanese
Javanese
Kannada
Kashmiri
Kazakh
Khmer
Kirghiz
Korean
Kurdish (Kurmanji)
Lao
Latin
Latvian
Lingala
Lithuanian
Luxembourgish
Macedonian
Malagasy
Malay
Malayalam
Maltese
Maori
Mandarin
Marathi
Mongolian
Nepali
Norwegian
Norwegian Nynorsk
Occitan
Oriya / Odia
Panjabi
Pashto
Persian
Polish
Portuguese
Punjabi
Romanian
Russian
Sanskrit
Serbian
Shona
Sindhi
Sinhala
Slovak
Slovenian
Somali
Spanish
Sundanese
Swahili
Swedish
Tagalog
Tamil
Tatar
Telugu
Thai
Tibetan
Turkish
Turkmen
Ukrainian
Uighur
Urdu
Uzbek
Vietnamese
Welsh
Yiddish
Yoruba
Yue Chinese
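The Whisper and Dolphin columns of this mapping can also be derived from the language lists in the Whisper Models and Dolphin Models sections. The Python sketch below builds such a lookup. The spelling equivalences in ALIASES (for example Farsi and Persian, Panjabi and Punjabi) are assumptions added for illustration, not rules of the Knovvu SR API, and Sestek-native coverage is left out because this document does not list it per language.

```python
# Language lists copied from the "Whisper Models" and "Dolphin Models"
# sections of this document.
WHISPER_LANGUAGES = (
    "Afrikaans, Amharic, Arabic, Assamese, Azerbaijani, Bashkir, Belarusian, "
    "Bulgarian, Bengali, Tibetan, Breton, Bosnian, Catalan, Czech, Welsh, "
    "Danish, German, Greek, English, Spanish, Estonian, Basque, Persian, "
    "Finnish, Faroese, French, Galician, Gujarati, Hawaiian, Hausa, Hebrew, "
    "Hindi, Croatian, Haitian Creole, Hungarian, Armenian, Indonesian, "
    "Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Khmer, Kannada, "
    "Korean, Latin, Luxembourgish, Lingala, Lao, Lithuanian, Latvian, "
    "Malagasy, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, "
    "Maltese, Burmese, Nepali, Dutch, Norwegian Nynorsk, Norwegian, Occitan, "
    "Punjabi, Polish, Pashto, Portuguese, Romanian, Russian, Sanskrit, Sindhi, "
    "Sinhala, Slovak, Slovenian, Shona, Somali, Albanian, Serbian, Sundanese, "
    "Swedish, Swahili, Tamil, Telugu, Tajik, Thai, Turkmen, Tagalog, Turkish, "
    "Tatar, Ukrainian, Urdu, Uzbek, Vietnamese, Yiddish, Yoruba, Yue Chinese, "
    "Chinese"
)
DOLPHIN_LANGUAGES = (
    "Arabic, Azerbaijani, Bashkir, Bengali, Burmese, Chinese (Mandarin), "
    "Filipino, Gujarati, Hindi, Indonesian, Japanese, Javanese, Kannada, "
    "Kashmiri, Kazakh, Khmer, Kirghiz, Korean, Lao, Malay, Marathi, Mongolian, "
    "Nepali, Oriya / Odia, Panjabi, Persian, Pashto, Russian, Sinhala, "
    "Sundanese, Tagalog, Tajik, Tamil, Telugu, Thai, Turkish, Uighur, Urdu, "
    "Uzbek, Vietnamese, Yue Chinese"
)

# Assumed spelling equivalences (illustrative normalization only).
ALIASES = {
    "Farsi": "Persian",
    "Panjabi": "Punjabi",
    "Pushto": "Pashto",
    "Oriya": "Odia",
    "Oriya / Odia": "Odia",
    "Chinese (Mandarin)": "Mandarin",
}


def normalize(name: str) -> str:
    """Map a language name to a single canonical spelling."""
    name = name.strip()
    return ALIASES.get(name, name)


def parse(languages: str) -> set:
    """Split a comma-separated list and normalize every entry."""
    return {normalize(item) for item in languages.split(",") if item.strip()}


# Dict order is preserved, so results always list Whisper before Dolphin.
FAMILIES = {
    "Whisper": parse(WHISPER_LANGUAGES),
    "Dolphin": parse(DOLPHIN_LANGUAGES),
}


def external_families_for(language: str) -> list:
    """Return the external model families whose list includes this language."""
    key = normalize(language)
    return [family for family, langs in FAMILIES.items() if key in langs]
```

For example, external_families_for("Farsi") returns ["Whisper", "Dolphin"], while external_families_for("Welsh") returns ["Whisper"]. An empty result does not mean a language is unsupported, since it may still be served by a Sestek model.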

When to Use Single vs. Multilingual Models

Single Models

Single models should be used when the language of the input is known before processing. These models provide the highest accuracy because they are specifically trained for a single language. Single models are ideal for:

  • IVR-driven systems where the caller selects a language before speaking.

  • Speech analytics for monolingual environments, such as customer service centers operating in a single language.

  • Use cases that require high accuracy and do not need dynamic language detection.

Multilingual Models

Multilingual models should be used when the language of the input is unknown at the time of recognition or when dynamic language switching is required. These models work best for:

  • Calls where the IVR system does not provide language information, so the call cannot be routed to a specific monolingual SR model.

  • Applications that support multiple languages but do not allow dynamic switching between single models.

  • Use cases where users may speak different languages interchangeably, but each utterance is typically monolingual.

In such cases, multilingual SR models provide high-accuracy transcriptions by detecting and transcribing the input in one of the languages they were trained on. These models work best with monolingual utterances—where the entire speech is in one of the supported languages.
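The guidance above can be expressed as a small decision helper. The Python sketch below is illustrative only: select_model and MULTILINGUAL_MODELS are hypothetical names, the "single" result carries the language as a placeholder rather than an actual Sestek model name, and the rule of preferring the multilingual model with the fewest languages is an assumed heuristic, not Sestek guidance.

```python
from typing import Optional, Set, Tuple

# Multilingual models whose languages are enumerated in the Multilingual Models
# table (Asia-40, WhisperTurbo, and WhisperTiny are left out of this sketch).
MULTILINGUAL_MODELS = {
    "EnglishTurkish": {"English", "Turkish"},
    "AzerbaijaniRussian": {"Azerbaijani", "Russian"},
    "DutchFrench": {"Dutch", "French"},
    "ArabicEnglish": {"Arabic", "English"},
    "ArabicEnglishTurkish": {"Arabic", "English", "Turkish"},
    "EnglishFrenchTurkish": {"English", "French", "Turkish"},
    "EnglishSpanish": {"English", "Spanish"},
    "EnglishMalay": {"English", "Malay"},
    "LatvianRussian": {"Latvian", "Russian"},
    "Mena": {"Arabic", "English", "French", "Urdu"},
    "NorthAmerica": {"English", "French", "Portuguese", "Spanish"},
    "Europe": {"English", "French", "Portuguese", "Spanish", "Dutch",
               "German", "Italian"},
    "Asia": {"Mandarin", "Tamil", "Malay", "English"},
    "FourLanguagesMulti": {"English", "Arabic", "French", "Spanish"},
    "SixLanguagesMulti": {"English", "Turkish", "Arabic", "Russian",
                          "French", "Spanish"},
    "Large": {"Arabic", "Danish", "Dutch", "English", "Finnish", "French",
              "German", "Hindi", "Italian", "Latvian", "Mandarin", "Norwegian",
              "Portuguese", "Russian", "Spanish", "Swedish", "Tagalog",
              "Turkish", "Urdu"},
}


def select_model(known_language: Optional[str], expected: Set[str]) -> Tuple[str, str]:
    """Choose between a single model and a multilingual model.

    known_language: the language if it is known before recognition (e.g. the
    caller picked it in the IVR), otherwise None.
    expected: every language the caller might speak.
    """
    if known_language is not None:
        # Language known in advance: a single model gives the best accuracy.
        return ("single", known_language)
    candidates = [name for name, langs in MULTILINGUAL_MODELS.items()
                  if expected <= langs]
    if not candidates:
        raise ValueError(f"no listed multilingual model covers {sorted(expected)}")
    # Assumed heuristic: fewest languages first, then alphabetical by name.
    best = min(candidates, key=lambda name: (len(MULTILINGUAL_MODELS[name]), name))
    return ("multilingual", best)
```

For an IVR that cannot tell English and Turkish callers apart, select_model(None, {"English", "Turkish"}) returns ("multilingual", "EnglishTurkish"). The helper deliberately says nothing about code-switching, which none of these models is trained for.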

Limitations with Code-Switching

Code-switching refers to scenarios where speakers mix multiple languages within the same utterance, such as a Turkish sentence containing French words. Multilingual SR models are not explicitly trained to handle code-switching, as their training data primarily consists of monolingual examples for each supported language.

As a result, multilingual models do not significantly improve the recognition of foreign words embedded in another language compared to a dedicated monolingual model. For example, an EnglishFrenchTurkish model is not necessarily better at recognizing a French word in a Turkish sentence than a standard Turkish model, since such occurrences are rare or absent in the training data.

Recommended Approach for Code-Switching Scenarios

For improved recognition of foreign words within a sentence, we recommend leveraging context-biasing (pronunciation support) available in our end-to-end (E2E) models. This approach enhances recognition accuracy for specific terms, providing a more effective solution for code-switching cases compared to relying solely on a multilingual SR model.

Multilingual models provide flexibility in handling diverse language scenarios but may not be as accurate as single models in cases where the input language is already known. If the use case involves frequent code-switching within a single utterance, context-biasing techniques should be considered to improve recognition accuracy.

Externally Developed, Knovvu-Hosted Models

Knovvu Speech Recognition includes additional model families that are developed externally but fully hosted, managed, and served within the Knovvu SR infrastructure. These models are integrated into the same API framework as Knovvu’s native speech recognition models, allowing users to access them seamlessly without any external configuration or environment setup.

Key Highlights

  • These models are externally developed model families that are fully hosted and integrated within the Knovvu SR engine.
  • They significantly extend Knovvu’s overall language coverage and provide alternative recognition options.
  • While Sestek ensures stable hosting and API-level compatibility, performance or accuracy may vary based on the underlying external architecture.
  • We recommend testing these models in your target languages and acoustic environments to confirm suitability.

Performance and Limitations

Externally developed model families—such as Whisper and Dolphin—provide broad multilingual flexibility and expand Knovvu’s language capabilities. However, because these models originate from external architectures, their performance and accuracy may vary depending on language, domain, and recording conditions.

Although Sestek ensures reliable hosting, robust deployment, and seamless API access, recognition results may differ. Therefore, we recommend evaluating these models in your specific use case to verify their suitability before production deployment.
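One common way to compare candidate models on your own recordings is word error rate (WER): the word-level edit distance between a human reference transcript and the model output, divided by the number of reference words. The Python sketch below is a generic WER computation, not a Knovvu SR feature; it assumes transcripts are already normalized (same casing, punctuation removed) and split on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Running the same test set through, for example, WhisperTurbo and a Sestek model and comparing average WER gives a simple, repeatable basis for the suitability check recommended above.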

Whisper Models

The following languages are supported when using Whisper models in Knovvu SR.

Afrikaans, Amharic, Arabic, Assamese, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Tibetan, Breton, Bosnian, Catalan, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Galician, Gujarati, Hawaiian, Hausa, Hebrew, Hindi, Croatian, Haitian Creole, Hungarian, Armenian, Indonesian, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Khmer, Kannada, Korean, Latin, Luxembourgish, Lingala, Lao, Lithuanian, Latvian, Malagasy, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Burmese, Nepali, Dutch, Norwegian Nynorsk, Norwegian, Occitan, Punjabi, Polish, Pashto, Portuguese, Romanian, Russian, Sanskrit, Sindhi, Sinhala, Slovak, Slovenian, Shona, Somali, Albanian, Serbian, Sundanese, Swedish, Swahili, Tamil, Telugu, Tajik, Thai, Turkmen, Tagalog, Turkish, Tatar, Ukrainian, Urdu, Uzbek, Vietnamese, Yiddish, Yoruba, Yue Chinese, Chinese.

Model Options

To accommodate diverse performance needs, Knovvu offers two versions of Whisper models: WhisperTiny and WhisperTurbo. Each model provides a distinct balance of speed and recognition accuracy:

  • WhisperTiny: This is a smaller, lightweight model designed for faster performance. While WhisperTiny provides quick processing speeds, its recognition accuracy may be limited in comparison to larger models. This option is ideal for applications where speed is prioritized over the highest accuracy levels, or where computing resources are limited.

  • WhisperTurbo: Equivalent to OpenAI's Whisper "large-v3-turbo" model, WhisperTurbo offers higher recognition accuracy than WhisperTiny. However, it requires more processing power and runs slower than WhisperTiny. WhisperTurbo suits applications that demand high accuracy, especially in complex or noisy environments, and that can tolerate slower processing.

API Usage Reference

To use Whisper models, set the ModelName parameter to WhisperTiny, WhisperTurbo, or one of the language-specific variants (e.g., Norwegian-W).
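The Python sketch below turns the speed/accuracy trade-off and the naming rules above into a ModelName value. whisper_model_name is a hypothetical helper, and the params dictionary is an assumption: only the ModelName parameter itself is documented here, not how it is transmitted in a request.

```python
from typing import Optional


def whisper_model_name(prefer_speed: bool, language: Optional[str] = None) -> str:
    """Pick a ModelName value for a Whisper model in Knovvu SR.

    If language is given, the "-W" single-language variant name is returned
    (e.g. Norwegian-W) and prefer_speed is ignored; the variant must already be
    available for your deployment. Otherwise WhisperTiny is chosen when speed
    matters most and WhisperTurbo when accuracy matters most.
    """
    if language is not None:
        return f"{language}-W"
    return "WhisperTiny" if prefer_speed else "WhisperTurbo"


# Hypothetical request parameters: the transport (query string, JSON body,
# header) depends on your API version and is not shown here.
params = {"ModelName": whisper_model_name(prefer_speed=False)}
```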

Dolphin Models

The following languages are supported when using Dolphin models in Knovvu SR.

Arabic, Azerbaijani, Bashkir, Bengali, Burmese, Chinese (Mandarin), Filipino, Gujarati, Hindi, Indonesian, Japanese, Javanese, Kannada, Kashmiri, Kazakh, Khmer, Kirghiz, Korean, Lao, Malay, Marathi, Mongolian, Nepali, Oriya / Odia, Panjabi, Persian, Pashto, Russian, Sinhala, Sundanese, Tagalog, Tajik, Tamil, Telugu, Thai, Turkish, Uighur, Urdu, Uzbek, Vietnamese, Yue Chinese.

Model Options

Knovvu currently provides a single Dolphin model, Dolphin (Small), designed to deliver broad multilingual coverage across Asian, Middle Eastern, and surrounding regions.

API Usage Reference

To use Dolphin models, set the ModelName parameter to Asia-40 or one of the language-specific variants (e.g., Tagalog-D).
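Because a -D variant only makes sense for a language Dolphin supports, a client can check the name before sending it. In the Python sketch below, dolphin_model_name is a hypothetical helper; the language set is copied from the Dolphin Models list above, and a name that passes the check may still need to be requested from Sestek, since single-language variants are prepared on demand.

```python
from typing import Optional

# Languages listed in the "Dolphin Models" section of this document.
DOLPHIN_LANGUAGES = {
    "Arabic", "Azerbaijani", "Bashkir", "Bengali", "Burmese",
    "Chinese (Mandarin)", "Filipino", "Gujarati", "Hindi", "Indonesian",
    "Japanese", "Javanese", "Kannada", "Kashmiri", "Kazakh", "Khmer",
    "Kirghiz", "Korean", "Lao", "Malay", "Marathi", "Mongolian", "Nepali",
    "Oriya / Odia", "Panjabi", "Persian", "Pashto", "Russian", "Sinhala",
    "Sundanese", "Tagalog", "Tajik", "Tamil", "Telugu", "Thai", "Turkish",
    "Uighur", "Urdu", "Uzbek", "Vietnamese", "Yue Chinese",
}


def dolphin_model_name(language: Optional[str] = None) -> str:
    """Return a ModelName value for Dolphin in Knovvu SR.

    Without a language, the multilingual Asia-40 model (automatic language
    detection) is returned. With a language, the "-D" variant name is built
    after checking that Dolphin lists that language.
    """
    if language is None:
        return "Asia-40"
    if language not in DOLPHIN_LANGUAGES:
        raise ValueError(f"{language!r} is not in the Dolphin language list")
    return f"{language}-D"
```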

Language Behavior Across External Models

External model families such as Whisper and Dolphin support two different operating modes:

Multilingual (Automatic Language Detection)

Both Whisper and Dolphin can automatically detect and transcribe any of the languages included in their multilingual architecture. This mode is ideal when the input language is unknown or may vary across speakers.

Single-Language Variants

In certain scenarios—such as multilingual audio, dialectal variations, or unclear acoustic conditions—automatic language detection may select an unintended language, resulting in reduced transcription accuracy. To provide explicit language control, Knovvu supports single-language variants of external models.

These can be invoked using language-specific names with standardized suffixes:

  • -W → Whisper-derived single-language variant

    • e.g., Norwegian-W, Danish-W
  • -D → Dolphin-derived single-language variant

    • e.g., Malay-D, Tagalog-D

Using a single-language variant (-W or -D) is recommended when:

  • The input language is known in advance.
  • The target use case requires consistent output in a specific language.
  • Automatic language detection risks misidentifying the language.

This unified mechanism ensures predictable, language-locked recognition behavior across all external model families.
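Because the suffix convention is uniform, client code can tell from a ModelName alone whether it selects a language-locked external variant. The Python sketch below parses names using only the -W and -D suffixes described above; parse_model_name is a hypothetical helper name.

```python
from typing import Optional, Tuple

# Suffixes for single-language variants of external model families.
VARIANT_SUFFIXES = {"-W": "Whisper", "-D": "Dolphin"}


def parse_model_name(model_name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (family, language) for a single-language variant name.

    Names without a -W or -D suffix (e.g. WhisperTurbo, Asia-40, or a Sestek
    model name) return (None, None): they do not lock the language through
    this convention.
    """
    for suffix, family in VARIANT_SUFFIXES.items():
        # Require at least one character before the suffix.
        if model_name.endswith(suffix) and len(model_name) > len(suffix):
            return family, model_name[: -len(suffix)]
    return None, None
```

For example, parse_model_name("Norwegian-W") returns ("Whisper", "Norwegian"), while parse_model_name("Asia-40") returns (None, None) because "-40" is not a variant suffix.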

Note on Single-Language Variants

Single-language variants for Whisper (-W) and Dolphin (-D) models are not pre-compiled for every supported language by default. These models are generated on demand, based on customer requirements and real-world usage needs.

If you need a variant that is not currently listed in the API, please contact the Sestek Support Team or your account representative. Our team can evaluate your request and prepare the required single-language model for deployment.