| Document Number | Revision Number | Revision Date |
|---|---|---|
| KN. GU.26.EN | Rev47 | 16.11.2025 |
This document provides a unified overview of all languages supported across Knovvu Speech Recognition.
Single Models
The list below represents all languages that can be transcribed through the Knovvu Speech Recognition API, regardless of the underlying model family.
Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese, Chinese (Mandarin), Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Farsi, Filipino, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kashmiri, Kazakh, Khmer, Kirghiz, Korean, Kurdish (Kurmanji), Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Mandarin, Marathi, Mongolian, Nepali, Norwegian, Norwegian Nynorsk, Occitan, Oriya/Odia, Panjabi, Pashto, Persian, Polish, Portuguese, Pushto, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Uighur, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba, Yue Chinese.
Multilingual Models
The models listed below represent Sestek’s native multilingual models, each trained to support multiple languages within a single model architecture.
| Model Name | Languages Supported |
|---|---|
| EnglishTurkish | English, Turkish |
| AzerbaijaniRussian | Azerbaijani, Russian |
| DutchFrench | Dutch, French |
| ArabicEnglish | Arabic, English |
| ArabicEnglishTurkish | Arabic, English, Turkish |
| EnglishFrenchTurkish | English, French, Turkish |
| EnglishSpanish | English, Spanish |
| EnglishMalay | English, Malay |
| LatvianRussian | Latvian, Russian |
| Mena | Arabic, English, French, Urdu |
| NorthAmerica | English, French, Portuguese, Spanish |
| Europe | English, French, Portuguese, Spanish, Dutch, German, Italian |
| Asia | Mandarin, Tamil, Malay, English |
| Asia-40 | See the supported languages listed below |
| FourLanguagesMulti | English, Arabic, French, Spanish |
| SixLanguagesMulti | English, Turkish, Arabic, Russian, French, Spanish |
| Large | Arabic, Danish, Dutch, English, Finnish, French, German, Hindi, Italian, Latvian, Mandarin, Norwegian, Portuguese, Russian, Spanish, Swedish, Tagalog, Turkish, Urdu |
| WhisperTurbo | See the supported languages listed below |
| WhisperTiny | See the supported languages listed below |
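As an illustration of how the table above might be used programmatically, the sketch below returns the native multilingual models whose language set covers every language an application needs. The language sets are copied from the table (models whose languages are listed elsewhere, such as Asia-40, WhisperTurbo, and WhisperTiny, are left out). The function name `candidate_models` and the smallest-first ordering are assumptions made for this example; they are not part of the Knovvu API or an official recommendation.

```python
# Language sets copied from the multilingual model table above.
# Asia-40, WhisperTurbo and WhisperTiny are omitted because their
# languages are listed in separate sections.
NATIVE_MULTILINGUAL_MODELS = {
    "EnglishTurkish": {"English", "Turkish"},
    "AzerbaijaniRussian": {"Azerbaijani", "Russian"},
    "DutchFrench": {"Dutch", "French"},
    "ArabicEnglish": {"Arabic", "English"},
    "ArabicEnglishTurkish": {"Arabic", "English", "Turkish"},
    "EnglishFrenchTurkish": {"English", "French", "Turkish"},
    "EnglishSpanish": {"English", "Spanish"},
    "EnglishMalay": {"English", "Malay"},
    "LatvianRussian": {"Latvian", "Russian"},
    "Mena": {"Arabic", "English", "French", "Urdu"},
    "NorthAmerica": {"English", "French", "Portuguese", "Spanish"},
    "Europe": {"English", "French", "Portuguese", "Spanish",
               "Dutch", "German", "Italian"},
    "Asia": {"Mandarin", "Tamil", "Malay", "English"},
    "FourLanguagesMulti": {"English", "Arabic", "French", "Spanish"},
    "SixLanguagesMulti": {"English", "Turkish", "Arabic", "Russian",
                          "French", "Spanish"},
    "Large": {"Arabic", "Danish", "Dutch", "English", "Finnish", "French",
              "German", "Hindi", "Italian", "Latvian", "Mandarin",
              "Norwegian", "Portuguese", "Russian", "Spanish", "Swedish",
              "Tagalog", "Turkish", "Urdu"},
}


def candidate_models(required_languages):
    """Return models that cover all required languages.

    Hypothetical helper: results are ordered by number of supported
    languages (fewest first), then alphabetically. That ordering is an
    assumption of this sketch, not a documented selection rule.
    """
    required = set(required_languages)
    matches = [
        name
        for name, languages in NATIVE_MULTILINGUAL_MODELS.items()
        if required <= languages  # subset test: every language is covered
    ]
    return sorted(
        matches,
        key=lambda name: (len(NATIVE_MULTILINGUAL_MODELS[name]), name),
    )


if __name__ == "__main__":
    print(candidate_models({"English", "Turkish"}))
```

An empty list means no native multilingual model in the table covers the requested combination.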
Source by Model Family
The table below shows which model family provides support for each language listed in Knovvu Speech Recognition. While Knovvu SR exposes all languages through a unified API, individual languages may be supported by different underlying model architectures.
| Language | Sestek | Whisper | Dolphin |
|---|---|---|---|
| Afrikaans | — | ✔ | — |
| Albanian | — | ✔ | — |
| Amharic | — | ✔ | — |
| Arabic | ✔ | ✔ | ✔ |
| Armenian | — | ✔ | — |
| Assamese | — | ✔ | — |
| Azerbaijani | ✔ | ✔ | ✔ |
| Bashkir | — | ✔ | ✔ |
| Basque | — | ✔ | — |
| Belarusian | — | ✔ | — |
| Bengali | — | ✔ | ✔ |
| Bosnian | — | ✔ | — |
| Breton | — | ✔ | — |
| Bulgarian | ✔ | ✔ | — |
| Burmese | — | ✔ | ✔ |
| Catalan | — | ✔ | — |
| Chinese | — | ✔ | — |
| Chinese (Mandarin) | ✔ (Mandarin) | ✔ | ✔ |
| Croatian | ✔ | ✔ | — |
| Czech | ✔ | ✔ | — |
| Danish | ✔ | ✔ | — |
| Dutch | ✔ | ✔ | — |
| English | ✔ | ✔ | — |
| Estonian | — | ✔ | — |
| Faroese | — | ✔ | — |
| Farsi | ✔ | ✔ | ✔ |
| Filipino | — | — | ✔ |
| Finnish | ✔ | ✔ | — |
| Flemish | ✔ | — | — |
| French | ✔ | ✔ | — |
| Galician | — | ✔ | — |
| Georgian | — | ✔ | — |
| German | ✔ | ✔ | — |
| Greek | ✔ | ✔ | — |
| Gujarati | — | ✔ | ✔ |
| Haitian Creole | — | ✔ | — |
| Hawaiian | — | ✔ | — |
| Hebrew | — | ✔ | — |
| Hindi | ✔ | ✔ | ✔ |
| Hungarian | — | ✔ | — |
| Icelandic | — | ✔ | — |
| Indonesian | ✔ | ✔ | ✔ |
| Italian | ✔ | ✔ | — |
| Japanese | ✔ | ✔ | ✔ |
| Javanese | — | ✔ | ✔ |
| Kannada | — | ✔ | ✔ |
| Kashmiri | — | — | ✔ |
| Kazakh | ✔ | ✔ | ✔ |
| Khmer | — | ✔ | ✔ |
| Kirghiz | — | — | ✔ |
| Korean | ✔ | ✔ | ✔ |
| Kurdish (Kurmanji) | ✔ | — | — |
| Lao | — | ✔ | ✔ |
| Latin | — | ✔ | — |
| Latvian | ✔ | ✔ | — |
| Lingala | — | ✔ | — |
| Lithuanian | — | ✔ | — |
| Luxembourgish | — | ✔ | — |
| Macedonian | — | ✔ | — |
| Malagasy | — | ✔ | — |
| Malay | ✔ | ✔ | ✔ |
| Malayalam | — | ✔ | — |
| Maltese | — | ✔ | — |
| Maori | — | ✔ | — |
| Mandarin | ✔ | ✔ | ✔ |
| Marathi | — | ✔ | ✔ |
| Mongolian | ✔ | ✔ | ✔ |
| Nepali | — | ✔ | ✔ |
| Norwegian | ✔ | ✔ | — |
| Norwegian Nynorsk | — | ✔ | — |
| Occitan | — | ✔ | — |
| Oriya / Odia | — | ✔ | ✔ |
| Panjabi | — | ✔ | ✔ |
| Pashto | ✔ | ✔ | ✔ |
| Persian | ✔ | ✔ | ✔ |
| Polish | ✔ | ✔ | — |
| Portuguese | ✔ | ✔ | — |
| Punjabi | — | ✔ | ✔ |
| Romanian | ✔ | ✔ | — |
| Russian | ✔ | ✔ | ✔ |
| Sanskrit | — | ✔ | — |
| Serbian | — | ✔ | — |
| Shona | — | ✔ | — |
| Sindhi | — | ✔ | — |
| Sinhala | — | ✔ | ✔ |
| Slovak | — | ✔ | — |
| Slovenian | — | ✔ | — |
| Somali | — | ✔ | — |
| Spanish | ✔ | ✔ | — |
| Sundanese | — | ✔ | ✔ |
| Swahili | ✔ | ✔ | — |
| Swedish | ✔ | ✔ | — |
| Tagalog | ✔ | ✔ | ✔ |
| Tamil | ✔ | ✔ | ✔ |
| Tatar | — | ✔ | — |
| Telugu | — | ✔ | ✔ |
| Thai | — | ✔ | ✔ |
| Tibetan | — | ✔ | — |
| Turkish | ✔ | ✔ | ✔ |
| Turkmen | — | ✔ | — |
| Ukrainian | ✔ | ✔ | — |
| Uighur | — | — | ✔ |
| Urdu | ✔ | ✔ | ✔ |
| Uzbek | — | ✔ | ✔ |
| Vietnamese | — | ✔ | ✔ |
| Welsh | ✔ | ✔ | — |
| Yiddish | — | ✔ | — |
| Yoruba | — | ✔ | — |
| Yue Chinese | — | ✔ | ✔ |
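The table above can be read as a simple lookup from language to model family. The following sketch encodes only a handful of rows to show the idea. The names `SUPPORT_EXCERPT` and `families_for` are hypothetical, and a real integration would need the full table.

```python
FAMILIES = ("Sestek", "Whisper", "Dolphin")

# A few rows copied from the table above, in column order
# (Sestek, Whisper, Dolphin). True means the family is marked as supported.
SUPPORT_EXCERPT = {
    "Afrikaans": (False, True, False),
    "English": (True, True, False),
    "Filipino": (False, False, True),
    "Flemish": (True, False, False),
    "Kurdish (Kurmanji)": (True, False, False),
    "Turkish": (True, True, True),
    "Uighur": (False, False, True),
}


def families_for(language):
    """Return the families supporting a language, or None if the
    language is not part of this excerpt."""
    flags = SUPPORT_EXCERPT.get(language)
    if flags is None:
        return None
    return [family for family, supported in zip(FAMILIES, flags) if supported]


if __name__ == "__main__":
    for language in ("Turkish", "Filipino", "Flemish"):
        print(language, families_for(language))
```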
When to Use Single vs. Multilingual Models
Single Models
Single models should be used when the language of the input is known before processing. These models provide the highest accuracy because they are specifically trained for a single language. Single models are ideal for:
- IVR-driven systems where the caller selects a language before speaking.
- Speech analytics for monolingual environments, such as customer service centers operating in a single language.
- Use cases where high accuracy is required, and the language does not need to be detected dynamically.
Multilingual Models
Multilingual models should be used when the language of the input is unknown at the time of recognition or when dynamic language switching is required. These models work best for:
- IVR systems that do not provide language information, which makes it impossible to route the call to a specific monolingual SR model.
- Applications that support multiple languages but do not allow dynamic switching between single models.
- Use cases where users may speak different languages interchangeably, but each utterance is typically monolingual.
In such cases, multilingual SR models provide high-accuracy transcriptions by detecting and transcribing the input in one of the languages they were trained on. These models work best with monolingual utterances—where the entire speech is in one of the supported languages.
Limitations with Code-Switching
Code-switching refers to scenarios where speakers mix multiple languages within the same utterance, such as a Turkish sentence containing French words. Multilingual SR models are not explicitly trained to handle code-switching, as their training data primarily consists of monolingual examples for each supported language.
As a result, multilingual models do not significantly improve the recognition of foreign words embedded in another language compared to a dedicated monolingual model. For example, an EnglishFrenchTurkish model is not necessarily better at recognizing a French word in a Turkish sentence than a standard Turkish model, since such occurrences are rare or absent in the training data.
Recommended Approach for Code-Switching Scenarios
For improved recognition of foreign words within a sentence, we recommend leveraging context-biasing (pronunciation support) available in our end-to-end (E2E) models. This approach enhances recognition accuracy for specific terms, providing a more effective solution for code-switching cases compared to relying solely on a multilingual SR model.
Multilingual models provide flexibility in handling diverse language scenarios but may not be as accurate as single models in cases where the input language is already known. If the use case involves frequent code-switching within a single utterance, context-biasing techniques should be considered to improve recognition accuracy.
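The guidance in this section can be condensed into a small decision helper. The sketch below is only a summary of the rules stated above: a single model when the language is known in advance, a multilingual model otherwise, and context-biasing whenever code-switching is expected. The `Recommendation` type and the `recommend` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    model_kind: str            # "single" or "multilingual"
    use_context_biasing: bool  # True when code-switching is expected


def recommend(language_known: bool, code_switching_expected: bool) -> Recommendation:
    """Map the two questions from this section to a recommendation.

    Hypothetical helper: it mirrors the prose guidance and does not call
    any Knovvu API.
    """
    model_kind = "single" if language_known else "multilingual"
    # A multilingual model alone is not expected to help with foreign
    # words inside an utterance, so context-biasing is flagged
    # independently of the model kind.
    return Recommendation(model_kind, code_switching_expected)


if __name__ == "__main__":
    print(recommend(language_known=True, code_switching_expected=True))
```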
Externally Developed, Knovvu-Hosted Models
Knovvu Speech Recognition includes additional model families that are developed externally but fully hosted, managed, and served within the Knovvu SR infrastructure. These models are integrated into the same API framework as Knovvu’s native speech recognition models, allowing users to access them seamlessly without any external configuration or environment setup.
Key Highlights
- These model families are developed externally but are fully hosted and integrated within the Knovvu SR engine.
- They significantly extend Knovvu’s overall language coverage and provide alternative recognition options.
- While Sestek ensures stable hosting and API-level compatibility, performance or accuracy may vary based on the underlying external architecture.
- We recommend testing these models in your target languages and acoustic environments to confirm suitability.
Performance and Limitations
Externally developed model families—such as Whisper and Dolphin—provide broad multilingual flexibility and expand Knovvu’s language capabilities. However, because these models originate from external architectures, their performance and accuracy may vary depending on language, domain, and recording conditions.
Although Sestek ensures reliable hosting, robust deployment, and seamless API access, recognition results may differ from those obtained using Knovvu’s native models. Therefore, we recommend evaluating these models in your specific use case to verify their suitability before production deployment.
Whisper Models
The following languages are supported when using Whisper models in Knovvu SR.
Afrikaans, Amharic, Arabic, Assamese, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Tibetan, Breton, Bosnian, Catalan, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Galician, Gujarati, Hawaiian, Hausa, Hebrew, Hindi, Croatian, Haitian Creole, Hungarian, Armenian, Indonesian, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Khmer, Kannada, Korean, Latin, Luxembourgish, Lingala, Lao, Lithuanian, Latvian, Malagasy, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Burmese, Nepali, Dutch, Norwegian Nynorsk, Norwegian, Occitan, Punjabi, Polish, Pashto, Portuguese, Romanian, Russian, Sanskrit, Sindhi, Sinhala, Slovak, Slovenian, Shona, Somali, Albanian, Serbian, Sundanese, Swedish, Swahili, Tamil, Telugu, Tajik, Thai, Turkmen, Tagalog, Turkish, Tatar, Ukrainian, Urdu, Uzbek, Vietnamese, Yiddish, Yoruba, Yue Chinese, Chinese.
Model Options
To accommodate diverse performance needs, Knovvu offers two versions of Whisper models: WhisperTiny and WhisperTurbo. Each model provides a distinct balance of speed and recognition accuracy:
- WhisperTiny: This is a smaller, lightweight model designed for faster performance. While WhisperTiny provides quick processing speeds, its recognition accuracy may be limited in comparison to larger models. This option is ideal for applications where speed is prioritized over the highest accuracy levels, or where computing resources are limited.
- WhisperTurbo: Equivalent to the advanced "large v3 turbo" Whisper model, WhisperTurbo represents the latest in Whisper technology, offering improved recognition accuracy. However, this model requires more processing power and operates at a slower speed than WhisperTiny. WhisperTurbo is suited for applications that demand high accuracy, especially in complex or noisy environments, and can tolerate slower processing times.
API Usage Reference
To use Whisper models, set the ModelName parameter to WhisperTiny, WhisperTurbo, or one of the language-specific variants (e.g., Norwegian-W).
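A minimal sketch of choosing the `ModelName` value based on the speed and accuracy trade-off described above. Only the parameter name `ModelName` and the values `WhisperTiny` and `WhisperTurbo` come from this document. The helper names and the dictionary shape are assumptions for illustration; the actual request format should be taken from the Knovvu SR API reference.

```python
def whisper_model_name(prioritize_speed: bool) -> str:
    """Pick a Whisper model according to the trade-off described above.

    Hypothetical helper: WhisperTiny when speed matters most,
    WhisperTurbo when accuracy matters most.
    """
    return "WhisperTiny" if prioritize_speed else "WhisperTurbo"


def recognition_params(model_name: str) -> dict:
    """Build a parameter mapping containing only ModelName.

    The dictionary shape is an assumption of this sketch; real requests
    will carry additional fields defined by the API.
    """
    return {"ModelName": model_name}


if __name__ == "__main__":
    print(recognition_params(whisper_model_name(prioritize_speed=False)))
```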
Dolphin Models
The following languages are supported when using Dolphin models in Knovvu SR.
Arabic, Azerbaijani, Bashkir, Bengali, Burmese, Chinese (Mandarin), Filipino, Gujarati, Hindi, Indonesian, Japanese, Javanese, Kannada, Kashmiri, Kazakh, Khmer, Kirghiz, Korean, Lao, Malay, Marathi, Mongolian, Nepali, Oriya / Odia, Panjabi, Persian, Pushto, Russian, Sinhala, Sundanese, Tagalog, Tajik, Tamil, Telugu, Thai, Turkish, Uighur, Urdu, Uzbek, Vietnamese, Yue Chinese.
Model Options
Knovvu currently provides a single Dolphin model family — Dolphin (Small) — designed to deliver broad multilingual coverage across Asian, Middle Eastern, and surrounding regions.
API Usage Reference
To use Dolphin models, set the ModelName parameter to Asia-40 or one of the language-specific variants (e.g., Tagalog-D).
Language Behavior Across External Models
External model families such as Whisper and Dolphin support two different operating modes:
Multilingual (Automatic Language Detection)
Both Whisper and Dolphin can automatically detect and transcribe any of the languages included in their multilingual architecture. This mode is ideal when the input language is unknown or may vary across speakers.
Single-Language Variants
In certain scenarios—such as multilingual audio, dialectal variations, or unclear acoustic conditions—automatic language detection may select an unintended language, resulting in reduced transcription accuracy. To provide explicit language control, Knovvu supports single-language variants of external models.
These can be invoked using language-specific names with standardized suffixes:
- Suffix -W → Whisper-derived single-language variant (e.g., Norwegian-W, Danish-W)
- Suffix -D → Dolphin-derived single-language variant (e.g., Malay-D, Tagalog-D)
Using a language-specific variant of a Whisper or Dolphin model is recommended when:
- The input language is known in advance.
- The target use case requires consistent output in a specific language.
- Automatic language detection poses a risk for misclassification.
This unified mechanism ensures predictable, language-locked recognition behavior across all external model families.
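The suffix convention can be sketched as a pair of helpers that build and parse variant names. This is an illustration only: the function names are hypothetical, the exact spelling of multi-word languages in variant names is not specified here, and a name produced this way is usable only if that variant has actually been deployed.

```python
# Suffixes as described above: -W for Whisper, -D for Dolphin.
SUFFIX_BY_FAMILY = {"whisper": "W", "dolphin": "D"}
FAMILY_BY_SUFFIX = {suffix: family for family, suffix in SUFFIX_BY_FAMILY.items()}


def variant_model_name(language: str, family: str) -> str:
    """Build a single-language variant name such as 'Norwegian-W'.

    Raises KeyError for a family other than 'whisper' or 'dolphin'.
    """
    return f"{language}-{SUFFIX_BY_FAMILY[family.lower()]}"


def parse_variant(model_name: str):
    """Split a variant name into (language, family).

    Returns None when the name does not follow the suffix convention,
    for example 'Asia-40' or 'WhisperTurbo'.
    """
    language, separator, suffix = model_name.rpartition("-")
    family = FAMILY_BY_SUFFIX.get(suffix)
    if not separator or not language or family is None:
        return None
    return language, family


if __name__ == "__main__":
    print(variant_model_name("Tagalog", "dolphin"))
    print(parse_variant("Norwegian-W"))
```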
Single-language variants for Whisper (-W) and Dolphin (-D) models are not pre-compiled for every supported language by default. These models are generated on demand, based on customer requirements and real-world usage needs.
If you need a variant that is not currently listed in the API, please contact the Sestek Support Team or your account representative. Our team can evaluate your request and prepare the required single-language model for deployment.
