| Document Number | Revision Number | Revision Date |
|---|---|---|
| KN. GU.26.EN | Rev64 | 14.04.2026 |
SESTEK SR is built to work where your customers are. With support for 99+ languages and continuously evolving language models, it brings consistent, high-quality speech recognition to voice applications across the globe. Each language is backed by dedicated models - trained, tested, and refined to meet the demands of real-world deployments.
Model accuracy figures reflect performance on clean, domain-matched audio. Real-world accuracy may vary depending on audio quality, background noise, speaker accent, and vocabulary coverage. For best results, consider fine-tuning with your own data.

Single Models
Single models are trained for one language and deliver the highest accuracy when the input language is known in advance. The following languages are available as single models through the SESTEK SR API.
Supported Languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan, Mandarin Chinese, Cantonese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Persian, Filipino, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kashmiri, Kazakh, Khmer, Kyrgyz, Korean, Kurdish (Kurmanji), Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Occitan, Odia, Punjabi, Pashto, Polish, Portuguese, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Uighur, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba.
Multilingual Models
Multilingual models support multiple languages within a single model architecture. They are designed for scenarios where the input language is unknown or may vary across speakers.
| Model Name | Languages Supported |
|---|---|
| EnglishTurkish | English, Turkish |
| AzerbaijaniRussian | Azerbaijani, Russian |
| DutchFrench | Dutch, French |
| ArabicEnglish | Arabic, English |
| ArabicEnglishTurkish | Arabic, English, Turkish |
| EnglishFrenchTurkish | English, French, Turkish |
| EnglishSpanish | English, Spanish |
| EnglishMalay | English, Malay |
| LatvianRussian | Latvian, Russian |
| Mena | Arabic, English, French, Urdu |
| NorthAmerica | English, French, Portuguese, Spanish |
| Europe | English, French, Portuguese, Spanish, Dutch, German, Italian |
| Asia | Mandarin, Tamil, Malay, English |
| Asia-40 | See Dolphin Models |
| FourLanguagesMulti | English, Arabic, French, Spanish |
| SixLanguagesMulti | English, Turkish, Arabic, Russian, French, Spanish |
| Large | Arabic, Danish, Dutch, English, Finnish, French, German, Hindi, Italian, Latvian, Mandarin, Norwegian, Portuguese, Russian, Spanish, Swedish, Tagalog, Turkish, Urdu |
| WhisperTurbo | See Whisper Models |
| WhisperTiny | See Whisper Models |
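The table above can be used as a lookup when the set of expected languages is known. The following Python sketch is illustrative only: the dictionary copies a subset of the rows above, and `smallest_covering_model` is a hypothetical helper, not part of the SESTEK SR API. Preferring the model with the fewest languages is an assumption of this sketch rather than a documented recommendation.

```python
# Subset of the multilingual model table above (model name -> supported languages).
MULTILINGUAL_MODELS = {
    "EnglishTurkish": {"English", "Turkish"},
    "ArabicEnglish": {"Arabic", "English"},
    "ArabicEnglishTurkish": {"Arabic", "English", "Turkish"},
    "EnglishFrenchTurkish": {"English", "French", "Turkish"},
    "EnglishSpanish": {"English", "Spanish"},
    "Mena": {"Arabic", "English", "French", "Urdu"},
    "FourLanguagesMulti": {"English", "Arabic", "French", "Spanish"},
    "SixLanguagesMulti": {"English", "Turkish", "Arabic", "Russian", "French", "Spanish"},
}


def smallest_covering_model(expected_languages):
    """Return the listed model with the fewest languages that covers all expected ones.

    Hypothetical helper. Ties are broken alphabetically by model name.
    Returns None when no listed model covers the whole set.
    """
    wanted = set(expected_languages)
    candidates = [
        (len(langs), name)
        for name, langs in MULTILINGUAL_MODELS.items()
        if wanted <= langs  # model must support every expected language
    ]
    return min(candidates)[1] if candidates else None


print(smallest_covering_model(["English", "Turkish"]))  # EnglishTurkish
print(smallest_covering_model(["Japanese", "Korean"]))  # None
```

A result of `None` suggests looking at the broader models in the table (for example Large, or the Whisper and Dolphin entries) instead.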
When to Use Single vs. Multilingual Models
Single Models
Single models should be used when the language of the input is known before processing. They provide the highest accuracy because they are specifically trained for one language. Single models are ideal for:
- IVR-driven systems where the caller selects a language before speaking.
- Speech analytics for monolingual environments, such as customer service centers operating in a single language.
- Use cases where high accuracy is required and the language does not need to be detected dynamically.
Multilingual Models
Multilingual models should be used when the input language is unknown at recognition time, or when dynamic language switching is required. They work best for:
- IVR systems that do not provide language information, making it impossible to route to a specific monolingual model.
- Applications that support multiple languages but do not allow dynamic switching between single models.
- Use cases where users may speak different languages interchangeably, but each utterance is typically in one language.
In such cases, multilingual models detect and transcribe the input in one of their supported languages. They work best with monolingual utterances - where the entire speech segment is in a single language.
A Note on Code-Switching
Code-switching refers to scenarios where speakers mix multiple languages within the same utterance - for example, a Turkish sentence containing French words. Multilingual models are not explicitly trained to handle this, as their training data primarily consists of monolingual examples.
As a result, a multilingual model such as EnglishFrenchTurkish is not necessarily better than a standard Turkish model at recognizing a French word embedded in a Turkish sentence, because such mixed-language occurrences are rare or absent in its training data.
For improved recognition of foreign words within a sentence, we recommend using context-biasing (pronunciation support) available in SESTEK's end-to-end (E2E) models. This is a more effective approach for code-switching scenarios than relying on a multilingual model alone.
Multilingual models provide flexibility for diverse language scenarios but may not match the accuracy of single models when the input language is already known. For frequent code-switching within a single utterance, consider context-biasing techniques.
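The guidance in this section can be summarized as a small decision rule. The sketch below is a hypothetical Python helper written for illustration; the function name and the returned labels are assumptions and do not correspond to SESTEK API values.

```python
def recommend(language_known, code_switching=False):
    """Map the guidance above to a model type and a context-biasing hint.

    Hypothetical helper for illustration only.
    """
    return {
        # Known language -> single model; unknown language -> multilingual model.
        "model_type": "single" if language_known else "multilingual",
        # Foreign words inside an utterance -> consider context-biasing,
        # which the text above describes as available in SESTEK's E2E models.
        "consider_context_biasing": code_switching,
    }


# IVR where the caller has already selected a language.
print(recommend(language_known=True))
# Turkish calls that frequently contain embedded foreign words.
print(recommend(language_known=True, code_switching=True))
# No language information available before recognition.
print(recommend(language_known=False))
```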
Model Families Overview
SESTEK SR exposes all languages through a unified API, while individual languages may be powered by different underlying model families. Understanding which model family serves a language helps you make the right configuration choices for your use case. Regardless of the underlying family, every model is delivered within the SESTEK API framework, with no external configuration or environment setup required.
| Model Family | Developed By | Hosted By | Fine-tuning | Best For |
|---|---|---|---|---|
| SESTEK | SESTEK | SESTEK | ✔ | High-accuracy, production-grade deployments |
| Whisper | OpenAI (external) | SESTEK | ✔ | Broad multilingual coverage, 99+ languages |
| Dolphin | External | SESTEK | - | Asian, Middle Eastern, and surrounding regions |
The table below shows which model family supports each language. A language may be available across multiple model families.
Language-to-Model Family Mapping
| Language | SESTEK | Whisper | Dolphin |
|---|---|---|---|
| Afrikaans | - | ✔ | - |
| Albanian | - | ✔ | - |
| Amharic | - | ✔ | - |
| Arabic | ✔ | ✔ | ✔ |
| Armenian | - | ✔ | - |
| Assamese | - | ✔ | - |
| Azerbaijani | ✔ | ✔ | ✔ |
| Bashkir | - | ✔ | ✔ |
| Basque | - | ✔ | - |
| Belarusian | - | ✔ | - |
| Bengali | - | ✔ | ✔ |
| Bosnian | - | ✔ | - |
| Breton | - | ✔ | - |
| Bulgarian | ✔ | ✔ | - |
| Burmese | - | ✔ | ✔ |
| Catalan | - | ✔ | - |
| Chinese | - | ✔ | - |
| Chinese (Mandarin) | ✔ | ✔ | ✔ |
| Croatian | ✔ | ✔ | - |
| Czech | ✔ | ✔ | - |
| Danish | ✔ | ✔ | - |
| Dutch | ✔ | ✔ | - |
| English | ✔ | ✔ | - |
| Estonian | - | ✔ | - |
| Faroese | - | ✔ | - |
| Farsi | ✔ | ✔ | ✔ |
| Filipino | - | - | ✔ |
| Finnish | ✔ | ✔ | - |
| Flemish | ✔ | - | - |
| French | ✔ | ✔ | - |
| Galician | - | ✔ | - |
| Georgian | - | ✔ | - |
| German | ✔ | ✔ | - |
| Greek | ✔ | ✔ | - |
| Gujarati | - | ✔ | ✔ |
| Haitian Creole | - | ✔ | - |
| Hawaiian | - | ✔ | - |
| Hebrew | - | ✔ | - |
| Hindi | ✔ | ✔ | ✔ |
| Hungarian | - | ✔ | - |
| Icelandic | - | ✔ | - |
| Indonesian | ✔ | ✔ | ✔ |
| Italian | ✔ | ✔ | - |
| Japanese | ✔ | ✔ | ✔ |
| Javanese | - | ✔ | ✔ |
| Kannada | - | ✔ | ✔ |
| Kashmiri | - | - | ✔ |
| Kazakh | ✔ | ✔ | ✔ |
| Khmer | - | ✔ | ✔ |
| Kirghiz | - | - | ✔ |
| Korean | ✔ | ✔ | ✔ |
| Kurdish (Kurmanji) | ✔ | - | - |
| Lao | - | ✔ | ✔ |
| Latin | - | ✔ | - |
| Latvian | ✔ | ✔ | - |
| Lingala | - | ✔ | - |
| Lithuanian | - | ✔ | - |
| Luxembourgish | - | ✔ | - |
| Macedonian | - | ✔ | - |
| Malagasy | - | ✔ | - |
| Malay | ✔ | ✔ | ✔ |
| Malayalam | - | ✔ | - |
| Maltese | - | ✔ | - |
| Mandarin | ✔ | ✔ | ✔ |
| Maori | - | ✔ | - |
| Marathi | - | ✔ | ✔ |
| Mongolian | ✔ | ✔ | ✔ |
| Nepali | - | ✔ | ✔ |
| Norwegian | ✔ | ✔ | - |
| Norwegian Nynorsk | - | ✔ | - |
| Occitan | - | ✔ | - |
| Oriya / Odia | - | ✔ | ✔ |
| Panjabi | - | ✔ | ✔ |
| Pashto | ✔ | ✔ | ✔ |
| Persian | ✔ | ✔ | ✔ |
| Polish | ✔ | ✔ | - |
| Portuguese | ✔ | ✔ | - |
| Punjabi | - | ✔ | ✔ |
| Romanian | ✔ | ✔ | - |
| Russian | ✔ | ✔ | ✔ |
| Sanskrit | - | ✔ | - |
| Serbian | - | ✔ | - |
| Shona | - | ✔ | - |
| Sindhi | - | ✔ | - |
| Sinhala | - | ✔ | ✔ |
| Slovak | - | ✔ | - |
| Slovenian | - | ✔ | - |
| Somali | - | ✔ | - |
| Spanish | ✔ | ✔ | - |
| Sundanese | - | ✔ | ✔ |
| Swahili | ✔ | ✔ | - |
| Swedish | ✔ | ✔ | - |
| Tagalog | ✔ | ✔ | ✔ |
| Tamil | ✔ | ✔ | ✔ |
| Tatar | - | ✔ | - |
| Telugu | - | ✔ | ✔ |
| Thai | - | ✔ | ✔ |
| Tibetan | - | ✔ | - |
| Turkish | ✔ | ✔ | ✔ |
| Turkmen | - | ✔ | - |
| Ukrainian | ✔ | ✔ | - |
| Uighur | - | - | ✔ |
| Urdu | ✔ | ✔ | ✔ |
| Uzbek | - | ✔ | ✔ |
| Vietnamese | - | ✔ | ✔ |
| Welsh | ✔ | ✔ | - |
| Yiddish | - | ✔ | - |
| Yoruba | - | ✔ | - |
| Yue Chinese | - | ✔ | ✔ |
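As a rough illustration of how the mapping can be combined with the Fine-tuning column of the Model Families Overview table, the Python sketch below copies a handful of rows and filters the families available for a language. The helper `families_for` is hypothetical, and treating the family-level fine-tuning mark as applying to every language in that family is an assumption of this sketch.

```python
# Family columns in the order used by the mapping table above.
FAMILIES = ("SESTEK", "Whisper", "Dolphin")

# A few rows copied from the mapping table (True = supported, False = "-").
LANGUAGE_FAMILIES = {
    "Arabic": (True, True, True),
    "English": (True, True, False),
    "Flemish": (True, False, False),
    "Kashmiri": (False, False, True),
    "Thai": (False, True, True),
}

# Fine-tuning column from the Model Families Overview table.
FINE_TUNABLE = {"SESTEK": True, "Whisper": True, "Dolphin": False}


def families_for(language, require_fine_tuning=False):
    """List the model families that serve a language (hypothetical helper).

    Returns an empty list for languages missing from the copied subset.
    """
    flags = LANGUAGE_FAMILIES.get(language)
    if flags is None:
        return []
    return [
        family
        for family, supported in zip(FAMILIES, flags)
        if supported and (FINE_TUNABLE[family] or not require_fine_tuning)
    ]


print(families_for("Thai"))                            # ['Whisper', 'Dolphin']
print(families_for("Thai", require_fine_tuning=True))  # ['Whisper']
```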
Externally Developed, SESTEK-Hosted Models
While SESTEK ensures stable hosting and API-level compatibility, recognition accuracy may vary based on the underlying external architecture. We recommend evaluating these models in your target language and acoustic environment before production deployment.
Whisper Models
Whisper is developed by OpenAI and hosted within SESTEK SR. It provides the broadest language coverage available in the platform, spanning 99+ languages.
Supported languages: Afrikaans, Amharic, Arabic, Assamese, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Tibetan, Breton, Bosnian, Catalan, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Galician, Gujarati, Hawaiian, Hausa, Hebrew, Hindi, Croatian, Haitian Creole, Hungarian, Armenian, Indonesian, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Khmer, Kannada, Korean, Latin, Luxembourgish, Lingala, Lao, Lithuanian, Latvian, Malagasy, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Burmese, Nepali, Dutch, Norwegian Nynorsk, Norwegian, Occitan, Punjabi, Polish, Pashto, Portuguese, Romanian, Russian, Sanskrit, Sindhi, Sinhala, Slovak, Slovenian, Shona, Somali, Albanian, Serbian, Sundanese, Swedish, Swahili, Tamil, Telugu, Tajik, Thai, Turkmen, Tagalog, Turkish, Tatar, Ukrainian, Urdu, Uzbek, Vietnamese, Yiddish, Yoruba, Yue Chinese, Chinese.
Model Options
| Model | Description | Best For |
|---|---|---|
| WhisperTiny | Lightweight, faster processing | Speed-sensitive applications or resource-constrained environments |
| WhisperTurbo | Equivalent to Whisper large-v3-turbo; higher accuracy | High-accuracy requirements, complex or noisy environments |
API Usage
Set the ModelName parameter to WhisperTiny, WhisperTurbo, or a language-specific variant such as Norwegian-W.
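A minimal Python sketch of choosing between the two Whisper options follows. Only the `ModelName` parameter and its values come from this document; the `priority` argument, the helper name, and the idea of sending the parameter as a JSON field are assumptions, since the request format is not described here.

```python
import json

# Mapping derived from the Model Options table above.
WHISPER_MODELS = {
    "speed": "WhisperTiny",      # lightweight, faster processing
    "accuracy": "WhisperTurbo",  # higher accuracy
}


def whisper_request_fields(priority="accuracy"):
    """Return the fields that select a Whisper model (hypothetical helper)."""
    if priority not in WHISPER_MODELS:
        raise ValueError(f"priority must be one of {sorted(WHISPER_MODELS)}")
    return {"ModelName": WHISPER_MODELS[priority]}


# The serialization below is illustrative; consult the API reference for
# how ModelName is actually transmitted.
print(json.dumps(whisper_request_fields("speed")))  # {"ModelName": "WhisperTiny"}
```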
Dolphin Models
Dolphin is an externally developed model family hosted within SESTEK SR. It provides broad coverage across Asian, Middle Eastern, and surrounding regions.
Supported languages: Arabic, Azerbaijani, Bashkir, Bengali, Burmese, Chinese (Mandarin), Filipino, Gujarati, Hindi, Indonesian, Japanese, Javanese, Kannada, Kashmiri, Kazakh, Khmer, Kirghiz, Korean, Lao, Malay, Marathi, Mongolian, Nepali, Oriya / Odia, Panjabi, Persian, Pashto, Russian, Sinhala, Sundanese, Tagalog, Tajik, Tamil, Telugu, Thai, Turkish, Uighur, Urdu, Uzbek, Vietnamese, Yue Chinese.
Model Options
SESTEK currently provides a single Dolphin model, Dolphin (Small), designed to deliver broad multilingual coverage across its supported regions.
API Usage
Set the ModelName parameter to Asia-40 or a language-specific variant such as Tagalog-D.
Language Behavior in External Models
Both Whisper and Dolphin support two operating modes:
Multilingual (Automatic Language Detection)
The model automatically detects and transcribes the input language. Suitable when the input language is unknown or may vary across speakers.
Single-Language Variants
In scenarios where automatic detection may select the wrong language - such as dialectal variations, multilingual audio, or unclear acoustic conditions - single-language variants provide explicit language control.
Single-language variants follow a standardized naming convention:
| Suffix | Model Family | Example |
|---|---|---|
| -W | Whisper | Norwegian-W, Danish-W |
| -D | Dolphin | Malay-D, Tagalog-D |
Using a single-language variant is recommended when:
- The input language is known in advance.
- Consistent output in a specific language is required.
- Automatic language detection poses a risk of misclassification.
Single-language variants for Whisper (-W) and Dolphin (-D) models are generated on demand based on customer requirements - they are not pre-compiled for every supported language by default. If you need a variant that is not currently available, contact the SESTEK Support Team or your account representative.
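The naming convention can be sketched in Python as follows. Both helpers are hypothetical and only manipulate strings: they do not check whether a variant has actually been generated for your deployment, and how multi-word language names are written in variant names is not covered by this document.

```python
# Suffixes from the naming convention for single-language variants.
SUFFIXES = {"Whisper": "-W", "Dolphin": "-D"}


def variant_name(language, family):
    """Build a single-language variant name, e.g. ("Tagalog", "Dolphin") -> "Tagalog-D"."""
    return f"{language}{SUFFIXES[family]}"


def parse_variant(model_name):
    """Split a variant name into (language, family); return None if no suffix matches."""
    for family, suffix in SUFFIXES.items():
        if model_name.endswith(suffix):
            return model_name[: -len(suffix)], family
    return None


print(variant_name("Norwegian", "Whisper"))  # Norwegian-W
print(parse_variant("Tagalog-D"))            # ('Tagalog', 'Dolphin')
print(parse_variant("Asia-40"))              # None (multilingual model, not a variant)
```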
