Supported Languages and Models

Document Number	Revision Number	Revision Date
KN. GU.26.EN	Rev58	25.02.2026

This document provides a unified overview of all languages supported by SESTEK Speech Recognition.

Single Models

The list below represents all languages that can be transcribed through the SESTEK Speech Recognition API, regardless of the underlying model family.

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan, Mandarin Chinese, Cantonese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Persian, Filipino, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kashmiri, Kazakh, Khmer, Kyrgyz, Korean, Kurdish (Kurmanji), Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Occitan, Odia, Punjabi, Pashto, Polish, Portuguese, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Uighur, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba.

Multilingual Models

The table below lists Sestek’s multilingual models, each trained to support multiple languages within a single model.

Model Name	Languages Supported
EnglishTurkish	English, Turkish
AzerbaijaniRussian	Azerbaijani, Russian
DutchFrench	Dutch, French
ArabicEnglish	Arabic, English
ArabicEnglishTurkish	Arabic, English, Turkish
EnglishFrenchTurkish	English, French, Turkish
EnglishSpanish	English, Spanish
EnglishMalay	English, Malay
LatvianRussian	Latvian, Russian
Mena	Arabic, English, French, Urdu
NorthAmerica	English, French, Portuguese, Spanish
Europe	English, French, Portuguese, Spanish, Dutch, German, Italian
Asia	Mandarin, Tamil, Malay, English
Asia-40	See the supported languages listed below
FourLanguagesMulti	English, Arabic, French, Spanish
SixLanguagesMulti	English, Turkish, Arabic, Russian, French, Spanish
Large	Arabic, Danish, Dutch, English, Finnish, French, German, Hindi, Italian, Latvian, Mandarin, Norwegian, Portuguese, Russian, Spanish, Swedish, Tagalog, Turkish, Urdu
WhisperTurbo	See the supported languages listed below
WhisperTiny	See the supported languages listed below
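The table above can be used programmatically to shortlist a multilingual model for a given set of languages. The sketch below is illustrative only: the language sets are copied from the table, while the function name and the "fewest languages wins" heuristic are assumptions made for this example, not a SESTEK recommendation. Asia-40, WhisperTurbo, and WhisperTiny are omitted because their languages are listed separately.

```python
# Language sets copied from the multilingual model table.
# Asia-40, WhisperTurbo, and WhisperTiny are omitted on purpose because the
# table refers to separate language lists for them.
MULTILINGUAL_MODELS = {
    "EnglishTurkish": {"English", "Turkish"},
    "AzerbaijaniRussian": {"Azerbaijani", "Russian"},
    "DutchFrench": {"Dutch", "French"},
    "ArabicEnglish": {"Arabic", "English"},
    "ArabicEnglishTurkish": {"Arabic", "English", "Turkish"},
    "EnglishFrenchTurkish": {"English", "French", "Turkish"},
    "EnglishSpanish": {"English", "Spanish"},
    "EnglishMalay": {"English", "Malay"},
    "LatvianRussian": {"Latvian", "Russian"},
    "Mena": {"Arabic", "English", "French", "Urdu"},
    "NorthAmerica": {"English", "French", "Portuguese", "Spanish"},
    "Europe": {"English", "French", "Portuguese", "Spanish", "Dutch",
               "German", "Italian"},
    "Asia": {"Mandarin", "Tamil", "Malay", "English"},
    "FourLanguagesMulti": {"English", "Arabic", "French", "Spanish"},
    "SixLanguagesMulti": {"English", "Turkish", "Arabic", "Russian",
                          "French", "Spanish"},
    "Large": {"Arabic", "Danish", "Dutch", "English", "Finnish", "French",
              "German", "Hindi", "Italian", "Latvian", "Mandarin",
              "Norwegian", "Portuguese", "Russian", "Spanish", "Swedish",
              "Tagalog", "Turkish", "Urdu"},
}


def smallest_covering_model(required_languages):
    """Return the model with the fewest languages that covers every
    required language, or None if no listed model covers them all.

    Hypothetical helper: the selection heuristic is an assumption.
    Ties are resolved in table order.
    """
    required = set(required_languages)
    candidates = [
        name for name, languages in MULTILINGUAL_MODELS.items()
        if required <= languages
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: len(MULTILINGUAL_MODELS[name]))
```

For example, smallest_covering_model(["Dutch", "French"]) selects DutchFrench rather than Europe or Large, because all three cover both languages and DutchFrench covers the fewest others. Whether a narrower model is actually more accurate for a given deployment should be confirmed by testing.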

Language-to-Model Family Mapping

The table below shows which model families support each language available in SESTEK Speech Recognition. While SESTEK SR exposes all languages through a unified API, individual languages may be supported by different underlying model architectures.

Language	Sestek	Whisper	Dolphin
Afrikaans	—	✔	—
Albanian	—	✔	—
Amharic	—	✔	—
Arabic	✔	✔	✔
Armenian	—	✔	—
Assamese	—	✔	—
Azerbaijani	✔	✔	✔
Bashkir	—	✔	✔
Basque	—	✔	—
Belarusian	—	✔	—
Bengali	—	✔	✔
Bosnian	—	✔	—
Breton	—	✔	—
Bulgarian	✔	✔	—
Burmese	—	✔	✔
Catalan	—	✔	—
Chinese	—	✔	—
Chinese (Mandarin)	✔ (Mandarin)	✔	✔
Croatian	✔	✔	—
Czech	✔	✔	—
Danish	✔	✔	—
Dutch	✔	✔	—
English	✔	✔	—
Estonian	—	✔	—
Faroese	—	✔	—
Farsi	✔	✔	✔
Filipino	—	—	✔
Finnish	✔	✔	—
Flemish	✔	—	—
French	✔	✔	—
Galician	—	✔	—
Georgian	—	✔	—
German	✔	✔	—
Greek	✔	✔	—
Gujarati	—	✔	✔
Haitian Creole	—	✔	—
Hawaiian	—	✔	—
Hebrew	—	✔	—
Hindi	✔	✔	✔
Hungarian	—	✔	—
Icelandic	—	✔	—
Indonesian	✔	✔	✔
Italian	✔	✔	—
Japanese	✔	✔	✔
Javanese	—	✔	✔
Kannada	—	✔	✔
Kashmiri	—	—	✔
Kazakh	✔	✔	✔
Khmer	—	✔	✔
Kirghiz	—	—	✔
Korean	✔	✔	✔
Kurdish (Kurmanji)	✔	—	—
Lao	—	✔	✔
Latin	—	✔	—
Latvian	✔	✔	—
Lingala	—	✔	—
Lithuanian	—	✔	—
Luxembourgish	—	✔	—
Macedonian	—	✔	—
Malagasy	—	✔	—
Malay	✔	✔	✔
Malayalam	—	✔	—
Maltese	—	✔	—
Maori	—	✔	—
Mandarin	✔	✔	✔
Marathi	—	✔	✔
Mongolian	✔	✔	✔
Nepali	—	✔	✔
Norwegian	✔	✔	—
Norwegian Nynorsk	—	✔	—
Occitan	—	✔	—
Oriya / Odia	—	✔	✔
Panjabi	—	✔	✔
Pashto	✔	✔	✔
Persian	✔	✔	✔
Polish	✔	✔	—
Portuguese	✔	✔	—
Punjabi	—	✔	✔
Romanian	✔	✔	—
Russian	✔	✔	✔
Sanskrit	—	✔	—
Serbian	—	✔	—
Shona	—	✔	—
Sindhi	—	✔	—
Sinhala	—	✔	✔
Slovak	—	✔	—
Slovenian	—	✔	—
Somali	—	✔	—
Spanish	✔	✔	—
Sundanese	—	✔	✔
Swahili	✔	✔	—
Swedish	✔	✔	—
Tagalog	✔	✔	✔
Tamil	✔	✔	✔
Tatar	—	✔	—
Telugu	—	✔	✔
Thai	—	✔	✔
Tibetan	—	✔	—
Turkish	✔	✔	✔
Turkmen	—	✔	—
Ukrainian	✔	✔	—
Uighur	—	—	✔
Urdu	✔	✔	✔
Uzbek	—	✔	✔
Vietnamese	—	✔	✔
Welsh	✔	✔	—
Yiddish	—	✔	—
Yoruba	—	✔	—
Yue Chinese	—	✔	✔
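A mapping like the one above is straightforward to query from client code. The following is a minimal sketch that transcribes only a handful of rows from the table for illustration; the FamilySupport type and the families_for function are hypothetical names, not part of the SESTEK API.

```python
from typing import NamedTuple


class FamilySupport(NamedTuple):
    """One table row: which model families support a language."""
    sestek: bool
    whisper: bool
    dolphin: bool


# A few rows transcribed from the table above. A real client would load
# the complete table instead of this illustrative subset.
SUPPORT = {
    "Afrikaans": FamilySupport(False, True, False),
    "Arabic": FamilySupport(True, True, True),
    "Flemish": FamilySupport(True, False, False),
    "Kashmiri": FamilySupport(False, False, True),
    "Kurdish (Kurmanji)": FamilySupport(True, False, False),
    "Thai": FamilySupport(False, True, True),
    "Welsh": FamilySupport(True, True, False),
}

FAMILY_NAMES = ("Sestek", "Whisper", "Dolphin")


def families_for(language):
    """Return the model families that support the language, in table order."""
    row = SUPPORT.get(language)
    if row is None:
        raise KeyError(f"{language!r} is not in the illustrative subset")
    return [name for name, supported in zip(FAMILY_NAMES, row) if supported]
```

With this subset, families_for("Thai") yields Whisper and Dolphin, while families_for("Flemish") yields only the Sestek family.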

When to Use Single vs. Multilingual Models

Single Models

Single models should be used when the language of the input is known before processing. These models provide the highest accuracy because they are specifically trained for a single language. Single models are ideal for:

IVR-driven systems where the caller selects a language before speaking.
Speech analytics for monolingual environments, such as customer service centers operating in a single language.
Use cases where high accuracy is required and the language does not need to be detected dynamically.

Multilingual Models

Multilingual models should be used when the language of the input is unknown at the time of recognition or when dynamic language switching is required. These models work best for:

IVR systems that do not provide language information, making it impossible to route the call to a specific monolingual SR model.
Applications that support multiple languages but do not allow dynamic switching between single models.
Use cases where users may speak different languages interchangeably, but each utterance is typically monolingual.

In such cases, multilingual SR models provide high-accuracy transcriptions by detecting and transcribing the input in one of the languages they were trained on. These models work best with monolingual utterances—where the entire speech is in one of the supported languages.

Limitations with Code-Switching

Code-switching refers to scenarios where speakers mix multiple languages within the same utterance, such as a Turkish sentence containing French words. Multilingual SR models are not explicitly trained to handle code-switching, as their training data primarily consists of monolingual examples for each supported language.

As a result, multilingual models do not significantly improve the recognition of foreign words embedded in another language compared to a dedicated monolingual model. For example, an EnglishFrenchTurkish model is not necessarily better at recognizing a French word in a Turkish sentence than a standard Turkish model, since such occurrences are rare or absent in the training data.

Recommended Approach for Code-Switching Scenarios

For improved recognition of foreign words within a sentence, we recommend using the context-biasing (pronunciation support) feature available in our end-to-end (E2E) models. This approach enhances recognition accuracy for specific terms and is a more effective solution for code-switching cases than relying solely on a multilingual SR model.

Multilingual models provide flexibility in handling diverse language scenarios but may not be as accurate as single models in cases where the input language is already known. If the use case involves frequent code-switching within a single utterance, context-biasing techniques should be considered to improve recognition accuracy.
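The guidance in this section can be condensed into a small decision helper. This is only a sketch of the reasoning above: the function name, its parameters, and the returned labels are assumptions chosen for illustration and do not correspond to API values.

```python
def recommend_approach(language_known: bool, code_switching: bool = False) -> list[str]:
    """Summarize the guidance above as a list of recommended measures.

    Hypothetical helper: the returned strings are descriptive labels,
    not ModelName values.
    """
    # A known input language favors a dedicated single-language model;
    # otherwise a multilingual model detects the language per utterance.
    measures = ["single-language model" if language_known else "multilingual model"]

    # Foreign words inside an utterance are not solved by the model choice
    # alone, so context-biasing is added on top of it.
    if code_switching:
        measures.append("context-biasing (pronunciation support)")
    return measures
```

For a call center with a fixed language whose agents regularly use foreign product names, recommend_approach(True, code_switching=True) returns the single-language model together with context-biasing.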

Externally Developed, SESTEK-Hosted Models

SESTEK Speech Recognition includes additional model families that are developed externally but fully hosted, managed, and served within the SESTEK SR infrastructure. These models are integrated into the same API framework as SESTEK’s native speech recognition models, allowing users to access them seamlessly without any external configuration or environment setup.

Key Highlights

These model families are developed externally but are fully hosted and integrated within the SESTEK SR engine.
They significantly extend SESTEK’s overall language coverage and provide alternative recognition options.
While Sestek ensures stable hosting and API-level compatibility, performance or accuracy may vary based on the underlying external architecture.
We recommend testing these models in your target languages and acoustic environments to confirm suitability.

Performance and Limitations

Externally developed model families—such as Whisper and Dolphin—provide broad multilingual flexibility and expand SESTEK’s language capabilities. However, because these models originate from external architectures, their performance and accuracy may vary depending on language, domain, and recording conditions.

Although Sestek ensures reliable hosting, robust deployment, and seamless API access, recognition results may differ. Therefore, we recommend evaluating these models in your specific use case to verify their suitability before production deployment.

Whisper Models

The following languages are supported when using Whisper models in SESTEK SR.

Afrikaans, Amharic, Arabic, Assamese, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Tibetan, Breton, Bosnian, Catalan, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Galician, Gujarati, Hawaiian, Hausa, Hebrew, Hindi, Croatian, Haitian Creole, Hungarian, Armenian, Indonesian, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Khmer, Kannada, Korean, Latin, Luxembourgish, Lingala, Lao, Lithuanian, Latvian, Malagasy, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Burmese, Nepali, Dutch, Norwegian Nynorsk, Norwegian, Occitan, Punjabi, Polish, Pashto, Portuguese, Romanian, Russian, Sanskrit, Sindhi, Sinhala, Slovak, Slovenian, Shona, Somali, Albanian, Serbian, Sundanese, Swedish, Swahili, Tamil, Telugu, Tajik, Thai, Turkmen, Tagalog, Turkish, Tatar, Ukrainian, Urdu, Uzbek, Vietnamese, Yiddish, Yoruba, Yue Chinese, Chinese.

Model Options

To accommodate diverse performance needs, SESTEK offers two versions of Whisper models: WhisperTiny and WhisperTurbo. Each model provides a distinct balance of speed and recognition accuracy:

WhisperTiny: This is a smaller, lightweight model designed for faster performance. While WhisperTiny provides quick processing speeds, its recognition accuracy may be limited in comparison to larger models. This option is ideal for applications where speed is prioritized over the highest accuracy levels, or where computing resources are limited.
WhisperTurbo: Equivalent to the advanced "large v3 turbo" Whisper model, WhisperTurbo represents the latest in Whisper technology, offering improved recognition accuracy. However, this model requires more processing power and operates at a slower speed than WhisperTiny. WhisperTurbo is suited for applications that demand high accuracy, especially in complex or noisy environments, but can accommodate slower processing times.

API Usage Reference

To use Whisper models, set the ModelName parameter to WhisperTiny, WhisperTurbo, or one of the language-specific variants (e.g., Norwegian-W).
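As a rough illustration, the choice between the two Whisper options can be expressed as follows. Only the ModelName parameter and the values WhisperTiny and WhisperTurbo come from this document; the function names and the dictionary shape are assumptions, and the actual request format should be taken from the SESTEK SR API reference.

```python
def whisper_model_name(prioritize_speed: bool) -> str:
    """Pick a Whisper option from the speed/accuracy trade-off described above."""
    # WhisperTiny is faster and lighter; WhisperTurbo is slower but more accurate.
    return "WhisperTiny" if prioritize_speed else "WhisperTurbo"


def recognition_parameters(model_name: str) -> dict:
    """Build a parameter mapping for a recognition request.

    Hypothetical shape: only the ModelName key is documented here, so no
    other fields are assumed.
    """
    return {"ModelName": model_name}
```

A latency-sensitive application might call recognition_parameters(whisper_model_name(prioritize_speed=True)), while an offline analytics job would typically pass prioritize_speed=False.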

Dolphin Models

The following languages are supported when using Dolphin models in SESTEK SR.

Arabic, Azerbaijani, Bashkir, Bengali, Burmese, Chinese (Mandarin), Filipino, Gujarati, Hindi, Indonesian, Japanese, Javanese, Kannada, Kashmiri, Kazakh, Khmer, Kirghiz, Korean, Lao, Malay, Marathi, Mongolian, Nepali, Oriya / Odia, Panjabi, Persian, Pashto, Russian, Sinhala, Sundanese, Tagalog, Tajik, Tamil, Telugu, Thai, Turkish, Uighur, Urdu, Uzbek, Vietnamese, Yue Chinese.

Model Options

SESTEK currently provides a single Dolphin model family — Dolphin (Small) — designed to deliver broad multilingual coverage across Asian, Middle Eastern, and surrounding regions.

API Usage Reference

To use Dolphin models, set the ModelName parameter to Asia-40 or one of the language-specific variants (e.g., Tagalog-D).

Language Behavior Across External Models

External model families such as Whisper and Dolphin support two different operating modes:

Multilingual (Automatic Language Detection)

Both Whisper and Dolphin can automatically detect and transcribe any of the languages included in their multilingual architecture. This mode is ideal when the input language is unknown or may vary across speakers.

Single-Language Variants

In certain scenarios—such as multilingual audio, dialectal variations, or unclear acoustic conditions—automatic language detection may select an unintended language, resulting in reduced transcription accuracy. To provide explicit language control, SESTEK supports single-language variants of external models.

These can be invoked using language-specific names with standardized suffixes:

-W → Whisper-derived single-language variant (e.g., Norwegian-W, Danish-W)
-D → Dolphin-derived single-language variant (e.g., Malay-D, Tagalog-D)

Using a single-language Whisper or Dolphin variant is recommended when:

The input language is known in advance.
The target use case requires consistent output in a specific language.
Automatic language detection poses a risk of misclassification.

This unified mechanism ensures predictable, language-locked recognition behavior across all external model families.
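The suffix convention can be captured in a small helper. The sketch below only composes the name from a language and a model family; the function and constant names are hypothetical, and composing a name does not guarantee that the corresponding variant is deployed.

```python
# Suffixes as documented above: -W for Whisper, -D for Dolphin.
SUFFIX_BY_FAMILY = {"Whisper": "-W", "Dolphin": "-D"}


def variant_model_name(language: str, family: str) -> str:
    """Compose a single-language variant name such as "Norwegian-W".

    Hypothetical helper: it does not check that the variant exists.
    """
    try:
        suffix = SUFFIX_BY_FAMILY[family]
    except KeyError:
        raise ValueError(f"unknown external model family: {family!r}") from None
    return f"{language}{suffix}"
```

For example, variant_model_name("Norwegian", "Whisper") produces Norwegian-W and variant_model_name("Tagalog", "Dolphin") produces Tagalog-D, matching the examples above.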

Note on Single-Language Variants

Single-language variants for Whisper (-W) and Dolphin (-D) models are not pre-compiled for every supported language by default. These models are generated on demand, based on customer requirements and real-world usage needs.

If you need a variant that is not currently listed in the API, please contact the Sestek Support Team or your account representative. Our team can evaluate your request and prepare the required single-language model for deployment.
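Because variants are generated on demand, client code may want to guard against requesting one that has not been deployed yet. The sketch below shows one possible interim strategy: falling back to the family's multilingual model (WhisperTurbo or Asia-40) until the variant is available. The fallback policy, the function name, and the caller-supplied set of available variants are all assumptions for illustration; this document does not describe an API for listing deployed variants.

```python
# Multilingual model to use when a single-language variant is not deployed.
# The mapping is an assumed policy built from ModelName values in this document.
FALLBACK_BY_SUFFIX = {"-W": "WhisperTurbo", "-D": "Asia-40"}


def resolve_model_name(variant: str, available_variants: set[str]) -> str:
    """Return the variant if it is available, else the family's multilingual model.

    available_variants must be supplied by the caller, for example from a
    list agreed with the Sestek Support Team.
    """
    if variant in available_variants:
        return variant
    for suffix, fallback in FALLBACK_BY_SUFFIX.items():
        if variant.endswith(suffix):
            # Automatic language detection applies again after falling back.
            return fallback
    raise ValueError(f"{variant!r} is not a -W or -D variant name")
```

Note that the fallback reintroduces automatic language detection, so the misclassification risk described above returns until the requested variant has been prepared.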