| Document Number | Revision Number | Revision Date |
|---|---|---|
| KN. GU.25.EN | Rev34 | 13.04.2026 |
Language Coverage
- Support for 99+ Languages: More than 99 languages are available, enabling organizations to build speech-enabled experiences for a wide range of geographies, markets, and customer segments.
- Multilingual and Bilingual Models: Multilingual and bilingual model options leverage knowledge from multiple languages, helping improve performance in multilingual and mixed-language scenarios.
- Accent Coverage in a Single Model: Multiple accents of the same language can be handled within one model, reducing operational complexity and eliminating the need to manage separate accent-specific models.
Model Flexibility and Technology Foundation
- Support for Hosting Different Model Foundations: Different speech recognition model foundations, including Whisper-based and Dolphin-based models, can be hosted so that the model matches the language, use case, and performance expectations.
- Flexible Architecture for Evolving Speech Technologies: A flexible technology foundation makes it possible to adapt recognition solutions as speech technologies evolve, rather than being limited to a single fixed model strategy.
Vocabulary and Model Adaptation
- Custom Word Support: Domain-specific, business-specific, or customer-specific words that are not currently included in the language model can be added upon request.
- Fine-Tuning with Real Customer Data: Models can be improved through fine-tuning with real customer data, allowing better adaptation to customer terminology, speaking habits, domain language, and real-life acoustic conditions.
- Domain Adaptation for Industry Needs: Sector-specific terminology and enterprise scenarios can be addressed through tailored adaptation, helping improve recognition quality in areas such as contact centers, banking, telecom, and public services.
Language Model Development
- Model Creation for Low-Resource Languages: Dedicated speech recognition models can be developed even for languages with little or no ready training data by working with customer-provided or specially collected datasets.
- Custom Data-Based Model Training: For languages without an existing mature model, a custom model can be developed once sufficient language data is available, typically a large-scale dataset of 200+ hours, depending on the language and target quality.
- High-Potential Accuracy for New Language Models: With adequate, high-quality training data, custom speech recognition models can reach success rates of 85% or higher for previously unsupported or low-resource languages.
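
The 85%+ figure above is not tied to a specific metric in this document. One common way to quantify recognition quality is word error rate (WER), with word accuracy taken as 1 - WER. The sketch below is an illustration under that assumption only; the function name `word_error_rate` is hypothetical and is not part of any product API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and exact string match per word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER (can be negative if WER exceeds 1)."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

For example, a five-word reference with one misrecognized word gives a WER of 0.2 and a word accuracy of 0.8 under this definition. Averaging over a held-out test set, rather than a single utterance, is what would make such a figure comparable to a stated target.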
Audio and Input Support
- Wide Audio Format Compatibility: Audio conversion is handled through ffmpeg, with support for major formats such as G.729, MP3, MP4, WAV, and Opus.
- Flexible Audio Input Handling: Different audio sources and integration flows can be accommodated, making it easier to connect speech recognition into existing telephony and application environments.
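
As an illustration of the ffmpeg-based conversion mentioned above, the following minimal sketch builds an ffmpeg command line that turns an input file into mono 16-bit PCM WAV. The 16 kHz target sample rate, the function names, and the file names are assumptions made for the example; the document does not state which target format the recognition service expects.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that converts `src` to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input file; ffmpeg detects container and codec
        "-vn",                     # ignore any video stream (e.g. in MP4 input)
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the assumed target rate
        "-c:a", "pcm_s16le",       # encode as 16-bit little-endian PCM
        str(dst),
    ]


def convert(src: Path, dst: Path) -> None:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Keeping command construction separate from execution makes the conversion step easy to inspect and test without ffmpeg installed, for example `build_ffmpeg_command(Path("call.mp3"), Path("call.wav"))`.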
Text Processing and Output Control
- Numeral & Entity Formatting: Recognized content can be transformed into more usable written output by converting spoken numerals and selected entities into normalized text representations, improving readability and downstream system usability.
- Masking of Sensitive Information: Sensitive information can be masked in recognition output through user-defined regex rules, helping protect transcribed content.
- Structured Recognition Output: Transcribed speech can be delivered in a format suitable for automation, analytics, reporting, and operational workflows.
- Time-Aligned Transcription: Transcription output can be provided together with timing information, supporting use cases such as subtitle generation, audio-text synchronization, search within recordings, analytics, and detailed post-processing.
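
The regex-based masking and time-aligned output described above can be sketched together as follows. Everything here is hypothetical and for illustration only: the `Segment` structure, the two masking rules, and the replacement labels are assumptions, not the product's actual output schema or rule syntax. SRT is used as the subtitle format because its layout is simple and widely supported.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # recognized text for this time span


# Hypothetical user-defined masking rules: (compiled pattern, replacement).
MASK_RULES = [
    # 16 digits in groups of four, optionally separated by a space or hyphen
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), "[CARD]"),
    # a simple e-mail shape; real rules would likely be stricter
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[EMAIL]"),
]


def mask(text: str) -> str:
    """Apply every masking rule in order to the text."""
    for pattern, replacement in MASK_RULES:
        text = pattern.sub(replacement, text)
    return text


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render masked, time-aligned segments as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{mask(seg.text)}")
    return "\n\n".join(blocks) + "\n"
```

Applying the masking rules before any export step, as `to_srt` does here, means that subtitles, search indexes, and analytics built from the same segments never see the unmasked text.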
Integration and Deployment
- Easy Integration with APIs and SDKs: User-friendly APIs and SDKs help simplify integration into existing applications and platforms, reducing implementation effort.
- Flexible Integration Options: Different integration methods and communication structures can be used to align with varying architectural and operational needs.
- Cloud and On-Prem Deployment: The solution can be deployed in cloud or on-premises environments, depending on infrastructure, security, and compliance requirements.

