The process begins when a caller places a call that is routed to the Virtual Translator infrastructure through a SIP trunk connection.
Upon receiving the call, the SIP Gateway acts as a bridge between the Contact Center Platform and the Orchestrator service, allowing voice to stream between them.
Before the Virtual Translator is activated, the SIP Gateway connects the agent and the caller directly: it streams the caller's voice to the agent and the agent's voice to the caller.
If the agent does not understand the language being spoken, the agent clicks a button or dials a predetermined number combination. Either action enables the Virtual Translator and starts the translation process.
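The activation step can be sketched as a small detector that watches the agent's key presses. This is only an illustration: the class name `ActivationDetector`, the combination `*99#`, and the assumption that the dialed combination reaches the system as individual DTMF digits are all hypothetical and not taken from the Virtual Translator itself.

```python
from collections import deque


class ActivationDetector:
    """Tracks whether translation has been switched on (hypothetical helper)."""

    def __init__(self, sequence: str = "*99#"):  # assumed key combination
        self.sequence = sequence
        # Keep only as many recent digits as the combination is long.
        self._recent = deque(maxlen=len(sequence))
        self.active = False

    def on_digit(self, digit: str) -> bool:
        """Feed one dialed digit; returns True once translation is active."""
        self._recent.append(digit)
        if "".join(self._recent) == self.sequence:
            self.active = True
        return self.active

    def on_button_click(self) -> bool:
        """The on-screen button activates translation immediately."""
        self.active = True
        return self.active
```

Because only the last few digits are kept, unrelated key presses before the combination do not prevent a match.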
The SIP Gateway buffers the caller's voice before translation is activated, so that the initial segment of the conversation is not missed.
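One common way to keep audio from before activation is a fixed-size ring buffer. The sketch below assumes 20 ms audio frames and a 5-second window; both figures, and the name `PreRollBuffer`, are illustrative assumptions rather than details of the actual gateway.

```python
from collections import deque
from typing import List


class PreRollBuffer:
    """Keeps the most recent caller audio frames (hypothetical helper)."""

    def __init__(self, max_frames: int = 250):  # assumed: 250 x 20 ms = 5 s
        # A bounded deque silently discards the oldest frame when full.
        self._frames = deque(maxlen=max_frames)

    def push(self, frame: bytes) -> None:
        """Store one frame of caller audio while translation is inactive."""
        self._frames.append(frame)

    def drain(self) -> List[bytes]:
        """Return the buffered frames in arrival order and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames
```

When the agent activates translation, the gateway would call `drain()` once and forward those frames ahead of the live audio.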
The SIP Gateway receives the voice streams, packages them into audio files, and sends these files to the Orchestrator service.
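As a rough illustration of turning streamed audio into a file, the following sketch wraps raw PCM frames in a WAV container using Python's standard `wave` module. The function name `frames_to_wav` and the audio format (8 kHz, 16-bit, mono) are assumptions; the real gateway may use a different codec or file format.

```python
import io
import wave
from typing import Iterable


def frames_to_wav(
    frames: Iterable[bytes],
    sample_rate: int = 8000,   # assumed narrowband telephony rate
    sample_width: int = 2,     # assumed 16-bit samples
    channels: int = 1,         # assumed mono
) -> bytes:
    """Package raw PCM frames into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(frames))
    return buf.getvalue()
```

The returned bytes could then be sent to the Orchestrator service over whatever transport the deployment uses.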
The Orchestrator service receives the voice stream and processes it as follows:
- Throughout this processing, a Voice Activity Detection (VAD) component is at work. It recognizes when a person is speaking, so that the system processes only the audio that contains speech.
- Upon receiving the voice stream, the Voice Translator employs a Speech Recognition (SR) module to transcribe the spoken words into text in real time.
- The Voice Translator can identify the caller's spoken language in real time.
- The transcribed text, the detected (caller’s) language, and the target (agent’s) language are then fed to the Translation module, which performs multilingual translation and converts the text into the target (agent’s) language in real time.
- The translated text moves to the Text-to-Speech (TTS) engine. Here, the translated text is synthesized back into audible speech in real time.
- Along with the transcribed and translated speech, the system also captures the emotional tone or sentiment of the caller's speech. The Orchestrator service presents this information, together with the transcribed and translated text, on the agent's screen.
- The translated speech stream is sent back to the SIP Gateway.
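The steps listed above can be sketched as a single pipeline. This is a minimal sketch under stated assumptions, not the Virtual Translator's implementation: every component is a placeholder callable, the names `TranslationPipeline`, `TranslationResult`, and `energy_vad` are hypothetical, the VAD is a crude energy threshold over 16-bit PCM in native byte order, and the demo stubs return canned values in place of real SR, translation, sentiment, and TTS engines.

```python
import array
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


def energy_vad(frame: bytes, threshold: float = 500.0) -> bool:
    """Crude VAD: True when the RMS of 16-bit PCM samples reaches the threshold."""
    samples = array.array("h")
    samples.frombytes(frame[: len(frame) // 2 * 2])  # drop a trailing odd byte
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= threshold


@dataclass
class TranslationResult:
    source_language: str  # detected (caller's) language
    transcript: str       # SR output
    translation: str      # text in the target (agent's) language
    sentiment: str        # emotional tone label shown to the agent
    speech: bytes         # synthesized audio returned to the SIP Gateway


@dataclass
class TranslationPipeline:
    is_speech: Callable[[bytes], bool]              # VAD
    transcribe: Callable[[bytes], Tuple[str, str]]  # SR + language ID -> (text, language)
    translate: Callable[[str, str, str], str]       # (text, source, target) -> text
    sentiment: Callable[[bytes, str], str]          # (audio, text) -> label
    synthesize: Callable[[str, str], bytes]         # TTS: (text, language) -> audio

    def process(self, audio: bytes, target_language: str) -> Optional[TranslationResult]:
        if not self.is_speech(audio):
            return None  # skip audio that contains no speech
        text, source = self.transcribe(audio)
        translated = self.translate(text, source, target_language)
        return TranslationResult(
            source_language=source,
            transcript=text,
            translation=translated,
            sentiment=self.sentiment(audio, text),
            speech=self.synthesize(translated, target_language),
        )


# Demo with stand-in components (canned outputs, for illustration only).
demo = TranslationPipeline(
    is_speech=energy_vad,
    transcribe=lambda audio: ("hola", "es"),
    translate=lambda text, src, dst: {"hola": "hello"}.get(text, text),
    sentiment=lambda audio, text: "neutral",
    synthesize=lambda text, lang: text.encode("utf-8"),
)
speech_frame = array.array("h", [1000] * 160).tobytes()
silence_frame = array.array("h", [0] * 160).tobytes()
result = demo.process(speech_frame, target_language="en")
```

Passing the components in as callables keeps the ordering of the steps visible while leaving the choice of actual engines open.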
The SIP Gateway streams the translated speech to the agent’s call.
The agent responds.
The process repeats.