Our Text-to-Speech technology includes a Custom Voice Creation feature that enables businesses to create unique, branded voices that set them apart from competitors. This feature utilizes advanced machine learning algorithms and deep neural networks to analyze and mimic the vocal characteristics of a designated voice talent. The resulting voice can then be used to generate high-quality, natural-sounding speech for use in various applications, including e-learning, virtual assistants, and audiobooks. The process of creating a custom voice involves recording and processing large amounts of speech data, followed by fine-tuning the resulting model to produce the desired vocal characteristics.
CHOOSING THE TTS VOICE
At least six candidates must be selected for the first elimination round in choosing the TTS voice.
A studio must be provided for recording the candidates. If the studio is provided by Sestek, the next step can be skipped.
- A Windows computer is required in the studio, and remote access to it must be provided. We will install our recording software on this computer, which the candidates will use to make their recordings.
Each candidate must provide sample recordings of their voice.
- At least 10 sentences should be recorded.
- The recordings must be made in a studio (ours or another one).
- The speakers will be briefed on how to read sentences for TTS. Consistent intonation is important.
We will narrow the candidates down to two and start the second elimination round. If both sides already agree on the speaker for the new TTS at this point, step 4 can be skipped.
- We will record 15,000 words from each speaker in the studio.
- After the recordings, we will process the voices and evaluate which candidate is most suitable for TTS.
- To do this, we will create a prototype TTS voice from each candidate.
- After testing the prototypes, we will decide which voice to use.
CREATING A CUSTOM TTS
Once the TTS voice has been decided, we will start creating the TTS product with the chosen speaker.
We will record between 135,000 and 160,000 words from the selected speaker in the studio.
- Depending on the speaker's performance, about 2,000 words can be read in a one-hour recording session. Studio hours are calculated accordingly.
- Custom texts must be supplied by the customer. They are limited to a maximum of 10,000 sentences for voice creation and must be delivered in a single batch. If this limit is exceeded or texts are added later, additional work and studio fees will apply. Because word combinations matter, the customer should provide expressions in full sentences rather than isolated words.
- Performance decreases as time in the recording room increases, so the speaker can tell us what schedule works best. We usually record 3-4 hours a day.
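The figures above allow a rough estimate of studio time. The following is only a sketch: it assumes the reading rate of about 2,000 words per hour and the 3-4 recording hours per day hold steadily, and the function names are illustrative, not part of any Sestek tool.

```python
import math

# Figures taken from the text above; real sessions will vary by speaker.
WORDS_PER_HOUR = 2000
MIN_WORDS, MAX_WORDS = 135_000, 160_000
MIN_HOURS_PER_DAY, MAX_HOURS_PER_DAY = 3, 4


def studio_hours(words, words_per_hour=WORDS_PER_HOUR):
    """Recording hours needed to read `words` at a steady rate."""
    return words / words_per_hour


def studio_days(words, hours_per_day, words_per_hour=WORDS_PER_HOUR):
    """Recording days needed, rounded up to whole days."""
    return math.ceil(studio_hours(words, words_per_hour) / hours_per_day)


# Shortest case: fewest words, longest recording days.
best_case_days = studio_days(MIN_WORDS, MAX_HOURS_PER_DAY)
# Longest case: most words, shortest recording days.
worst_case_days = studio_days(MAX_WORDS, MIN_HOURS_PER_DAY)

print(f"Studio hours: {studio_hours(MIN_WORDS)} to {studio_hours(MAX_WORDS)}")
print(f"Recording days: {best_case_days} to {worst_case_days}")
```

Under these assumptions the final recording needs roughly 67.5 to 80 studio hours, spread over about 17 to 27 recording days.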
After the recordings, we will run our tests and create the new TTS.
Additional tests will be run on the TTS, and the fine-tuning process will start.
We will record the additional words and sentences that the customer wants read. The number of words here is limited to 10,000. If any texts are found to be incorrect, they are also recorded at this stage.
We will update the new TTS based on the feedback.
PROJECT STEPS FOR CUSTOM TTS CREATION
Task Name | Role Names | Duration (work days) |
---|---|---|
Custom TTS Project Plan | ||
Start | ||
Kick-off meeting | Customer; SESTEK | 1 day |
Choosing first candidates for evaluation | Customer | X days |
Studio selection and providing a studio for recordings | Customer; SESTEK | X days |
Speaker Evaluation | ||
Briefing speakers on providing recordings for TTS | Customer; SESTEK | 1 day |
First studio recording session for the selected speakers | Customer; SESTEK | 10 days |
First elimination of speakers to 2 people | Customer; SESTEK | 5 days |
Second studio recording session for the final candidates | Customer; SESTEK | 20 days |
Creating TTS voice prototypes with the samples | SESTEK | 15 days |
Deciding the TTS voice with the prototypes | Customer; SESTEK | X days |
Creating a Custom TTS | ||
Signing a consent letter and contract with the speaker | Customer; SESTEK | 1 day |
Final studio recording session for the chosen voice | Customer; SESTEK | 50 days |
TTS development process for these recordings | SESTEK | 25 days |
Internal tests and finalizing the process | SESTEK | 10 days |
Customer tests and feedback | Customer | X days |
Providing the list of extra announcements and sentences | Customer | X days |
Fine-tuning process | SESTEK | X days |
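
To get a feel for the overall timeline, the fixed durations in the table can be added up per phase. This is a minimal sketch that assumes the tasks run one after another with no overlap, which the plan does not state explicitly; tasks listed as "X days" are counted separately as open-ended. The phase labels and the `summarize` helper are illustrative.

```python
# (phase, task, work days); None stands for the "X days" entries in the plan.
PLAN = [
    ("Start", "Kick-off meeting", 1),
    ("Start", "Choosing first candidates for evaluation", None),
    ("Start", "Studio selection and providing a studio", None),
    ("Speaker Evaluation", "Briefing speakers", 1),
    ("Speaker Evaluation", "First studio recording session", 10),
    ("Speaker Evaluation", "First elimination of speakers to 2 people", 5),
    ("Speaker Evaluation", "Second studio recording session", 20),
    ("Speaker Evaluation", "Creating TTS voice prototypes", 15),
    ("Speaker Evaluation", "Deciding the TTS voice", None),
    ("Creating a Custom TTS", "Consent letter and contract", 1),
    ("Creating a Custom TTS", "Final studio recording session", 50),
    ("Creating a Custom TTS", "TTS development process", 25),
    ("Creating a Custom TTS", "Internal tests and finalizing", 10),
    ("Creating a Custom TTS", "Customer tests and feedback", None),
    ("Creating a Custom TTS", "List of extra announcements", None),
    ("Creating a Custom TTS", "Fine-tuning process", None),
]


def summarize(plan):
    """Return {phase: (fixed work days, number of open-ended tasks)}."""
    totals = {}
    for phase, _task, days in plan:
        fixed, open_tasks = totals.get(phase, (0, 0))
        if days is None:
            open_tasks += 1
        else:
            fixed += days
        totals[phase] = (fixed, open_tasks)
    return totals


summary = summarize(PLAN)
total_fixed = sum(fixed for fixed, _ in summary.values())
total_open = sum(open_tasks for _, open_tasks in summary.values())

for phase, (fixed, open_tasks) in summary.items():
    print(f"{phase}: {fixed} fixed work days, {open_tasks} open-ended task(s)")
print(f"Total: {total_fixed} fixed work days, {total_open} open-ended task(s)")
```

If the tasks are strictly sequential, the fixed items alone add up to 138 work days, with six further tasks whose duration depends on the customer or on the scope of fine-tuning.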