Custom TTS Process

Updated on 30 Sep 2023

Our Text-to-Speech technology includes a Custom Voice Creation feature that enables businesses to create unique, branded voices that set them apart from competitors. This feature utilizes advanced machine learning algorithms and deep neural networks to analyze and mimic the vocal characteristics of a designated voice talent. The resulting voice can then be used to generate high-quality, natural-sounding speech for use in various applications, including e-learning, virtual assistants, and audiobooks. The process of creating a custom voice involves recording and processing large amounts of speech data, followed by fine-tuning the resulting model to produce the desired vocal characteristics.

CHOOSING THE TTS VOICE

At least six candidates must be selected for the first elimination round in deciding the TTS voice.
A studio must be provided for recording the candidates. If Sestek provides the studio, the following requirement does not apply.
- The studio must have a Windows computer, and we must be given remote access to it. We will install our recording program on this computer, which the candidates will use for their recordings.
Each candidate must provide sample recordings of their voice.
- At least 10 sentences should be recorded.
- They must be recorded in a studio (ours or another one).
- The speakers will be briefed on how to read sentences for TTS. Consistent intonation is important.
We will narrow the candidates down to two and start the second elimination round. If both sides already agree on a speaker for the new TTS at this point, the second elimination can be skipped.
- We will record 15,000 words from each speaker in the studio.
- After taking recordings, we will work on their voices for the elimination process and see which candidate will be the most suitable for the TTS.
  - To do this, we will create prototype TTS voices from the candidates to decide which speaker will be chosen.
  - After the tests, we will decide which voice should be used.

CREATING A CUSTOM TTS

After the TTS voice is decided, we will start building the TTS product with the chosen speaker.
We will record at least 135,000 and at most 160,000 words from the selected speaker in the studio.
- Depending on performance, a speaker can read about 2,000 words per hour of recording. Studio hours are calculated accordingly.
- The customer must share the custom texts. These are limited to 10,000 sentences for voice creation and must be delivered in a single batch. If this limit is exceeded or texts are added later, additional work and studio fees will apply. Because word combinations matter, the customer should provide complete sentences and expressions rather than isolated words.
- Performance decreases as time in the recording room increases, so the speaker can advise us on the best schedule. We usually record 3-4 hours a day.
After the recordings are complete, we will run our tests and create the new TTS.
Additional tests will be run on the TTS, and the fine-tuning process will start.
We will record additional words and sentences that the customer wants to be read. The number of words here is limited to 10,000. In addition, any texts found to be incorrect are also recorded at this stage.
We will update the new TTS based on the feedback.
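
The recording volumes in this section translate into studio time with simple arithmetic. The Python sketch below is only an illustration: the function name `estimate_studio_time` and its parameters are hypothetical, and it assumes the approximate figures given above (about 2,000 words per recording hour and 3-4 recording hours per day). Actual schedules depend on the speaker's performance.

```python
import math


def estimate_studio_time(total_words, words_per_hour=2000, hours_per_day=4):
    """Return (studio_hours, recording_days) for a given word count.

    Hypothetical helper: the defaults reflect the approximate figures in
    this article (about 2,000 words per hour, 3-4 hours per day).
    """
    if total_words < 0 or words_per_hour <= 0 or hours_per_day <= 0:
        raise ValueError("word count must be non-negative and rates positive")
    studio_hours = total_words / words_per_hour
    # A partially used day still needs a studio booking, so round up.
    recording_days = math.ceil(studio_hours / hours_per_day)
    return studio_hours, recording_days


if __name__ == "__main__":
    # 15,000 words: second elimination; 135,000-160,000 words: final voice.
    for words in (15_000, 135_000, 160_000):
        for hours_per_day in (3, 4):
            hours, days = estimate_studio_time(words, hours_per_day=hours_per_day)
            print(f"{words:>7} words at {hours_per_day} h/day: {hours:.1f} h, {days} days")
```

Under these assumptions, 135,000 words correspond to about 67.5 studio hours and 160,000 words to about 80 studio hours. The estimate covers pure recording time only and does not replace the durations in the project plan.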

PROJECT STEPS FOR CUSTOM TTS CREATION

Task Name	Role Names	Duration (work days)
Custom TTS Project Plan
Start
Kick-off meeting	Customer; SESTEK	1 day
Choosing first candidates for evaluation	Customer	X days
Studio selection and providing a studio for recordings	Customer; SESTEK	X days
Speaker Evaluation
Briefing speakers on recording for TTS	Customer; SESTEK	1 day
First studio recording session for the selected speakers	Customer; SESTEK	10 days
First elimination of speakers to 2 people	Customer; SESTEK	5 days
Second studio recording session for the final candidates	Customer; SESTEK	20 days
Creating TTS voice prototypes with the samples	SESTEK	15 days
Deciding the TTS voice with the prototypes	Customer; SESTEK	X days
Creating a Custom TTS
Signing a consent letter and contract with the speaker	Customer; SESTEK	1 day
Final studio recording session for the chosen voice	Customer; SESTEK	50 days
TTS development process for these recordings	SESTEK	25 days
Internal tests and finalizing the process	SESTEK	10 days
Customer tests and feedback	Customer	X days
Providing the list of extra announcements and sentences	Customer	X days
Fine-tuning process	SESTEK	X days
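
To get a rough lower bound on the project length, the fixed durations in the table can be added up. The Python sketch below is an illustration only: the `PLAN` list and the `summarize_plan` function are hypothetical names, tasks listed as "X days" are stored as `None`, and the sum assumes the tasks run one after another, which the plan does not state explicitly.

```python
# Durations in work days, copied from the project plan table.
# None marks tasks listed as "X days" (duration not specified).
PLAN = [
    ("Kick-off meeting", 1),
    ("Choosing first candidates for evaluation", None),
    ("Studio selection and providing a studio for recordings", None),
    ("Briefing speakers about giving records for TTS", 1),
    ("First studio recording session for the selected speakers", 10),
    ("First elimination of speakers to 2 people", 5),
    ("Second studio recording session for the final candidates", 20),
    ("Creating TTS voice prototypes with the samples", 15),
    ("Deciding the TTS voice with the prototypes", None),
    ("Signing a consent letter and contract with the speaker", 1),
    ("Final studio recording session for the chosen voice", 50),
    ("TTS development process for these records", 25),
    ("Internal tests and finalizing the process", 10),
    ("Customer tests and feedbacks", None),
    ("Providing the list of extra announcements and sentences", None),
    ("Fine-tuning process", None),
]


def summarize_plan(plan):
    """Return (sum of known durations, names of tasks with open durations)."""
    known_days = sum(days for _, days in plan if days is not None)
    open_tasks = [name for name, days in plan if days is None]
    return known_days, open_tasks


if __name__ == "__main__":
    known_days, open_tasks = summarize_plan(PLAN)
    print(f"Fixed durations: {known_days} work days")
    print(f"Tasks still to be scheduled: {len(open_tasks)}")
```

With the table as given, the fixed durations add up to 138 work days, and six tasks still need a duration agreed between the customer and SESTEK.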
