Custom TTS Process
  • 30 Sep 2023


Our Text-to-Speech technology includes a Custom Voice Creation feature that enables businesses to create unique, branded voices that set them apart from competitors. This feature utilizes advanced machine learning algorithms and deep neural networks to analyze and mimic the vocal characteristics of a designated voice talent. The resulting voice can then be used to generate high-quality, natural-sounding speech for use in various applications, including e-learning, virtual assistants, and audiobooks. The process of creating a custom voice involves recording and processing large amounts of speech data, followed by fine-tuning the resulting model to produce the desired vocal characteristics.

CHOOSING THE TTS VOICE

  1. At least 6 candidates must be selected for the first elimination round in deciding the TTS voice.

  2. A studio must be provided for recording the candidates. If Sestek provides the studio, the requirement below does not apply.

    • The studio needs a Windows computer to which we are given remote access. We will install our recording program on this computer, and the candidates will record on it.

  3. Each candidate must provide voice recordings.

    • At least 10 sentences should be recorded.
    • The recordings must be made in a studio, whether ours or another one.
    • The speakers will be briefed on how to utter sentences for TTS. Consistency in intonation is important.

  4. We will narrow the candidates down to 2 people and start the second elimination round. If both sides already agree on a speaker at this point, the second elimination described in this step can be skipped.

    • We will record 15,000 words from each speaker in the studio.
    • After recording, we will work on the voices to see which candidate is the most suitable for the TTS.
      • To do this, we will create prototype TTS voices from the candidates' recordings.
      • After the tests, we will decide which voice should be used.

CREATING A CUSTOM TTS

  5. Once the TTS voice has been decided, we will start creating the TTS product with the chosen speaker.

  6. We will record at least 135,000 and at most 160,000 words from the selected speaker in the studio.

    • Depending on the speaker's performance, a speaker can read about 2,000 words in one hour of recording. Studio hours are calculated accordingly.
    • Custom texts must be provided by the customer. They are limited to 10,000 words for voice creation and must be delivered in a single batch. If this limit is exceeded or the customer adds text later, additional work and studio fees will apply. Because word combinations matter, the customer should supply expressions within full sentences rather than isolated words.
    • Performance declines as the time spent in the recording room increases, so the speaker can tell us what schedule works best. We usually record 3-4 hours a day.

  7. After recording, we will run our tests and create the new TTS.

  8. Additional tests will be run on the TTS, and the fine-tuning process will start.

  9. We will record the additional words and sentences that the customer wants to be read. The number of words here is limited to 10,000. In addition, any texts found to be incorrect are also recorded.

  10. We will update the new TTS based on the feedback.
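
The note in step 6 that studio hours are calculated from the reading rate can be illustrated with a short calculation. The sketch below is only a rough estimate: it assumes the rate of about 2,000 words per hour and the 3-4 recording hours per day mentioned above, counts reading time only (no retakes, breaks, or scheduling gaps), and uses hypothetical function names that are not part of any Sestek tool.

```python
import math

# Approximate reading rate stated in step 6 (an estimate, not a guarantee).
WORDS_PER_HOUR = 2000


def studio_hours(word_count, words_per_hour=WORDS_PER_HOUR):
    """Hours of pure reading time needed for word_count words."""
    return word_count / words_per_hour


def studio_days(word_count, hours_per_day, words_per_hour=WORDS_PER_HOUR):
    """Whole recording days needed at a given number of recording hours per day."""
    return math.ceil(studio_hours(word_count, words_per_hour) / hours_per_day)


# Minimum and maximum word counts for the final recording session.
for words in (135_000, 160_000):
    hours = studio_hours(words)
    fastest = studio_days(words, hours_per_day=4)
    slowest = studio_days(words, hours_per_day=3)
    print(f"{words} words: {hours:.1f} h, {fastest}-{slowest} recording days")
```

Under these assumptions, 135,000 words come to 67.5 hours (17 to 23 recording days) and 160,000 words to 80 hours (20 to 27 recording days). The project plan allots more work days than this to the final recording session, so these figures are best read as pure reading time rather than a schedule.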

PROJECT STEPS FOR CUSTOM TTS CREATION

| Task Name | Role Names | Duration (work days) |
|---|---|---|
| Custom TTS Project Plan | | |
| Start | | |
| Kick-off meeting | Customer; SESTEK | 1 day |
| Choosing first candidates for evaluation | Customer | X days |
| Studio selection and providing a studio for recordings | Customer; SESTEK | X days |
| Speaker Evaluation | | |
| Briefing speakers about giving recordings for TTS | Customer; SESTEK | 1 day |
| First studio recording session for the selected speakers | Customer; SESTEK | 10 days |
| First elimination of speakers to 2 people | Customer; SESTEK | 5 days |
| Second studio recording session for the final candidates | Customer; SESTEK | 20 days |
| Creating TTS voice prototypes with the samples | SESTEK | 15 days |
| Deciding the TTS voice with the prototypes | Customer; SESTEK | X days |
| Creating a Custom TTS | | |
| Signing a consent letter and contract with the speaker | Customer; SESTEK | 1 day |
| Final studio recording session for the chosen voice | Customer; SESTEK | 50 days |
| TTS development process for these recordings | SESTEK | 25 days |
| Internal tests and finalizing the process | SESTEK | 10 days |
| Customer tests and feedback | Customer | X days |
| Providing the list of extra announcements and sentences | Customer | X days |
| Fine-tuning process | SESTEK | X days |
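
To get a feel for the overall timeline, the fixed durations in the plan above can be totalled while keeping the open "X days" entries separate. This is a minimal sketch that assumes the tasks run one after another with no overlap, which the plan itself does not state; the `PLAN` list and the `summarize` function are illustrative names only.

```python
# Durations in work days as listed in the project plan.
# None marks a task whose duration is given as "X days".
PLAN = [
    ("Kick-off meeting", 1),
    ("Choosing first candidates for evaluation", None),
    ("Studio selection and providing a studio for recordings", None),
    ("Briefing speakers about giving recordings for TTS", 1),
    ("First studio recording session for the selected speakers", 10),
    ("First elimination of speakers to 2 people", 5),
    ("Second studio recording session for the final candidates", 20),
    ("Creating TTS voice prototypes with the samples", 15),
    ("Deciding the TTS voice with the prototypes", None),
    ("Signing a consent letter and contract with the speaker", 1),
    ("Final studio recording session for the chosen voice", 50),
    ("TTS development process for these recordings", 25),
    ("Internal tests and finalizing the process", 10),
    ("Customer tests and feedback", None),
    ("Providing the list of extra announcements and sentences", None),
    ("Fine-tuning process", None),
]


def summarize(plan):
    """Return the total of the fixed durations and the names of the open tasks."""
    known_days = sum(days for _, days in plan if days is not None)
    open_tasks = [name for name, days in plan if days is None]
    return known_days, open_tasks


known_days, open_tasks = summarize(PLAN)
print(f"Fixed durations: {known_days} work days")
print(f"Tasks still to be scheduled: {len(open_tasks)}")
```

With the listed values, the fixed durations add up to 138 work days, and six tasks still depend on durations agreed between the customer and SESTEK.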
