Custom TTS Process


SESTEK's Custom Voice Creation feature enables businesses to develop unique, branded voices that set them apart. Using advanced machine learning and deep neural networks, the system analyzes and mimics the vocal characteristics of a designated voice talent - producing high-quality, natural-sounding speech for use in e-learning, virtual assistants, audiobooks, and more.

The process is divided into two phases: choosing the TTS voice and creating the custom TTS model.


Phase 1: Choosing the TTS Voice

Step 1 - Initial candidate selection

At least 6 candidates must be chosen for the first elimination round.

Step 2 - Studio setup

A recording studio must be arranged for candidate sessions. If SESTEK provides the studio, this step can be skipped. Otherwise:

  • A Windows computer must be available in the studio
  • SESTEK will provide remote access and install the recording software

Step 3 - Initial recordings

Each candidate records a set of sample sentences to assess their suitability:

  • At least 10 sentences must be recorded per candidate
  • Recordings must be made in a studio environment
  • Candidates will be briefed on how to deliver sentences for TTS - consistency in intonation is critical

Step 4 - Elimination and final selection

The candidate pool is narrowed to 2 finalists. If both parties already agree on a speaker at this stage, the remaining Step 4 activities below can be skipped.

  • Each finalist records 15,000 words in the studio
  • SESTEK creates prototype TTS voices from the recordings
  • Both parties evaluate the prototypes and agree on the final voice

Phase 2: Creating the Custom TTS

Step 5 - Full recording sessions

Once the voice talent is selected, full-scale recording begins:

  • A minimum of 135,000 and a maximum of 160,000 words are recorded
  • A speaker can typically read approximately 2,000 words per hour; studio scheduling is based on this estimate
  • Sessions are usually limited to 3–4 hours per day to maintain performance quality
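
The figures above translate into a rough studio estimate. The sketch below is only an illustration of that arithmetic, using the stated 2,000 words per hour and 3–4 hours per day; the function name `studio_days` is hypothetical and is not part of any SESTEK tool.

```python
import math

WORDS_PER_HOUR = 2_000                     # reading speed stated above
MIN_WORDS, MAX_WORDS = 135_000, 160_000    # recording range stated above


def studio_days(words: int, hours_per_day: float) -> int:
    """Return the number of studio days needed to record `words` words."""
    hours = words / WORDS_PER_HOUR
    return math.ceil(hours / hours_per_day)


for words in (MIN_WORDS, MAX_WORDS):
    hours = words / WORDS_PER_HOUR
    # 4 h/day gives the shortest schedule, 3 h/day the longest
    print(f"{words:,} words: {hours:.1f} h, "
          f"{studio_days(words, 4)}-{studio_days(words, 3)} studio days")
```

Under these assumptions the full recording takes 67.5 to 80 reading hours, or roughly 17 to 27 studio days. This counts reading time only, so it should be treated as a lower bound rather than a schedule.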

Custom text requirements:

  • All custom texts must be provided by the customer
  • Custom texts must not exceed 10,000 sentences for voice creation
  • All content must be submitted at once - additions or changes made after submission will incur additional studio and development fees
  • Sentence combinations matter more than individual words; customers should provide complete expressions, not just vocabulary lists
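
Because all content must be submitted at once, it can help to check a draft against the limit before sending it. The sketch below is a minimal, hypothetical pre-submission check: it assumes the 10,000 limit is counted in sentences, and it splits sentences naively on end punctuation, which may not match how SESTEK counts them. The names `split_sentences` and `check_custom_text` are illustrative only.

```python
import re

MAX_SENTENCES = 10_000  # limit stated above, read here as a sentence count


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def check_custom_text(text: str) -> dict:
    """Count sentences and words and compare against the sentence limit."""
    sentences = split_sentences(text)
    words = sum(len(s.split()) for s in sentences)
    return {
        "sentences": len(sentences),
        "words": words,
        "within_limit": len(sentences) <= MAX_SENTENCES,
    }


print(check_custom_text("Your call is important to us. Please hold! Thank you."))
```

Abbreviations and numbers containing periods will confuse a splitter this simple, so the counts are best treated as an approximation.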

Step 6 - Model development

After recordings are complete, SESTEK processes the data and builds the initial TTS model.

Step 7 - Testing and fine-tuning

  • Internal tests are conducted and fine-tuning begins
  • Additional words and sentences requested by the customer are recorded (limited to 10,000 words)
  • Any incorrectly recorded texts are also re-recorded at this stage

Step 8 - Feedback integration

The model is updated based on customer feedback. This cycle continues until the voice meets the agreed quality standard.


Project Timeline

The table below outlines all tasks, responsible parties, and estimated durations. Tasks with a duration of "X days" depend on the customer's timeline or require mutual agreement.

Task                                             | Responsible      | Duration

Project Start
Kick-off meeting                                 | Customer, SESTEK | 1 day
Choosing first candidates for evaluation         | Customer         | X days
Studio selection and setup                       | Customer, SESTEK | X days

Speaker Evaluation
Briefing speakers on TTS recording requirements  | Customer, SESTEK | 1 day
First studio recording session                   | Customer, SESTEK | 10 days
First elimination - narrowing to 2 finalists     | Customer, SESTEK | 5 days
Second studio recording session for finalists    | Customer, SESTEK | 20 days
Creating TTS voice prototypes from samples       | SESTEK           | 15 days
Final voice selection with prototypes            | Customer, SESTEK | X days

Custom TTS Creation
Signing consent letter and contract with speaker | Customer, SESTEK | 1 day
Final studio recording session                   | Customer, SESTEK | 50 days
TTS development and model training               | SESTEK           | 25 days
Internal testing and process finalization        | SESTEK           | 10 days
Customer testing and feedback collection         | Customer         | X days
Providing extra announcements and sentences list | Customer         | X days
Fine-tuning based on feedback                    | SESTEK           | X days
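
For planning purposes, the fixed durations in the table can be totalled separately from the open-ended "X days" tasks. The sketch below does this under one loud assumption that the source does not confirm: that the tasks run one after another with no overlap. The `TIMELINE` structure is illustrative only.

```python
# (task, duration in days); None stands for an open-ended "X days" entry
TIMELINE = [
    ("Kick-off meeting", 1),
    ("Choosing first candidates for evaluation", None),
    ("Studio selection and setup", None),
    ("Briefing speakers on TTS recording requirements", 1),
    ("First studio recording session", 10),
    ("First elimination - narrowing to 2 finalists", 5),
    ("Second studio recording session for finalists", 20),
    ("Creating TTS voice prototypes from samples", 15),
    ("Final voice selection with prototypes", None),
    ("Signing consent letter and contract with speaker", 1),
    ("Final studio recording session", 50),
    ("TTS development and model training", 25),
    ("Internal testing and process finalization", 10),
    ("Customer testing and feedback collection", None),
    ("Providing extra announcements and sentences list", None),
    ("Fine-tuning based on feedback", None),
]

# Assumption: tasks are strictly sequential, so fixed durations simply add up
fixed_days = sum(days for _, days in TIMELINE if days is not None)
open_tasks = [task for task, days in TIMELINE if days is None]

print(f"Fixed-duration tasks: {fixed_days} days")
print(f"Open-ended tasks ({len(open_tasks)}):")
for task in open_tasks:
    print(f"  - {task}")
```

Read this way, the fixed tasks add up to 138 days, with six further tasks whose length depends on the customer's timeline or mutual agreement.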