Custom TTS Process

SESTEK's Custom Voice Creation feature enables businesses to develop unique, branded voices that set them apart. Using advanced machine learning and deep neural networks, the system analyzes and mimics the vocal characteristics of a designated voice talent - producing high-quality, natural-sounding speech for use in e-learning, virtual assistants, audiobooks, and more.

The process is divided into two phases: choosing the TTS voice and creating the custom TTS model.

Phase 1: Choosing the TTS Voice

Step 1 - Initial candidate selection

At least 6 candidates must be chosen for the first elimination round.

Step 2 - Studio setup

A recording studio must be arranged for candidate sessions. If SESTEK provides the studio, this step can be skipped. Otherwise:

A Windows computer must be available in the studio
SESTEK will provide remote access and install the recording software

Step 3 - Initial recordings

Each candidate records a set of sample sentences to assess their suitability:

At least 10 sentences must be recorded per candidate
Recordings must be made in a studio environment
Candidates will be briefed on how to deliver sentences for TTS - consistency in intonation is critical

Step 4 - Elimination and final selection

The candidate pool is narrowed to 2 finalists. If both parties already agree on a single speaker at this stage, the finalist recordings and prototype evaluation below can be skipped.

Each finalist records 15,000 words in the studio
SESTEK creates prototype TTS voices from the recordings
Both parties evaluate the prototypes and agree on the final voice

Phase 2: Creating the Custom TTS

Step 5 - Full recording sessions

Once the voice talent is selected, full-scale recording begins:

A minimum of 135,000 and a maximum of 160,000 words are recorded
A speaker can typically read approximately 2,000 words per hour; studio scheduling is based on this estimate
Sessions are usually limited to 3–4 hours per day to maintain performance quality
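
The scheduling arithmetic above can be sketched as follows. This is a rough illustration only: it assumes the stated rate of 2,000 words per hour holds throughout and counts reading time alone, so retakes, breaks, and studio availability are not included. The function name studio_days is hypothetical and not part of any SESTEK tool.

```python
import math

WORDS_PER_HOUR = 2_000  # typical reading rate stated in Step 5


def studio_days(total_words, hours_per_day, words_per_hour=WORDS_PER_HOUR):
    """Return (recording_hours, studio_days) for a given word target.

    Days are rounded up, since a partly used day still needs a booking.
    """
    hours = total_words / words_per_hour
    return hours, math.ceil(hours / hours_per_day)


if __name__ == "__main__":
    # Minimum and maximum word targets, at 3 and 4 hours per day.
    for words in (135_000, 160_000):
        for hours_per_day in (3, 4):
            hours, days = studio_days(words, hours_per_day)
            print(f"{words:,} words at {hours_per_day} h/day: "
                  f"{hours:g} hours, {days} studio days")
```

Under these assumptions, the minimum target of 135,000 words takes 67.5 reading hours (17 to 23 studio days) and the maximum of 160,000 words takes 80 hours (20 to 27 studio days).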

Custom text requirements:

All custom texts must be provided by the customer
The custom texts must not exceed 10,000 words for voice creation
All content must be submitted at once - additions or changes made after submission will incur additional studio and development fees
Sentence combinations matter more than individual words; customers should provide complete expressions, not just vocabulary lists

Step 6 - Model development

After recordings are complete, SESTEK processes the data and builds the initial TTS model.

Step 7 - Testing and fine-tuning

Internal tests are conducted and fine-tuning begins
Additional words and sentences requested by the customer are recorded (limited to 10,000 words)
Any incorrectly recorded texts are also re-recorded at this stage

Step 8 - Feedback integration

The model is updated based on customer feedback. This cycle continues until the voice meets the agreed quality standard.

Project Timeline

The table below lists all tasks, the responsible parties, and estimated durations. A duration shown as "X days" is not fixed: it depends on the customer's timeline or requires mutual agreement.

Task	Responsible	Duration
Project Start
Kick-off meeting	Customer, SESTEK	1 day
Choosing first candidates for evaluation	Customer	X days
Studio selection and setup	Customer, SESTEK	X days
Speaker Evaluation
Briefing speakers on TTS recording requirements	Customer, SESTEK	1 day
First studio recording session	Customer, SESTEK	10 days
First elimination - narrowing to 2 finalists	Customer, SESTEK	5 days
Second studio recording session for finalists	Customer, SESTEK	20 days
Creating TTS voice prototypes from samples	SESTEK	15 days
Final voice selection with prototypes	Customer, SESTEK	X days
Custom TTS Creation
Signing consent letter and contract with speaker	Customer, SESTEK	1 day
Final studio recording session	Customer, SESTEK	50 days
TTS development and model training	SESTEK	25 days
Internal testing and process finalization	SESTEK	10 days
Customer testing and feedback collection	Customer	X days
Providing extra announcements and sentences list	Customer	X days
Fine-tuning based on feedback	SESTEK	X days
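
As a rough planning aid, the fixed durations in the table can be totalled as in the sketch below. It assumes the tasks run strictly one after another with no overlap, which the table does not state, and it treats every "X days" entry as unknown (None). The names TASKS and summarize are hypothetical.

```python
# (task, duration in days) copied from the table; None marks an "X days" entry.
TASKS = [
    ("Kick-off meeting", 1),
    ("Choosing first candidates for evaluation", None),
    ("Studio selection and setup", None),
    ("Briefing speakers on TTS recording requirements", 1),
    ("First studio recording session", 10),
    ("First elimination - narrowing to 2 finalists", 5),
    ("Second studio recording session for finalists", 20),
    ("Creating TTS voice prototypes from samples", 15),
    ("Final voice selection with prototypes", None),
    ("Signing consent letter and contract with speaker", 1),
    ("Final studio recording session", 50),
    ("TTS development and model training", 25),
    ("Internal testing and process finalization", 10),
    ("Customer testing and feedback collection", None),
    ("Providing extra announcements and sentences list", None),
    ("Fine-tuning based on feedback", None),
]


def summarize(tasks):
    """Return the sum of fixed durations and the names of open-ended tasks."""
    fixed_days = sum(days for _, days in tasks if days is not None)
    open_tasks = [name for name, days in tasks if days is None]
    return fixed_days, open_tasks


if __name__ == "__main__":
    fixed_days, open_tasks = summarize(TASKS)
    print(f"Fixed durations total: {fixed_days} days")
    print(f"Tasks still to be scheduled: {len(open_tasks)}")
```

With the values in the table, the fixed durations add up to 138 days, and 6 tasks remain to be scheduled by agreement. Any overlap between tasks would shorten the total, so the sum is an upper-bound style estimate for the fixed portion only.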
