SESTEK's Custom Voice Creation feature enables businesses to develop unique, branded voices that set them apart. Using advanced machine learning and deep neural networks, the system analyzes and mimics the vocal characteristics of a designated voice talent - producing high-quality, natural-sounding speech for use in e-learning, virtual assistants, audiobooks, and more.
The process is divided into two phases: choosing the TTS voice and creating the custom TTS model.
Phase 1: Choosing the TTS Voice
Step 1 - Initial candidate selection
At least 6 candidates must be chosen for the first elimination round.
Step 2 - Studio setup
A recording studio must be arranged for candidate sessions. If SESTEK provides the studio, this step can be skipped. Otherwise:
- A Windows computer must be available in the studio
- SESTEK will install the recording software on this computer via remote access
Step 3 - Initial recordings
Each candidate records a set of sample sentences so that their suitability can be assessed:
- At least 10 sentences must be recorded per candidate
- Recordings must be made in a studio environment
- Candidates will be briefed on how to deliver sentences for TTS - consistency in intonation is critical
Step 4 - Elimination and final selection
The candidate pool is narrowed to 2 finalists. If both parties already agree on a speaker after the initial recordings, the rest of this step can be skipped.
- Each finalist records 15,000 words in the studio
- SESTEK creates prototype TTS voices from the recordings
- Both parties evaluate the prototypes and agree on the final voice
Phase 2: Creating the Custom TTS
Step 5 - Full recording sessions
Once the voice talent is selected, full-scale recording begins:
- A minimum of 135,000 and a maximum of 160,000 words are recorded
- A speaker typically reads about 2,000 words per hour; studio time is scheduled on this basis
- Sessions are usually limited to 3–4 hours per day to maintain performance quality
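
The scheduling figures above lend themselves to a quick estimate. The Python sketch below is a back-of-the-envelope calculation, not SESTEK tooling: it assumes a constant rate of 2,000 words per hour and that every daily session runs its full length, and the helper name `estimate_sessions` is hypothetical.

```python
import math

# Assumed constant reading rate, taken from the scheduling guideline above.
WORDS_PER_HOUR = 2_000


def estimate_sessions(words: int, hours_per_day: float) -> tuple[float, int]:
    """Return (studio hours, recording days) for a given word count.

    Hypothetical helper: assumes a constant reading rate and that each
    session uses its full daily limit. Retakes and breaks are not modelled.
    """
    hours = words / WORDS_PER_HOUR
    # A partially used day still occupies a studio slot, so round up.
    days = math.ceil(hours / hours_per_day)
    return hours, days


if __name__ == "__main__":
    for words in (135_000, 160_000):
        for per_day in (3, 4):
            hours, days = estimate_sessions(words, per_day)
            print(f"{words:,} words at {per_day} h/day: {hours:.1f} h, {days} days")
```

Under these assumptions, 135,000 words take 67.5 studio hours (17 days at 4 hours per day, 23 days at 3) and 160,000 words take 80 hours (20 to 27 days). The same helper gives 7.5 hours for the 15,000 words recorded from each finalist in Step 4. Retakes, breaks, and studio availability are not modelled.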
Custom text requirements:
- All custom texts must be provided by the customer
- The custom text submitted for voice creation must not exceed 10,000 sentences
- All content must be submitted at once - additions or changes made after submission will incur additional studio and development fees
- Sentence combinations matter more than individual words; customers should provide complete expressions, not just vocabulary lists
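
Because all custom text must be submitted at once, it can help to check a draft before sending it. The sketch below is a hypothetical pre-submission check, not part of any SESTEK tool. It reads the 10,000 limit above as a sentence count, splits text with a naive punctuation-and-newline heuristic (abbreviations such as "Dr." will be split incorrectly), and flags one-word entries that look like vocabulary items rather than complete expressions.

```python
import re

# Assumed reading of the requirement: at most 10,000 sentences of custom text.
MAX_SENTENCES = 10_000

# Naive boundary: a run of ., ! or ? followed by whitespace or end of text,
# or one or more newlines (so one-entry-per-line lists are split too).
_BOUNDARY = re.compile(r"[.!?]+(?:\s+|$)|\n+")


def split_sentences(text: str) -> list[str]:
    """Split text into non-empty sentences using the naive boundary above."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


def check_custom_text(text: str) -> list[str]:
    """Return human-readable problems found in a draft submission.

    Hypothetical helper; an empty list means no problems were detected.
    """
    problems = []
    sentences = split_sentences(text)
    if len(sentences) > MAX_SENTENCES:
        problems.append(f"too many sentences: {len(sentences)} > {MAX_SENTENCES}")
    # One-word entries suggest a vocabulary list rather than full expressions.
    single_words = [s for s in sentences if len(s.split()) == 1]
    if single_words:
        problems.append(f"{len(single_words)} single-word entries; provide complete expressions")
    return problems


if __name__ == "__main__":
    draft = "Your flight departs at nine. Please proceed to gate four!\nThank you."
    print(split_sentences(draft))
    print(check_custom_text(draft))
```

A real check would need a language-aware sentence splitter; the heuristic here only illustrates the two rules stated above.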
Step 6 - Model development
After recordings are complete, SESTEK processes the data and builds the initial TTS model.
Step 7 - Testing and fine-tuning
- Internal tests are conducted and fine-tuning begins
- Additional words and sentences requested by the customer are recorded (limited to 10,000 words)
- Any incorrectly recorded texts are also re-recorded at this stage
Step 8 - Feedback integration
The model is updated based on customer feedback. This cycle continues until the voice meets the agreed quality standard.
Project Timeline
The table below lists each task, the responsible parties, and the estimated duration. Durations given as "X days" depend on the customer's timeline or require mutual agreement.
| Task | Responsible | Duration |
|---|---|---|
| **Project Start** | | |
| Kick-off meeting | Customer, SESTEK | 1 day |
| Choosing first candidates for evaluation | Customer | X days |
| Studio selection and setup | Customer, SESTEK | X days |
| **Speaker Evaluation** | | |
| Briefing speakers on TTS recording requirements | Customer, SESTEK | 1 day |
| First studio recording session | Customer, SESTEK | 10 days |
| First elimination - narrowing to 2 finalists | Customer, SESTEK | 5 days |
| Second studio recording session for finalists | Customer, SESTEK | 20 days |
| Creating TTS voice prototypes from samples | SESTEK | 15 days |
| Final voice selection with prototypes | Customer, SESTEK | X days |
| **Custom TTS Creation** | | |
| Signing consent letter and contract with speaker | Customer, SESTEK | 1 day |
| Final studio recording session | Customer, SESTEK | 50 days |
| TTS development and model training | SESTEK | 25 days |
| Internal testing and process finalization | SESTEK | 10 days |
| Customer testing and feedback collection | Customer | X days |
| Providing extra announcements and sentences list | Customer | X days |
| Fine-tuning based on feedback | SESTEK | X days |
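
The fixed durations in the table can be totalled per phase. The sketch below assumes every task runs strictly after the previous one, which the table does not state; overlapping tasks would shorten the schedule, and each "X days" task adds an open-ended amount on top. The data structure and the `phase_totals` name are illustrative.

```python
# Fixed durations (in days) copied from the Project Timeline table.
# None stands for an "X days" task set by the customer or by mutual agreement.
TIMELINE = {
    "Project Start": [
        ("Kick-off meeting", 1),
        ("Choosing first candidates for evaluation", None),
        ("Studio selection and setup", None),
    ],
    "Speaker Evaluation": [
        ("Briefing speakers on TTS recording requirements", 1),
        ("First studio recording session", 10),
        ("First elimination - narrowing to 2 finalists", 5),
        ("Second studio recording session for finalists", 20),
        ("Creating TTS voice prototypes from samples", 15),
        ("Final voice selection with prototypes", None),
    ],
    "Custom TTS Creation": [
        ("Signing consent letter and contract with speaker", 1),
        ("Final studio recording session", 50),
        ("TTS development and model training", 25),
        ("Internal testing and process finalization", 10),
        ("Customer testing and feedback collection", None),
        ("Providing extra announcements and sentences list", None),
        ("Fine-tuning based on feedback", None),
    ],
}


def phase_totals(timeline):
    """Return {phase: (fixed_days, open_task_count)} for each phase.

    Assumes tasks within and across phases run one after another.
    """
    totals = {}
    for phase, tasks in timeline.items():
        fixed = sum(days for _, days in tasks if days is not None)
        open_tasks = sum(1 for _, days in tasks if days is None)
        totals[phase] = (fixed, open_tasks)
    return totals


if __name__ == "__main__":
    totals = phase_totals(TIMELINE)
    for phase, (fixed, open_tasks) in totals.items():
        print(f"{phase}: {fixed} fixed days, {open_tasks} open-ended task(s)")
    print("Total fixed days:", sum(fixed for fixed, _ in totals.values()))
```

Under the sequential assumption the fixed tasks add up to 1 + 51 + 86 = 138 days, with six open-ended tasks still to be scheduled.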
