---
title: "Custom TTS Process"
slug: "tts-process"
updated: 2026-04-16T05:00:49Z
published: 2026-04-16T05:00:49Z
---

# Custom TTS Process

SESTEK's Custom Voice Creation feature enables businesses to develop unique, branded voices that set them apart. Using advanced machine learning and deep neural networks, the system analyzes and mimics the vocal characteristics of a designated voice talent - producing high-quality, natural-sounding speech for use in e-learning, virtual assistants, audiobooks, and more.

The process is divided into two phases: **choosing the TTS voice** and **creating the custom TTS model**.

---

## Phase 1: Choosing the TTS Voice

### Step 1 - Initial candidate selection

At least 6 candidates must be chosen for the first elimination round.

### Step 2 - Studio setup

A recording studio must be arranged for candidate sessions. If SESTEK provides the studio, this step can be skipped. Otherwise:

- A Windows computer must be available in the studio
- SESTEK will provide remote access and install the recording software

### Step 3 - Initial recordings

Each candidate records a set of sample sentences to assess their suitability:

- At least 10 sentences must be recorded per candidate
- Recordings must be made in a studio environment
- Candidates will be briefed on how to deliver sentences for TTS - consistency in intonation is critical

### Step 4 - Elimination and final selection

The candidate pool is narrowed to 2 finalists. If both parties already agree on a speaker after the initial recordings, this step can be skipped.

- Each finalist records 15,000 words in the studio
- SESTEK creates prototype TTS voices from the recordings
- Both parties evaluate the prototypes and agree on the final voice

---

## Phase 2: Creating the Custom TTS Model

### Step 5 - Full recording sessions

Once the voice talent is selected, full-scale recording begins:

- Between 135,000 and 160,000 words are recorded
- A speaker typically reads about 2,000 words per hour; studio time is scheduled based on this estimate
- Sessions are usually limited to 3–4 hours per day to maintain performance quality
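
The figures above imply a rough range of studio time. The following is a minimal sketch of that arithmetic, not a SESTEK tool: the function and constant names are hypothetical, and it counts pure reading time only, so retakes, breaks, and scheduling gaps are not included.

```python
import math

# Figures taken from the list above; all names are illustrative.
WORDS_PER_HOUR = 2_000
MIN_WORDS, MAX_WORDS = 135_000, 160_000
MIN_HOURS_PER_DAY, MAX_HOURS_PER_DAY = 3, 4


def studio_hours(words: int, words_per_hour: int = WORDS_PER_HOUR) -> float:
    """Hours of reading needed for a given word count."""
    return words / words_per_hour


def session_days(words: int, hours_per_day: float) -> int:
    """Whole studio days needed, rounding a partial day up."""
    return math.ceil(studio_hours(words) / hours_per_day)


# Best case: fewest words, longest daily session.
fewest_days = session_days(MIN_WORDS, MAX_HOURS_PER_DAY)
# Worst case: most words, shortest daily session.
most_days = session_days(MAX_WORDS, MIN_HOURS_PER_DAY)

print(f"{studio_hours(MIN_WORDS)}-{studio_hours(MAX_WORDS)} hours of reading")
print(f"{fewest_days}-{most_days} studio days")
```

Under these assumptions the reading time comes to 67.5 to 80 hours, or 17 to 27 studio days. The calendar duration of the recording phase will be longer once non-reading time is added.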

**Custom text requirements:**

- All custom texts must be provided by the customer
- Custom texts must not exceed 10,000 sentences for voice creation
- All content must be submitted at once - additions or changes made after submission will incur additional studio and development fees
- Sentence combinations matter more than individual words; customers should provide complete expressions, not just vocabulary lists

### Step 6 - Model development

After recordings are complete, SESTEK processes the data and builds the initial TTS model.

### Step 7 - Testing and fine-tuning

- Internal tests are conducted and fine-tuning begins
- Additional words and sentences requested by the customer are recorded (limited to 10,000 words)
- Any incorrectly recorded texts are also re-recorded at this stage

### Step 8 - Feedback integration

The model is updated based on customer feedback. This cycle continues until the voice meets the agreed quality standard.

---

## Project Timeline

The table below outlines all tasks, responsible parties, and estimated durations. Tasks marked with **X days** depend on the customer's timeline or require mutual agreement.

| Task | Responsible | Duration |
| --- | --- | --- |
| **Project Start** |  |  |
| Kick-off meeting | Customer, SESTEK | 1 day |
| Choosing first candidates for evaluation | Customer | X days |
| Studio selection and setup | Customer, SESTEK | X days |
| **Speaker Evaluation** |  |  |
| Briefing speakers on TTS recording requirements | Customer, SESTEK | 1 day |
| First studio recording session | Customer, SESTEK | 10 days |
| First elimination - narrowing to 2 finalists | Customer, SESTEK | 5 days |
| Second studio recording session for finalists | Customer, SESTEK | 20 days |
| Creating TTS voice prototypes from samples | SESTEK | 15 days |
| Final voice selection with prototypes | Customer, SESTEK | X days |
| **Custom TTS Creation** |  |  |
| Signing consent letter and contract with speaker | Customer, SESTEK | 1 day |
| Final studio recording session | Customer, SESTEK | 50 days |
| TTS development and model training | SESTEK | 25 days |
| Internal testing and process finalization | SESTEK | 10 days |
| Customer testing and feedback collection | Customer | X days |
| Providing extra announcements and sentences list | Customer | X days |
| Fine-tuning based on feedback | SESTEK | X days |
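
As a rough planning aid, the fixed durations in the table can be totalled separately from the open-ended "X days" tasks. The sketch below assumes the tasks run strictly one after another with no overlap, which the table does not state; the variable names are hypothetical.

```python
# (task, duration in days); None stands for an "X days" entry in the table.
TIMELINE = [
    ("Kick-off meeting", 1),
    ("Choosing first candidates for evaluation", None),
    ("Studio selection and setup", None),
    ("Briefing speakers on TTS recording requirements", 1),
    ("First studio recording session", 10),
    ("First elimination - narrowing to 2 finalists", 5),
    ("Second studio recording session for finalists", 20),
    ("Creating TTS voice prototypes from samples", 15),
    ("Final voice selection with prototypes", None),
    ("Signing consent letter and contract with speaker", 1),
    ("Final studio recording session", 50),
    ("TTS development and model training", 25),
    ("Internal testing and process finalization", 10),
    ("Customer testing and feedback collection", None),
    ("Providing extra announcements and sentences list", None),
    ("Fine-tuning based on feedback", None),
]

# Sum only the tasks with a stated duration.
fixed_days = sum(days for _, days in TIMELINE if days is not None)
# Collect the tasks whose duration still has to be agreed.
open_tasks = [task for task, days in TIMELINE if days is None]

print(f"Fixed-duration tasks: {fixed_days} days")
print(f"Tasks still to be scheduled: {len(open_tasks)}")
```

With the values in the table, the fixed-duration tasks add up to 138 days, and six tasks remain to be scheduled with the customer.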
