Voice Cloning

SESTEK makes voice cloning simple and accessible. Whether you are building a product, exploring what's possible, or looking for a branded voice for your application, the process is straightforward. Follow the steps below, and we handle the rest.

Voice cloning is not yet available via API. To request a cloned voice, contact the SESTEK sales team directly.

How It Works

Recording Requirements

To produce a high-quality clone, your recording should meet the following criteria:

Record in a studio - use a quiet, professional recording environment. Avoid recording on phones to prevent unwanted noise or static.
Sample rate - 48 kHz for best results.
Speak naturally - be dynamic, avoid monotony. The clone will reflect the energy and character of your recording.
Script - you can read any plain text of your choice. No specific script is required; just make sure the content is clear and varied.
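
Before sending a recording, it can help to confirm the sample rate locally. The following is a minimal sketch, not a SESTEK tool: it assumes the recording is an uncompressed PCM WAV file and uses Python's standard `wave` module. The function name `check_sample_rate` and the constant are hypothetical, and treating exactly 48 kHz as the target is an assumption based on the requirement above.

```python
import wave

# Assumption: the 48 kHz figure from the requirements above is the target rate.
RECOMMENDED_SAMPLE_RATE_HZ = 48_000


def check_sample_rate(path):
    """Return (sample_rate_hz, matches_recommendation) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    return rate, rate == RECOMMENDED_SAMPLE_RATE_HZ
```

For example, `check_sample_rate("my_recording.wav")` (a placeholder file name) returns the file's sample rate together with a flag showing whether it matches 48 kHz. The check says nothing about background noise or delivery, which still need to be judged by ear.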

Recording Duration

Duration	What to expect
1–2 minutes	A quick trial clone - gives a general idea of the cloned voice quality
20–30 minutes	Recommended for best cloning performance and production-ready output

Finding 20–30 minutes of clean studio recording can be challenging. A shorter recording of 1–2 minutes is enough to evaluate the results before committing to a full session.
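
To see which of the two durations above a recording falls into, its length can be measured before submission. This is an illustrative sketch under the same assumptions as any local check: the file is an uncompressed PCM WAV, and the helper names `recording_minutes` and `describe_duration` are hypothetical. The ranges are taken directly from the table; lengths outside them are reported as such rather than interpreted.

```python
import wave


def recording_minutes(path):
    """Return the length of a PCM WAV file in minutes."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate() / 60


def describe_duration(minutes):
    """Map a length in minutes to the ranges listed in the duration table."""
    if 20 <= minutes <= 30:
        return "recommended length for production-ready output"
    if 1 <= minutes <= 2:
        return "trial length for a quick evaluation clone"
    # The table does not describe other lengths, so none is assumed here.
    return "outside the two documented ranges"
```

Calling `describe_duration(recording_minutes("my_recording.wav"))` with your own file name gives a quick indication of whether the recording suits a trial clone or a full session.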

Getting Started

Once your recording meets the criteria above, contact the SESTEK sales team. We take care of everything from there - processing, fine-tuning, and delivery. No technical setup is required on your end.