Voice Cloning


SESTEK makes voice cloning simple and accessible. Whether you are building a product, exploring what's possible, or looking for a branded voice for your application, the process is straightforward. Follow the steps below, and we handle the rest.

Voice cloning is not yet available via API. To request a cloned voice, contact the SESTEK sales team directly.


How It Works

[Figure: tts-voice-cloning.png]


Recording Requirements

To produce a high-quality clone, your recording should meet the following criteria:

  • Record in a studio - use a quiet, professional recording environment. Avoid recording on phones to prevent unwanted noise or static.
  • Sample rate - 48 kHz for best results.
  • Speak naturally - be dynamic, avoid monotony. The clone will reflect the energy and character of your recording.
  • Script - you can read any plain text of your choice. No specific script is required; just make sure the content is clear and varied.
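
The sample rate requirement above can be checked locally before a recording is sent. The following is a minimal sketch, assuming the recording is an uncompressed WAV file (this page does not specify a file format) and using Python's standard `wave` module. The function name `check_sample_rate` is illustrative and not part of any SESTEK tooling.

```python
import wave

# From the requirement above: 48 kHz for best results.
RECOMMENDED_RATE_HZ = 48_000


def check_sample_rate(path: str) -> tuple[int, bool]:
    """Return (sample_rate_hz, matches_recommendation) for a WAV file.

    Assumption: the file is an uncompressed WAV readable by the
    standard-library `wave` module.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    return rate, rate == RECOMMENDED_RATE_HZ
```

Calling `check_sample_rate("my_recording.wav")` (a hypothetical file name) returns the detected rate together with a flag showing whether it matches 48 kHz. The sketch only reads the file header; it cannot judge noise, room acoustics, or speaking style.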

Recording Duration

Duration         What to expect
1–2 minutes      A quick trial clone - gives a general idea of the cloned voice quality
20–30 minutes    Recommended for best cloning performance and production-ready output

Finding 20–30 minutes of clean studio recording can be challenging. A shorter recording of 1–2 minutes is enough to evaluate the results before committing to a full session.
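
To see which duration tier a recording falls into, its length can be measured before submission. This is a rough sketch, again assuming a WAV file and Python's standard `wave` module. The function names and the tier labels are hypothetical, and the handling of lengths between 2 and 20 minutes (treated here as trial-length) is an assumption, since the table above does not cover that range.

```python
import wave


def recording_minutes(path: str) -> float:
    """Return the length of a WAV file in minutes."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate() / 60


def classify_duration(minutes: float) -> str:
    """Map a recording length onto the duration tiers described above.

    Assumption: anything from 1 minute up to 20 minutes counts as a
    trial-length recording, and 20 minutes or more meets the
    recommended length. The labels are illustrative only.
    """
    if minutes >= 20:
        return "recommended"
    if minutes >= 1:
        return "trial"
    return "too short"
```

For example, `classify_duration(recording_minutes("my_recording.wav"))` (hypothetical file name) gives a quick indication of whether the file is suited to a trial clone or to a full session.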


Getting Started

Once your recording meets the criteria above, contact the SESTEK sales team. We take care of everything from there - processing, fine-tuning, and delivery. No technical setup is required on your end.

Your cloned voice will be delivered ready to use.