Clone Wizard

Clone a voice is the two-step wizard that takes a 3 to 10 second voice sample and reproduces its tone, accent, and pacing in a new private voice. You reach it from the Create A Voice button in the Voice Picker, the + Create Voice button in the topbar, the empty-state button on the Voice Library, or the picker’s Clone a voice card.

The wizard shell

The Clone Wizard opens as a wide modal that takes most of the screen. The header shows the two steps as a brand-gradient progress bar.

Region	What you see
Step 1 chip	A pill on the start edge of the header with a number 1 (or a check mark once you advance) and the title Step (1) Voice Details.
Step 2 chip	A pill on the end edge of the header with the number 2 and the title Step (2) Generate Custom Template. The pill stays muted until you reach Step 2.
Close button	A small X icon button next to the Step 2 pill. Closes the wizard immediately and drops your in-progress fields.

Step 1: Voice Details

The body of Step 1 splits into two columns: the start-edge column carries identity and tags, the end-edge column carries the audio input and transcription.

Cover image (optional)

Element	Notes
Header	The label Cover image.
Drop area	A 110 × 110 px tile with a dashed border. Click to open a file picker. The hint inside reads JPG, JPEG, PNG with Max 5MB below it.
Preview	When an image is uploaded, the tile fills with a cropped preview.
Errors	Image must be JPG, JPEG, or PNG (wrong type) or Image must be 5MB or less (too large). The error appears in red just below the tile.
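The cover image rules above can be sketched as a small check. This is an illustrative sketch, not the product's code: the function name `validate_cover_image`, the extension-based type test, and reading 5MB as 5 × 1024 × 1024 bytes are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Assumption: "5MB" is read as 5 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 5 * 1024 * 1024
ALLOWED_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def validate_cover_image(filename: str, size_bytes: int) -> Optional[str]:
    """Return the wizard's error text, or None when the image is acceptable."""
    # Assumption: the type is judged by file extension, case-insensitively.
    if Path(filename).suffix.lower() not in ALLOWED_IMAGE_EXTENSIONS:
        return "Image must be JPG, JPEG, or PNG"
    if size_bytes > MAX_IMAGE_BYTES:
        return "Image must be 5MB or less"
    return None
```

Because the cover image is optional, a caller would run this check only when a file has actually been picked.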

Identity fields

Field	What you do
Name required	Type a voice name up to 64 characters. Placeholder Enter voice name. The field turns red with the error Name is required when empty after you click Next.
Language required	Click to open a popover with a search box and the macro list. Pick a language. The popover supports search by English name, Arabic name, or language code.
Dialect	Required only when the chosen language has a dialect list. The field is disabled (shows —) for languages without dialects. Pick a dialect from the popover.
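The identity rules, in particular the conditional Dialect requirement and the three-way language search, might be modelled roughly as follows. The `Language` class, the two catalog entries, and both function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Language:
    code: str
    english_name: str
    arabic_name: str
    dialects: List[str] = field(default_factory=list)


# Hypothetical catalog entries, for illustration only.
LANGUAGES = [
    Language("ar", "Arabic", "العربية", ["Egyptian", "Gulf"]),
    Language("fr", "French", "الفرنسية"),
]


def search_languages(query: str, languages: List[Language]) -> List[Language]:
    """Match the query against English name, Arabic name, or language code."""
    q = query.strip().casefold()
    return [
        lang for lang in languages
        if q in lang.english_name.casefold()
        or q in lang.arabic_name
        or q in lang.code.casefold()
    ]


def identity_errors(name: str, language: Optional[Language],
                    dialect: Optional[str]) -> List[str]:
    """Collect the identity-field errors shown after clicking Next."""
    errors = []
    if not name.strip():
        errors.append("Name is required")
    if language is None:
        errors.append("Language is required")
    elif language.dialects and not dialect:
        # Dialect is required only when the language has a dialect list.
        errors.append("Dialect is required")
    return errors
```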

Tags (required)

A small section labelled Tags with the hint Select one gender and at least one use case.

Group	Options	Rule
Gender	Male, Female	Pick exactly one. The error reads Gender is required.
Use Cases	A multi-select dropdown with check rows. Use cases come from the catalog (Stories, Social Media, Commercial, Audiobooks, Podcasts, Religious, plus more).	Pick at least one. The button shows the comma-separated list of picks. The error reads Pick at least one use case.
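As a rough sketch of the tag rules, assuming hypothetical helper names and a plain list of selected use cases:

```python
from typing import List, Optional

GENDERS = ("Male", "Female")


def validate_tags(gender: Optional[str], use_cases: List[str]) -> List[str]:
    """Return the tag errors: exactly one gender, at least one use case."""
    errors = []
    if gender not in GENDERS:
        errors.append("Gender is required")
    if not use_cases:
        errors.append("Pick at least one use case")
    return errors


def use_case_button_label(use_cases: List[str]) -> str:
    """Comma-separated list of picks, as shown on the dropdown button."""
    # Assumption: the label for an empty selection is not modelled here.
    return ", ".join(use_cases)
```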

Input audio

The end-edge column starts with a label Input audio and a two-pill switch.

Pill	What it does
Upload	Default. Reveals a dashed-border drop area with a brand-gradient upload icon, the label Add or drop your audio file, and the hint MP3, WAV, WebM, OGG, AAC, M4A, FLAC · Less than 35 MB. Click to open a file picker.
Record	Reveals an inline recorder card with a microphone button.

Upload mode

Drop or pick an audio file. Allowed formats: MP3, WAV, WebM, OGG, AAC, M4A, FLAC. Audio size must be less than 35 MB.

State	What you see
Empty	Dashed-border drop area as above.
Uploaded	Green-tinted card with the file name, size in MB, an inline browser audio player, and a small X button to remove the file.
Type rejected	Red-tinted error Audio file is required. This is the text shown when the file type is rejected.
Too large	Red-tinted error Audio file must be less than 35 MB.
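The upload checks in the table above can be approximated with a short function. This is a sketch under assumptions: the extension-based type test, the byte interpretation of 35 MB, and the name `validate_audio_upload` are not taken from the product.

```python
from pathlib import Path
from typing import Optional

ALLOWED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".m4a", ".flac"}
# Assumption: "35 MB" is read as 35 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 35 * 1024 * 1024


def validate_audio_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return the wizard's error text, or None when the file is acceptable."""
    if Path(filename).suffix.lower() not in ALLOWED_AUDIO_EXTENSIONS:
        # The wizard reuses this message when the type is rejected.
        return "Audio file is required"
    if size_bytes >= MAX_AUDIO_BYTES:
        # The size must be strictly less than the limit.
        return "Audio file must be less than 35 MB"
    return None
```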

Record mode

Click the brand-gradient microphone button to grant permission and start recording. The label changes during recording.

Phase	Label
Idle	Record your voice. Record 3 to 10 seconds.
Recording	Recording in progress… with a small Click to stop hint and an elapsed-second counter. The browser microphone permission must be granted for this domain; the permission prompt cannot be bypassed.
Stopped, too short	The recording is silently dropped and the recorder returns to idle. The error Audio must be at least 3 seconds appears in the validation strip.
Stopped, valid	A row appears: Audio ready with an inline player and a Click to record new label that resets and prompts a new recording.
Permission refused	The button stays in the idle state with a red-tinted hint that browser permission was refused. Click again to re-prompt the browser.

The recorder caps the recording at 10 seconds even if you do not click stop; the engine discards anything past the cap.
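The recorder's length rules (drop anything under 3 seconds, keep at most 10 seconds) can be sketched as below. `RecordingResult` and `finish_recording` are hypothetical names, and the real recorder works on audio data rather than a bare duration.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


@dataclass
class RecordingResult:
    kept_seconds: float       # 0.0 when the recording is dropped
    error: Optional[str]      # message for the validation strip, if any


def finish_recording(elapsed_seconds: float) -> RecordingResult:
    """Apply the 10-second cap, then the 3-second minimum."""
    kept = min(elapsed_seconds, MAX_SECONDS)
    if kept < MIN_SECONDS:
        # Too short: drop the audio and return the recorder to idle.
        return RecordingResult(0.0, "Audio must be at least 3 seconds")
    return RecordingResult(kept, None)
```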

Transcription

A textarea sits below the audio. Until you upload or record an audio sample, the placeholder reads Transcription will appear here after audio upload and the field is empty. After audio lands, the engine kicks off auto-transcription. While that runs:

State	What you see
Auto-transcribing	A muted strip with a spinner and the label Transcribing…
Auto-done	A pre-filled textarea you can edit. In the demo build, the pre-filled text reads This is an auto-transcribed sample. Please review and confirm. (Arabic equivalent in Arabic mode).

Element	Notes
Confirm transcription checkbox	Appears once an audio sample is present and the textarea contains text. Required.
Approve before continuing	Required. The error Please approve the transcription before proceeding shows when you advance with the box unchecked.
Empty transcription error	Transcription is required.

The transcription is what the engine uses to align the cloned voice with the words. If the auto-transcription is wrong, fix it before clicking the Confirm transcription box. Garbage in, garbage out.
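The two transcription checks can be expressed as one function. This sketch assumes that the empty-text error takes priority, since the confirm checkbox only appears once text is present; the function name is hypothetical.

```python
from typing import Optional


def validate_transcription(text: str, confirmed: bool) -> Optional[str]:
    """Return the transcription error shown on Next, or None when valid."""
    if not text.strip():
        return "Transcription is required"
    if not confirmed:
        return "Please approve the transcription before proceeding"
    return None
```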

Step 1 footer

A single brand-gradient Next button sits on the end edge of the footer. Clicking it runs every validation check above. Errors light up the offending fields in red and add a one-line message under each. The wizard does not advance until every required field is satisfied.
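The Next button behaviour, run every check, report one message per failing field, and advance only when nothing fails, might look like this in outline. The `Check` type and both function names are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# A check returns an error message, or None when the field is valid.
Check = Callable[[], Optional[str]]


def run_checks(checks: Dict[str, Check]) -> Dict[str, str]:
    """Run every check and collect one message per failing field."""
    errors: Dict[str, str] = {}
    for field_name, check in checks.items():
        message = check()
        if message is not None:
            errors[field_name] = message
    return errors


def on_next(current_step: int, checks: Dict[str, Check]) -> Tuple[int, Dict[str, str]]:
    """Advance one step only when no check reports an error."""
    errors = run_checks(checks)
    next_step = current_step if errors else current_step + 1
    return next_step, errors
```

Running all checks before deciding, rather than stopping at the first failure, is what lets every offending field light up at once.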

Step 2: Generate Custom Template

The body of Step 2 is a single column with a header row, an editable text card, a preview card, and a footer.

Header row

Element	What it does
Title	Generate Custom Template.
Example button	Restores the textarea to the default sample text.
Generate / Regenerate button	Brand-gradient pill with a lightning icon. Reads Generate before the first preview, Regenerate after a preview has rendered. The button is disabled when the word count is below the minimum.

Sample text

Element	Notes
Label	Text.
Default text (English)	Welcome to our voice cloning service! This is an example of how your custom voice will sound. You can use this for various applications like audiobooks, podcasts, or personalized voice assistants. The quality and naturalness of the cloned voice depends on the audio samples you provide.
Default text (Arabic)	مرحباً بك في خدمة استنساخ الأصوات! with the corresponding Arabic body.
Maximum length	1000 characters.
Word count footer	Reads Word count: `{used}/{min}` in mono digits. The chip on the end reads Valid in green when the count is high enough, or Too short in muted text otherwise.
Minimum	5 words.
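The word-count footer and the Generate button rule can be sketched as follows. Counting words by splitting on whitespace is an assumption, as are the function names; the product may count differently, especially for Arabic text.

```python
from typing import Tuple

MIN_WORDS = 5
MAX_CHARS = 1000


def count_words(text: str) -> int:
    # Assumption: a word is any run of non-whitespace characters.
    return len(text.split())


def word_count_footer(text: str) -> Tuple[str, str]:
    """Return the footer text and the chip label for the current text."""
    used = count_words(text)
    chip = "Valid" if used >= MIN_WORDS else "Too short"
    return f"Word count: {used}/{MIN_WORDS}", chip


def can_generate(text: str) -> bool:
    """Generate is disabled below the minimum word count."""
    # Assumption: text over the 1000-character maximum is also rejected here.
    return count_words(text) >= MIN_WORDS and len(text) <= MAX_CHARS
```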

TTS Voice Preview

State	What you see
Idle (no preview yet)	Title Preview Ready and body Click the ‘Generate’ button above to create an audio sample of your voice. This will help you hear how your custom voice will sound before proceeding.
Loading	Spinner with title Generating preview… and body This can take a few seconds.
Ready	A brand-gradient round play button on the start edge plus a built-in audio player. The play button’s accessible label flips between Play preview and Pause preview.
Failed	Title Preview failed with body Something went wrong. Please try again. and an alert icon. Click Regenerate to try again.

Step 2 footer

Button	Behaviour
Previous	Returns to Step 1 with all your fields preserved.
Create	Brand-gradient submit. Disabled until a preview has rendered (so you have heard the result). The label flips to Creating… with a spinner during the save. On success a toast confirms with the localized Create word, and the wizard closes. The new voice lands in your Voice Library immediately.
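The four preview states and the Create button rule form a small state machine. The sketch below is one way to model it; the event names (`generate`, `success`, `error`) are hypothetical, and it assumes Create is enabled only while a rendered preview is on screen.

```python
from enum import Enum


class PreviewState(Enum):
    IDLE = "idle"
    LOADING = "loading"
    READY = "ready"
    FAILED = "failed"


# Allowed transitions; any other (state, event) pair leaves the state unchanged.
TRANSITIONS = {
    (PreviewState.IDLE, "generate"): PreviewState.LOADING,
    (PreviewState.READY, "generate"): PreviewState.LOADING,
    (PreviewState.FAILED, "generate"): PreviewState.LOADING,
    (PreviewState.LOADING, "success"): PreviewState.READY,
    (PreviewState.LOADING, "error"): PreviewState.FAILED,
}


def next_state(state: PreviewState, event: str) -> PreviewState:
    return TRANSITIONS.get((state, event), state)


def create_enabled(state: PreviewState) -> bool:
    # Assumption: Create stays disabled while a regeneration is loading.
    return state is PreviewState.READY
```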

Validation summary

Error message	When it appears
Name is required	Empty name.
Language is required	No language picked.
Dialect is required	The chosen language has dialects but none is picked.
Gender is required	No gender selected.
Pick at least one use case	Use Cases dropdown is empty.
Audio file is required	No audio uploaded or recorded.
Audio file must be less than 35 MB	Upload reaches 35 MB or more.
Audio must be at least 3 seconds	Recording stopped before 3 s or upload shorter than 3 s.
Audio must be 10 seconds or less	Recording or upload longer than 10 s.
Transcription is required	Transcription textarea is empty.
Please approve the transcription before proceeding	The Confirm transcription box is unchecked.
Image must be 5MB or less	Cover image exceeds 5 MB.
Image must be JPG, JPEG, or PNG	Cover image is in an unsupported format.
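The two duration messages in the summary bound the sample on both sides. A minimal sketch of that check, assuming the duration is already known in seconds and using a hypothetical function name:

```python
from typing import Optional

MIN_AUDIO_SECONDS = 3.0
MAX_AUDIO_SECONDS = 10.0


def validate_audio_duration(seconds: float) -> Optional[str]:
    """Return the duration error for a recording or upload, or None."""
    if seconds < MIN_AUDIO_SECONDS:
        return "Audio must be at least 3 seconds"
    if seconds > MAX_AUDIO_SECONDS:
        return "Audio must be 10 seconds or less"
    return None
```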

Provider and tenant rules

The cloned voice is created against your account. Other users in the same workspace cannot see it. The wizard never shows the underlying provider; the cloned voice surfaces as a Wittify voice everywhere it appears.
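Account scoping of this kind is commonly enforced with an ownership check on every read. The sketch below illustrates the idea only; the `ClonedVoice` record, its field names, and both functions are hypothetical and say nothing about how the product stores voices.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClonedVoice:
    voice_id: str
    owner_account_id: str


def visible_voices(voices: List[ClonedVoice], account_id: str) -> List[ClonedVoice]:
    """List only the voices owned by the requesting account."""
    return [v for v in voices if v.owner_account_id == account_id]


def get_voice(voices: List[ClonedVoice], voice_id: str, account_id: str) -> ClonedVoice:
    """Fetch one voice, rejecting access from any other account."""
    for voice in voices:
        if voice.voice_id == voice_id:
            if voice.owner_account_id != account_id:
                raise PermissionError("voice belongs to another account")
            return voice
    raise KeyError(voice_id)
```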

The Clone Wizard does not include a separate English Accent field or Arabic Dialect field. The shared Language and Dialect pickers cover both. The same picker is used in the Studio, the agent builder, the Voice Picker, and the Design Wizard.

The Step 2 text is editable but not optional. The preview audio is generated from whatever text is in the textarea at the time you click Generate, so make sure the text is what you want to hear before you generate.

Common questions

Can I close the wizard mid-way and resume later?

No. Closing the wizard discards every field. The Voice Library does not include a saved-draft list. Plan to finish the flow in one go; it is short.

My recording was rejected as too short.

Recordings under 3 seconds are silently dropped to protect the engine from useless input. Aim for 5 to 10 seconds of clear speech.

Why is my cloned voice in Training when I just created it?

The engine takes time to model your sample. The voice card on the Voice Library shows Training until processing finishes, then flips to Ready. You can leave the page and come back.

The auto-transcription is wrong.

Edit the textarea directly to match what you actually said. The transcription is what aligns the cloned voice with your sample, so an inaccurate transcription gives an inaccurate clone. Click Confirm transcription only after the text is correct.

My audio file failed with no error.

Browsers can infer the wrong MIME type from a file's extension, which causes the upload to be rejected. Re-encode the file (Audacity, ffmpeg, even a phone voice memo) to MP3 or WAV and try again.

What is the difference between Clone and Design?

Cloning copies the tone, accent, and pacing of a real voice from a sample. Designing creates a brand-new voice from attributes like gender, age, and pitch without a sample. See the Design Wizard.

Can someone else use my cloned voice?

No. Cloned voices are scoped to your account. Other users in the same workspace cannot see them, and the backend rejects cross-account access.

Can I clone a voice from a YouTube clip or someone else's audio?

You should only clone voices for which you have permission. Re-using a celebrity’s voice or a coworker’s voice without consent is not permitted by the product policy.

Where to go next

Voice Library

Find your new voice and manage it.

Studio

Use the cloned voice to generate speech.

Voice Picker

Pick the cloned voice on any voice slot.

Design Wizard

Make a voice without an audio sample.
