Clone a voice is a two-step wizard that takes a 3 to 10 second voice sample and reproduces its tone, accent, and pacing in a new private voice. You reach it from the Create A Voice button in the Voice Picker, the + Create Voice button in the topbar, the empty-state button on the Voice Library, or the picker’s Clone a voice card.

The wizard shell

The Clone Wizard opens as a wide modal that takes most of the screen. The header shows the two steps as a brand-gradient progress bar.

| Region | What you see |
| --- | --- |
| Step 1 chip | A pill on the start edge of the header with the number 1 (or a check mark once you advance) and the title Step (1) Voice Details. |
| Step 2 chip | A pill on the end edge of the header with the number 2 and the title Step (2) Generate Custom Template. The pill stays muted until you reach Step 2. |
| Close button | A small X icon button next to the Step 2 pill. Closes the wizard immediately and drops your in-progress fields. |

Step 1: Voice Details

The body of Step 1 splits into two columns: the start-edge column carries identity and tags, and the end-edge column carries the audio input and transcription.

Cover image (optional)

| Element | Notes |
| --- | --- |
| Header | The label Cover image. |
| Drop area | A 110 × 110 px tile with a dashed border. Click to open a file picker. The hint inside reads JPG, JPEG, PNG with Max 5MB below it. |
| Preview | When an image is uploaded, the tile fills with a cropped preview. |
| Errors | Image must be JPG, JPEG, or PNG (wrong type) or Image must be 5MB or less (too large). The error appears in red just below the tile. |

Identity fields

| Field | What you do |
| --- | --- |
| Name (required) | Type a voice name of up to 64 characters. The placeholder reads Enter voice name. The field turns red with the error Name is required if it is still empty when you click Next. |
| Language (required) | Click to open a popover with a search box and the macro list. Pick a language. The popover supports search by English name, Arabic name, or language code. |
| Dialect | Required only when the chosen language has a dialect list. The field is disabled (shows ) for languages without dialects. Pick a dialect from the popover. |
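
The search and dialect rules above can be sketched in a few lines. This is only an illustration: the `Language` record, the three sample entries, and the matching strategy (substring on names, exact match on code) are assumptions, not the wizard's actual catalog or search logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Language:
    code: str
    english_name: str
    arabic_name: str
    dialects: List[str] = field(default_factory=list)


# Hypothetical sample entries; the real catalog is not listed on this page.
LANGUAGES = [
    Language("ar", "Arabic", "العربية", ["Egyptian", "Gulf"]),
    Language("en", "English", "الإنجليزية", ["American", "British"]),
    Language("de", "German", "الألمانية"),
]


def search_languages(query: str, languages: List[Language]) -> List[Language]:
    """Match a query against English name, Arabic name, or language code."""
    q = query.strip().casefold()
    if not q:
        return list(languages)
    return [
        lang
        for lang in languages
        if q in lang.english_name.casefold()
        or q in lang.arabic_name.casefold()
        or q == lang.code.casefold()
    ]


def dialect_error(language: Language, dialect: Optional[str]) -> Optional[str]:
    """Dialect is required only when the chosen language has a dialect list."""
    if language.dialects and not dialect:
        return "Dialect is required"
    return None
```

With these sample entries, searching for `de` finds German through its code alone, and `dialect_error` stays silent for German because its dialect list is empty.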

Tags (required)

A small section labelled Tags with the hint Select one gender and at least one use case.

| Group | Options | Rule |
| --- | --- | --- |
| Gender | Male, Female | Pick exactly one. The error reads Gender is required. |
| Use Cases | A multi-select dropdown with check rows. Use cases come from the catalog (Stories, Social Media, Commercial, Audiobooks, Podcasts, Religious, plus more). | Pick at least one. The button shows the comma-separated list of picks. The error reads Pick at least one use case. |

Input audio

The end-edge column starts with a label Input audio and a two-pill switch.

| Pill | What it does |
| --- | --- |
| Upload | Default. Reveals a dashed-border drop area with a brand-gradient upload icon, the label Add or drop your audio file, and the hint MP3, WAV, WebM, OGG, AAC, M4A, FLAC · Less than 35 MB. Click to open a file picker. |
| Record | Reveals an inline recorder card with a microphone button. |

Upload mode

Drop or pick an audio file. Allowed formats: MP3, WAV, WebM, OGG, AAC, M4A, FLAC. Audio size must be less than 35 MB.

| State | What you see |
| --- | --- |
| Empty | Dashed-border drop area as above. |
| Uploaded | Green-tinted card with the file name, size in MB, an inline browser audio player, and a small X button to remove the file. |
| Type rejected | Red-tinted error Audio file is required (the rejection text). |
| Too large | Red-tinted error Audio file must be less than 35 MB. |
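
As a rough sketch, the upload rules above amount to a format check followed by a strict size check. The function name, the extension-based format test, and the choice of 1 MB = 1024 × 1024 bytes are assumptions for illustration; the wizard may inspect the MIME type instead and may count megabytes differently.

```python
from pathlib import PurePath
from typing import Optional

ALLOWED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".m4a", ".flac"}
# Assumption: 1 MB = 1024 * 1024 bytes; the page does not define the unit.
MAX_AUDIO_BYTES = 35 * 1024 * 1024


def check_audio_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return the error text for a rejected upload, or None if it is accepted."""
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_AUDIO_EXTENSIONS:
        # The table above lists this text for a rejected file type.
        return "Audio file is required"
    if size_bytes >= MAX_AUDIO_BYTES:
        # "Less than 35 MB" is strict: a file of exactly 35 MB is rejected.
        return "Audio file must be less than 35 MB"
    return None
```

Note the strict comparison: a file that reaches 35 MB exactly is rejected, which matches the wording of the error.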

Record mode

Click the brand-gradient microphone button to grant permission and start recording. The label changes during recording.

| Phase | Label |
| --- | --- |
| Idle | Record your voice. Record 3 to 10 seconds. |
| Recording | Recording in progress… with a small Click to stop hint and an elapsed-second counter. The browser microphone permission must be granted (same domain); there is no opt-out from the permission prompt. |
| Stopped, too short | The recording is silently dropped and the recorder returns to idle. The error Audio must be at least 3 seconds appears in the validation strip. |
| Stopped, valid | A row appears: Audio ready with an inline player and a Click to record new label that resets and prompts a new recording. |
| Permission refused | The button stays in the idle state with a red-tinted hint that browser permission was refused. Click again to re-prompt the browser. |

The recorder caps the recording at 10 seconds even if you do not click stop. The engine discards anything past the cap.
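
The outcome of a stopped recording can be modelled as a small pure function. This is a sketch under stated assumptions: the `Phase` names and `finish_recording` are hypothetical, and treating exactly 3 seconds as valid is inferred from the "at least 3 seconds" wording.

```python
from enum import Enum
from typing import Optional, Tuple

MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


class Phase(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    READY = "ready"


def finish_recording(elapsed_seconds: float) -> Tuple[Phase, float, Optional[str]]:
    """Resolve a stopped recording into (next phase, kept seconds, error text)."""
    if elapsed_seconds < MIN_SECONDS:
        # Too short: the take is dropped and the recorder returns to idle.
        return Phase.IDLE, 0.0, "Audio must be at least 3 seconds"
    # Anything past the 10-second cap is discarded.
    kept = min(elapsed_seconds, MAX_SECONDS)
    return Phase.READY, kept, None
```

For example, `finish_recording(12.5)` keeps only the first 10 seconds, while `finish_recording(2.9)` returns to idle with the too-short error.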

Transcription

A textarea sits below the audio. Until you upload or record an audio sample, the placeholder reads Transcription will appear here after audio upload and the field is empty. After audio lands, the engine kicks off auto-transcription. The field then moves through these states:

| State | What you see |
| --- | --- |
| Auto-transcribing | A muted strip with a spinner and the label Transcribing… |
| Auto-done | A pre-filled textarea you can edit. In the demo build, the placeholder text reads This is an auto-transcribed sample. Please review and confirm. (Arabic equivalent in Arabic mode). |

| Element | Notes |
| --- | --- |
| Confirm transcription checkbox | Appears once there is both audio and text. Required. |
| Approve before continuing | Required. The error Please approve the transcription before proceeding shows when you advance with the box unchecked. |
| Empty transcription error | Transcription is required. |

The transcription is what the engine uses to align the cloned voice with the words. If the auto-transcription is wrong, fix it before clicking the Confirm transcription box. Garbage in, garbage out.

The footer carries a single brand-gradient Next button on its end edge. Clicking it runs every validation check above. Errors light up the offending fields in red and add a one-line message under each. The wizard does not advance until every required field is satisfied.
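
The Next-button behaviour described above can be pictured as one function that collects a message per failing field. This is a minimal sketch, not the wizard's code: `StepOneForm`, its field names, and the dictionary keys are hypothetical, while the message strings are the ones this page quotes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StepOneForm:
    name: str = ""
    language: Optional[str] = None
    language_has_dialects: bool = False
    dialect: Optional[str] = None
    gender: Optional[str] = None
    use_cases: List[str] = field(default_factory=list)
    has_audio: bool = False
    transcription: str = ""
    transcription_confirmed: bool = False


def validate_step_one(form: StepOneForm) -> Dict[str, str]:
    """Map each failing field to its one-line message; empty means Next may advance."""
    errors: Dict[str, str] = {}
    if not form.name.strip():
        errors["name"] = "Name is required"
    if not form.language:
        errors["language"] = "Language is required"
    elif form.language_has_dialects and not form.dialect:
        errors["dialect"] = "Dialect is required"
    if form.gender not in ("Male", "Female"):
        errors["gender"] = "Gender is required"
    if not form.use_cases:
        errors["use_cases"] = "Pick at least one use case"
    if not form.has_audio:
        errors["audio"] = "Audio file is required"
    if not form.transcription.strip():
        errors["transcription"] = "Transcription is required"
    elif not form.transcription_confirmed:
        errors["transcription"] = "Please approve the transcription before proceeding"
    return errors
```

Returning a mapping rather than stopping at the first failure mirrors the described behaviour, where every offending field lights up at once.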

Step 2: Generate Custom Template

The body of Step 2 is a single column with a header row, an editable text card, a preview card, and a footer.

Header row

| Element | What it does |
| --- | --- |
| Title | Generate Custom Template. |
| Example button | Restores the textarea to the default sample text. |
| Generate / Regenerate button | Brand-gradient pill with a lightning icon. Reads Generate before the first preview, Regenerate after a preview has rendered. The button is disabled when the word count is below the minimum. |

Sample text

| Element | Notes |
| --- | --- |
| Label | Text. |
| Default text (English) | Welcome to our voice cloning service! This is an example of how your custom voice will sound. You can use this for various applications like audiobooks, podcasts, or personalized voice assistants. The quality and naturalness of the cloned voice depends on the audio samples you provide. |
| Default text (Arabic) | مرحباً بك في خدمة استنساخ الأصوات! with the corresponding Arabic body. |
| Maximum length | 1000 characters. |
| Word count footer | Reads Word count: {used}/{min} in mono digits. The chip on the end reads Valid in green when the count is high enough, or Too short in muted text otherwise. |
| Minimum | 5 words. |
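
The footer and chip logic above can be sketched as follows. Counting words by splitting on whitespace is an assumption (the page does not define a word), and the function names are hypothetical.

```python
MIN_WORDS = 5
MAX_CHARS = 1000


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed definition of 'word')."""
    return len(text.split())


def word_count_footer(text: str) -> str:
    """Render the footer in the Word count: {used}/{min} shape."""
    return f"Word count: {word_count(text)}/{MIN_WORDS}"


def validity_chip(text: str) -> str:
    """Valid once the minimum is met, Too short otherwise."""
    return "Valid" if word_count(text) >= MIN_WORDS else "Too short"


def can_generate(text: str) -> bool:
    """The Generate button is disabled while the word count is below the minimum."""
    return word_count(text) >= MIN_WORDS


def fits_length(text: str) -> bool:
    """Check the 1000-character maximum."""
    return len(text) <= MAX_CHARS
```

A three-word text would render as `Word count: 3/5` with the Too short chip and a disabled Generate button.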

TTS Voice Preview

| State | What you see |
| --- | --- |
| Idle (no preview yet) | Title Preview Ready and body Click the ‘Generate’ button above to create an audio sample of your voice. This will help you hear how your custom voice will sound before proceeding. |
| Loading | Spinner with title Generating preview… and body This can take a few seconds. |
| Ready | A brand-gradient round play button on the start edge plus a built-in audio player. The play button’s accessible label flips between Play preview and Pause preview. |
| Failed | Title Preview failed with body Something went wrong. Please try again. and an alert icon. Click Regenerate to try again. |

The footer carries two buttons.

| Button | Behaviour |
| --- | --- |
| Previous | Returns to Step 1 with all your fields preserved. |
| Create | Brand-gradient submit. Disabled until a preview has rendered (so you have heard the result). The label flips to Creating… with a spinner during the save. On success, a toast confirms with the localized Create word and the wizard closes. The new voice lands in your Voice Library immediately. |
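
The four preview states and the Create button rule can be read as a small state machine. The sketch below is an interpretation, not the product's implementation: the event names are hypothetical, and it assumes Create is enabled only while the preview is in the Ready state and no save is in flight.

```python
from enum import Enum


class PreviewState(Enum):
    IDLE = "idle"
    LOADING = "loading"
    READY = "ready"
    FAILED = "failed"


# Hypothetical events: "generate" (button click), "success", "failure".
TRANSITIONS = {
    (PreviewState.IDLE, "generate"): PreviewState.LOADING,
    (PreviewState.LOADING, "success"): PreviewState.READY,
    (PreviewState.LOADING, "failure"): PreviewState.FAILED,
    (PreviewState.READY, "generate"): PreviewState.LOADING,
    (PreviewState.FAILED, "generate"): PreviewState.LOADING,
}


def next_state(state: PreviewState, event: str) -> PreviewState:
    """Apply an event; unknown combinations leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def create_enabled(state: PreviewState, saving: bool = False) -> bool:
    """Assumed rule: Create needs a rendered preview and no save in progress."""
    return state is PreviewState.READY and not saving


def create_label(saving: bool) -> str:
    """The label flips while the save is running."""
    return "Creating…" if saving else "Create"
```

Keeping the transitions in a table makes it easy to see that a failed preview can only be left by generating again.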

Validation summary

| Error message | When it appears |
| --- | --- |
| Name is required | Empty name. |
| Language is required | No language picked. |
| Dialect is required | The chosen language has dialects but none is picked. |
| Gender is required | No gender selected. |
| Pick at least one use case | Use Cases dropdown is empty. |
| Audio file is required | No audio uploaded or recorded. |
| Audio file must be less than 35 MB | Upload reaches 35 MB or more. |
| Audio must be at least 3 seconds | Recording cancelled too early or upload shorter than 3 s. |
| Audio must be 10 seconds or less | Recording or upload longer than 10 s. |
| Transcription is required | Transcription textarea is empty. |
| Please approve the transcription before proceeding | The Confirm transcription box is unchecked. |
| Image must be 5MB or less | Cover image exceeds 5 MB. |
| Image must be JPG, JPEG, or PNG | Cover image is in an unsupported format. |
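
Because the duration limits apply to uploads as well as recordings, it can help to check a file before opening the wizard. The sketch below uses Python's standard `wave` module, so it only covers uncompressed WAV files; the helper names are hypothetical, and the wizard's own duration check is not documented here.

```python
import io
import wave
from typing import BinaryIO, Optional

MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def wav_duration_seconds(stream: BinaryIO) -> float:
    """Read the duration of an uncompressed WAV stream from its header."""
    with wave.open(stream, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_error(seconds: float) -> Optional[str]:
    """Return the matching message from the table above, or None if in range."""
    if seconds < MIN_SECONDS:
        return "Audio must be at least 3 seconds"
    if seconds > MAX_SECONDS:
        return "Audio must be 10 seconds or less"
    return None


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build a mono 16-bit silent WAV in memory, for trying the check out."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer
```

To check a file on disk, open it in binary mode and pass the handle: `duration_error(wav_duration_seconds(open("sample.wav", "rb")))`, where `sample.wav` is a placeholder path.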

Provider and tenant rules

The cloned voice is created against your account. Other users in the same workspace cannot see it. The wizard never shows the underlying provider; the cloned voice surfaces as a Wittify voice everywhere it appears.
The Clone Wizard does not include a separate English Accent field or Arabic Dialect field. The shared Language and Dialect pickers cover both. The same picker is used in the Studio, the agent builder, the Voice Picker, and the Design Wizard.
The Step 2 text is editable but not optional. The preview audio is generated from whatever text is in the textarea at the time you click Generate, so make sure the text is what you want to hear before you generate.

Common questions

Drafts are not saved. Closing the wizard discards every field, and the Voice Library does not include a saved-draft list. Plan to finish the flow in one go; it is short.
Recordings under 3 seconds are silently dropped to protect the engine from useless input. Aim for 5 to 10 seconds of clear speech.
The engine takes time to model your sample. The voice card on the Voice Library shows Training until processing finishes, then flips to Ready. You can leave the page and come back.
Edit the textarea directly to match what you actually said. The transcription is what aligns the cloned voice with your sample, so an inaccurate transcription gives an inaccurate clone. Click Confirm transcription only after the text is correct.
Browsers can infer a file's MIME type from its extension and report the wrong one. Re-encode the file (with Audacity, ffmpeg, or even a phone voice memo app) to MP3 or WAV and try again.
Cloning copies the tone, accent, and pacing of a real voice from a sample. Designing creates a brand-new voice from attributes like gender, age, and pitch without a sample. See the Design Wizard.
Cloned voices are scoped to your account. Other users in the same workspace cannot see them, and the backend rejects cross-account access.
You should only clone voices for which you have permission. Re-using a celebrity’s voice or a coworker’s voice without consent is not supported by the product policy.
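
For the re-encoding tip in the answers above, one possible way to convert a file with ffmpeg from a script is sketched here. It assumes ffmpeg is installed and on your PATH, relies on ffmpeg choosing the output codec from the target extension, and uses hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import List


def reencode_command(source: str, target_format: str = "wav") -> List[str]:
    """Build an ffmpeg argument list that converts `source` to MP3 or WAV."""
    if target_format not in ("mp3", "wav"):
        raise ValueError("target_format must be 'mp3' or 'wav'")
    target = str(Path(source).with_suffix("." + target_format))
    if target == source:
        # ffmpeg cannot write its output over the file it is reading.
        raise ValueError("source already has the target extension")
    return ["ffmpeg", "-i", source, target]


def reencode(source: str, target_format: str = "wav") -> str:
    """Run ffmpeg and return the path of the converted file."""
    command = reencode_command(source, target_format)
    subprocess.run(command, check=True)
    return command[-1]
```

Calling `reencode("memo.m4a")` would write `memo.wav` next to the original, which you can then upload in Step 1.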

Where to go next

Voice Library

Find your new voice and manage it.

Studio

Use the cloned voice to generate speech.

Voice Picker

Pick the cloned voice on any voice slot.

Design Wizard

Make a voice without an audio sample.