Clone a voice is the two-step wizard that takes a 3 to 10 second voice sample and matches its tone, accent, and pacing into a new private voice. You reach it from the Create A Voice button in the Voice Picker, the + Create Voice button in the topbar, the empty-state button on the Voice Library, or the picker’s Clone a voice card.
The wizard shell
The Clone Wizard opens as a wide modal that takes most of the screen. The header shows the two steps as a brand-gradient progress bar.

| Region | What you see |
|---|---|
| Step 1 chip | A pill on the start edge of the header with a number 1 (or a check mark once you advance) and the title Step (1) Voice Details. |
| Step 2 chip | A pill on the end edge of the header with the number 2 and the title Step (2) Generate Custom Template. The pill stays muted until you reach Step 2. |
| Close button | A small X icon button next to the Step 2 pill. Closes the wizard immediately and drops your in-progress fields. |
Step 1: Voice Details
The body of Step 1 splits into two columns: the start-edge column carries identity and tags, the end-edge column carries the audio input and transcription.

Cover image (optional)
| Element | Notes |
|---|---|
| Header | The label Cover image. |
| Drop area | A 110 × 110 px tile with a dashed border. Click to open a file picker. The hint inside reads JPG, JPEG, PNG with Max 5MB below it. |
| Preview | When an image is uploaded, the tile fills with a cropped preview. |
| Errors | Image must be JPG, JPEG, or PNG (wrong type) or Image must be 5MB or less (too large). The error appears in red just below the tile. |
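
The cover image rules above can be sketched as a small check. This is a minimal illustration, not the wizard's actual code: the `PickedImage` shape and the `validateCoverImage` name are hypothetical, the type check looks only at the file extension, and "5MB" is assumed to mean 5 × 1024 × 1024 bytes.

```typescript
// Hypothetical shape for a picked file; the wizard's real types are not documented.
interface PickedImage {
  name: string;
  sizeBytes: number;
}

const IMAGE_EXTENSIONS = ["jpg", "jpeg", "png"];
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // assumption: 1 MB = 1024 * 1024 bytes

// Returns the documented error text, or null when the image passes both checks.
function validateCoverImage(file: PickedImage): string | null {
  const dot = file.name.lastIndexOf(".");
  const ext = dot === -1 ? "" : file.name.slice(dot + 1).toLowerCase();
  if (IMAGE_EXTENSIONS.indexOf(ext) === -1) {
    return "Image must be JPG, JPEG, or PNG";
  }
  if (file.sizeBytes > MAX_IMAGE_BYTES) {
    return "Image must be 5MB or less";
  }
  return null;
}
```

Because the cover image is optional, a caller would run this check only when a file has actually been picked.
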
Identity fields
| Field | What you do |
|---|---|
| Name required | Type a voice name up to 64 characters. Placeholder Enter voice name. The field turns red with the error Name is required when empty after you click Next. |
| Language required | Click to open a popover with a search box and the macro list. Pick a language. The popover supports search by English name, Arabic name, or language code. |
| Dialect | Required only when the chosen language has a dialect list. The field is disabled (shows —) for languages without dialects. Pick a dialect from the popover. |
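
The Language popover's search by English name, Arabic name, or language code could work roughly as follows. This is only a sketch under assumptions: the `LanguageOption` shape, the `searchLanguages` name, and the use of case-insensitive substring matching are all hypothetical, since the page does not describe how the search matches.

```typescript
// Hypothetical catalog entry; the real language list's shape is not documented.
interface LanguageOption {
  code: string;        // e.g. "ar"
  englishName: string; // e.g. "Arabic"
  arabicName: string;  // e.g. "العربية"
}

// Assumed behavior: substring match on any of the three fields,
// case-insensitive for the Latin-script fields.
function searchLanguages(options: LanguageOption[], query: string): LanguageOption[] {
  const raw = query.trim();
  const q = raw.toLowerCase();
  if (q === "") {
    return options; // an empty query shows the full list
  }
  return options.filter(function (o) {
    return (
      o.englishName.toLowerCase().indexOf(q) !== -1 ||
      o.arabicName.indexOf(raw) !== -1 ||
      o.code.toLowerCase().indexOf(q) !== -1
    );
  });
}
```
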
Tags (required)
A small section labelled Tags with the hint Select one gender and at least one use case.

| Group | Options | Rule |
|---|---|---|
| Gender | Male, Female | Pick exactly one. The error reads Gender is required. |
| Use Cases | A multi-select dropdown with check rows. Use cases come from the catalog (Stories, Social Media, Commercial, Audiobooks, Podcasts, Religious, plus more). | Pick at least one. The button shows the comma-separated list of picks. The error reads Pick at least one use case. |
Input audio
The end-edge column starts with a label Input audio and a two-pill switch.

| Pill | What it does |
|---|---|
| Upload | Default. Reveals a dashed-border drop area with a brand-gradient upload icon, the label Add or drop your audio file, and the hint MP3, WAV, WebM, OGG, AAC, M4A, FLAC · Less than 35 MB. Click to open a file picker. |
| Record | Reveals an inline recorder card with a microphone button. |
Upload mode
Drop or pick an audio file. Allowed formats: MP3, WAV, WebM, OGG, AAC, M4A, FLAC. Audio size must be less than 35 MB.

| State | What you see |
|---|---|
| Empty | Dashed-border drop area as above. |
| Uploaded | Green-tinted card with the file name, size in MB, an inline browser audio player, and a small X button to remove the file. |
| Type rejected | Red-tinted error Audio file is required (the rejection text). |
| Too large | Red-tinted error Audio file must be less than 35 MB. |
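
The upload states above map onto a check like the one below. Treat it as a hedged sketch: the `PickedAudio` shape and the `validateAudioUpload` name are hypothetical, the format test uses the file extension (the wizard may inspect the MIME type instead), and 35 MB is assumed to mean 35 × 1024 × 1024 bytes.

```typescript
// Hypothetical shape for an uploaded file; not the wizard's real type.
interface PickedAudio {
  name: string;
  sizeBytes: number;
}

const AUDIO_EXTENSIONS = ["mp3", "wav", "webm", "ogg", "aac", "m4a", "flac"];
const MAX_AUDIO_BYTES = 35 * 1024 * 1024; // assumption: 1 MB = 1024 * 1024 bytes

// Returns the documented error text, or null when the upload is accepted.
function validateAudioUpload(file: PickedAudio | null): string | null {
  if (file === null) {
    return "Audio file is required";
  }
  const dot = file.name.lastIndexOf(".");
  const ext = dot === -1 ? "" : file.name.slice(dot + 1).toLowerCase();
  if (AUDIO_EXTENSIONS.indexOf(ext) === -1) {
    // The page documents this same text as the type-rejection message.
    return "Audio file is required";
  }
  if (file.sizeBytes >= MAX_AUDIO_BYTES) {
    // "Less than 35 MB": a file of exactly 35 MB is rejected.
    return "Audio file must be less than 35 MB";
  }
  return null;
}
```
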
Record mode
Click the brand-gradient microphone button to grant permission and start recording. The label changes during recording.

| Phase | Label |
|---|---|
| Idle | Record your voice. Record 3 to 10 seconds. |
| Recording | Recording in progress… with a small Click to stop hint and an elapsed-second counter. The browser microphone permission must be granted (same domain); there is no way to skip the permission prompt. |
| Stopped, too short | The recording is silently dropped and the recorder returns to idle. The error Audio must be at least 3 seconds appears in the validation strip. |
| Stopped, valid | A row appears: Audio ready with an inline player and a Click to record new label that resets and prompts a new recording. |
| Permission refused | The button stays in the idle state with a red-tinted hint that browser permission was refused. Click again to re-prompt the browser. |
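
The 3 to 10 second window that decides between "Stopped, too short" and "Stopped, valid" can be expressed as a simple duration check. The function name is hypothetical and the sketch assumes both bounds are inclusive, which matches the error texts listed in the validation summary.

```typescript
const MIN_SECONDS = 3;
const MAX_SECONDS = 10;

// Returns the documented error text for an out-of-range sample, or null if it is valid.
function validateDuration(seconds: number): string | null {
  if (seconds < MIN_SECONDS) {
    return "Audio must be at least 3 seconds";
  }
  if (seconds > MAX_SECONDS) {
    return "Audio must be 10 seconds or less";
  }
  return null;
}
```

The same check would apply to uploaded files once their duration is known.
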
Transcription
A textarea sits below the audio. Until you upload or record an audio sample, the field is empty and the placeholder reads Transcription will appear here after audio upload. Once audio is added, the engine starts auto-transcription and the field moves through these states:

| State | What you see |
|---|---|
| Auto-transcribing | A muted strip with a spinner and the label Transcribing… |
| Auto-done | A pre-filled textarea you can edit. In the demo build, the placeholder text reads This is an auto-transcribed sample. Please review and confirm. (Arabic equivalent in Arabic mode). |

| Element | Notes |
|---|---|
| Confirm transcription checkbox | Appears once an audio sample and transcription text are both present. Required. |
| Approve before continuing | Required. The error Please approve the transcription before proceeding shows when you advance with the box unchecked. |
| Empty transcription error | Transcription is required. |
Step 1 footer
A single brand-gradient Next button on the end edge of the footer. Clicking it runs every validation check above. Errors light up the offending fields in red and add a one-line message under each. The wizard does not advance until every required field is satisfied.

Step 2: Generate Custom Template
The body of Step 2 is a single column with a header row, an editable text card, a preview card, and a footer.

Header row
| Element | What it does |
|---|---|
| Title | Generate Custom Template. |
| Example button | Restores the textarea to the default sample text. |
| Generate / Regenerate button | Brand-gradient pill with a lightning icon. Reads Generate before the first preview, Regenerate after a preview has rendered. The button is disabled when the word count is below the minimum. |
Sample text
| Element | Notes |
|---|---|
| Label | Text. |
| Default text (English) | Welcome to our voice cloning service! This is an example of how your custom voice will sound. You can use this for various applications like audiobooks, podcasts, or personalized voice assistants. The quality and naturalness of the cloned voice depends on the audio samples you provide. |
| Default text (Arabic) | مرحباً بك في خدمة استنساخ الأصوات! with the corresponding Arabic body. |
| Maximum length | 1000 characters. |
| Word count footer | Reads Word count: {used}/{min} in mono digits. The chip on the end reads Valid in green when the count is high enough, or Too short in muted text otherwise. |
| Minimum | 5 words. |
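
The word count footer and the disabled state of the Generate button both follow from the 5-word minimum. A minimal sketch, assuming words are separated by whitespace (the page does not say how the wizard tokenizes) and using hypothetical names throughout:

```typescript
const MIN_WORDS = 5;

// Assumption: a "word" is any run of non-whitespace characters.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

interface WordCountFooter {
  label: string;               // "Word count: {used}/{min}"
  chip: "Valid" | "Too short"; // chip shown on the end edge
  canGenerate: boolean;        // Generate is disabled below the minimum
}

function wordCountFooter(text: string): WordCountFooter {
  const used = countWords(text);
  const ok = used >= MIN_WORDS;
  return {
    label: "Word count: " + used + "/" + MIN_WORDS,
    chip: ok ? "Valid" : "Too short",
    canGenerate: ok
  };
}
```
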
TTS Voice Preview
| State | What you see |
|---|---|
| Idle (no preview yet) | Title Preview Ready and body Click the ‘Generate’ button above to create an audio sample of your voice. This will help you hear how your custom voice will sound before proceeding. |
| Loading | Spinner with title Generating preview… and body This can take a few seconds. |
| Ready | A brand-gradient round play button on the start edge plus a built-in audio player. The play button’s accessible label flips between Play preview and Pause preview. |
| Failed | Title Preview failed with body Something went wrong. Please try again. and an alert icon. Click Regenerate to try again. |
Step 2 footer
| Button | Behaviour |
|---|---|
| Previous | Returns to Step 1 with all your fields preserved. |
| Create | Brand-gradient submit. Disabled until a preview has rendered (so you have heard the result). The label flips to Creating… with a spinner during the save. On success a toast confirms with the localized Create word, and the wizard closes. The new voice lands in your Voice Library immediately. |
Validation summary
| Error message | When it appears |
|---|---|
| Name is required | Empty name. |
| Language is required | No language picked. |
| Dialect is required | The chosen language has dialects but none is picked. |
| Gender is required | No gender selected. |
| Pick at least one use case | Use Cases dropdown is empty. |
| Audio file is required | No audio uploaded or recorded. |
| Audio file must be less than 35 MB | Upload reaches 35 MB or more. |
| Audio must be at least 3 seconds | Recording stopped before 3 s, or upload shorter than 3 s. |
| Audio must be 10 seconds or less | Recording or upload longer than 10 s. |
| Transcription is required | Transcription textarea is empty. |
| Please approve the transcription before proceeding | The Confirm transcription box is unchecked. |
| Image must be 5MB or less | Cover image exceeds 5 MB. |
| Image must be JPG, JPEG, or PNG | Cover image is in an unsupported format. |
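
To show how the Next button might gather these messages in one pass, here is a hedged sketch of a Step 1 check for the non-file fields. The `Step1Form` shape and `validateStep1` name are hypothetical; only the error strings and the rules behind them come from the table above.

```typescript
// Hypothetical snapshot of the Step 1 fields; not the wizard's real state shape.
interface Step1Form {
  name: string;
  language: string | null;
  languageHasDialects: boolean;
  dialect: string | null;
  gender: "Male" | "Female" | null;
  useCases: string[];
  hasAudio: boolean;
  transcription: string;
  transcriptionApproved: boolean;
}

// Collects every applicable error so all offending fields can be flagged at once.
function validateStep1(form: Step1Form): string[] {
  const errors: string[] = [];
  if (form.name.trim() === "") errors.push("Name is required");
  if (form.language === null) errors.push("Language is required");
  if (form.languageHasDialects && form.dialect === null) errors.push("Dialect is required");
  if (form.gender === null) errors.push("Gender is required");
  if (form.useCases.length === 0) errors.push("Pick at least one use case");
  if (!form.hasAudio) errors.push("Audio file is required");
  if (form.transcription.trim() === "") errors.push("Transcription is required");
  if (!form.transcriptionApproved) {
    errors.push("Please approve the transcription before proceeding");
  }
  return errors;
}
```

An empty result would let the wizard advance to Step 2; file size, duration, and image checks would run alongside this.
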
Provider and tenant rules
The cloned voice is created against your account. Other users in the same workspace cannot see it. The wizard never shows the underlying provider; the cloned voice surfaces as a Wittify voice everywhere it appears.

Common questions
Can I close the wizard mid-way and resume later?
No. Closing the wizard discards every field. The Voice Library does not include a saved-draft list. The flow is short, so plan to finish it in one go.
My recording was rejected as too short.
Recordings under 3 seconds are silently dropped to protect the engine from useless input. Aim for 5 to 10 seconds of clear speech.
Why is my cloned voice in Training when I just created it?
The engine takes time to model your sample. The voice card on the Voice Library shows Training until processing finishes, then flips to Ready. You can leave the page and come back.
The auto-transcription is wrong.
Edit the textarea directly to match what you actually said. The transcription is what aligns the cloned voice with your sample, so an inaccurate transcription gives an inaccurate clone. Click Confirm transcription only after the text is correct.
My audio file failed with no error.
Browsers sometimes derive the MIME type from the file extension and report the wrong one. Re-encode the file to MP3 or WAV (with Audacity, ffmpeg, or even a phone voice memo app) and try again.
What is the difference between Clone and Design?
Cloning copies the tone, accent, and pacing of a real voice from a sample. Designing creates a brand-new voice from attributes like gender, age, and pitch without a sample. See the Design Wizard.
Can someone else use my cloned voice?
No. Cloned voices are scoped to your account. Other users in the same workspace cannot see them, and the backend rejects cross-account access.
Can I clone a voice from a YouTube clip or someone else's audio?
You should only clone voices for which you have permission. Re-using a celebrity’s voice or a coworker’s voice without consent is not permitted under the product policy.
Where to go next
Voice Library
Find your new voice and manage it.
Studio
Use the cloned voice to generate speech.
Voice Picker
Pick the cloned voice on any voice slot.
Design Wizard
Make a voice without an audio sample.

