Overview
Voice Cloning allows you to create custom AI voices that match specific tones, accents, or brand identities. Upload audio samples or record your voice, and our system generates a unique AI voice you can use in your agents.
Voice Cloning Features:
- Create unlimited custom voices
- Upload audio files or record directly
- Support for multiple languages and dialects
- Preview voices before finalizing
- Full integration with agent voice selection
How Voice Cloning Works
- Provide Audio Sample - Upload or record voice audio
- Configure Voice Details - Name, tags, language, dialect
- Processing - System analyzes and creates AI voice model
- Preview & Test - Generate TTS preview to hear result
- Use in Agents - Select custom voice like any library voice
Voice cloning quality depends on audio sample quality. Clear, noise-free recordings produce the best results.
Creating a Custom Voice
Step 1: Voice Details
Basic Information
Name (Required)
- Descriptive name for your voice
- Example: “Customer Service - Sarah”, “Sales - Professional Male”
- Max 100 characters
- Helps identify voice in library
Description
- Additional context about the voice
- Use case or characteristics
- Not visible to end users
Language
- English - For English-speaking markets
- Arabic - For Arabic-speaking markets
Dialect Selection
For Arabic voices, choose a specific regional dialect:
- Egyptian (EG)
- Jordanian (JO)
- Saudi Arabian (SA)
- UAE (AE)
- Gulf
- Levantine
- North African
Selecting the correct dialect ensures natural pronunciation of regional expressions and accents.
Voice Tags (Required)
Select exactly 2 tags, one from each category:
Gender:
- Male - Masculine voice
- Female - Feminine voice
Tone:
- Conversational - Natural, friendly tone for dialogues
- Narrator - Clear, articulate tone for announcements
Tags:
- Help organize your custom voices
- Enable filtering in voice library
- Communicate voice characteristics to team
- Match voices to use cases
Cover Image (Optional)
Upload a visual representation:
- Formats: JPG, JPEG, PNG
- Max Size: 5 MB
- Recommended: Professional headshot or brand logo
- Usage: Displays in voice cards
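The Step 1 rules above (name length, tag selection, cover image limits) can be checked before submitting the form. The following is a minimal sketch, not the product's own validation: the function name, the constant names, the `GENDER_TAGS` and `TONE_TAGS` category labels, and the assumption that 1 MB means 1024 * 1024 bytes are all chosen for illustration.

```python
import os

MAX_NAME_LENGTH = 100                      # "Max 100 characters"
GENDER_TAGS = {"Male", "Female"}
TONE_TAGS = {"Conversational", "Narrator"}  # category name is an assumption
COVER_FORMATS = {".jpg", ".jpeg", ".png"}
MAX_COVER_BYTES = 5 * 1024 * 1024          # assumes 1 MB = 1024 * 1024 bytes


def validate_voice_details(name, tags, cover_filename=None, cover_size_bytes=0):
    """Return a list of problems; an empty list means the details look valid."""
    problems = []

    if not name.strip():
        problems.append("Name is required.")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append(f"Name exceeds {MAX_NAME_LENGTH} characters.")

    # Exactly two tags: one from each category.
    genders = [t for t in tags if t in GENDER_TAGS]
    tones = [t for t in tags if t in TONE_TAGS]
    if len(tags) != 2 or len(genders) != 1 or len(tones) != 1:
        problems.append("Select exactly 2 tags, one from each category.")

    # The cover image is optional, so it is only checked when provided.
    if cover_filename is not None:
        extension = os.path.splitext(cover_filename)[1].lower()
        if extension not in COVER_FORMATS:
            problems.append("Cover image must be JPG, JPEG, or PNG.")
        if cover_size_bytes > MAX_COVER_BYTES:
            problems.append("Cover image exceeds 5 MB.")

    return problems
```

For example, `validate_voice_details("Customer Service - Sarah", ["Female", "Conversational"])` returns an empty list, while passing two gender tags returns a tag-selection problem.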
Step 2: Input Audio
Choose how to provide your voice sample:
Option A: Upload Audio File
Supported Formats:
- MP3
- WAV
- WebM
- OGG
- AAC
- M4A
- FLAC
Requirements:
- Max File Size: 32 MB
- Quality: Clear, noise-free audio
- Duration: Recommended 30-90 seconds
- Content: Natural speech, varied sentences
Steps:
- Click Upload tab
- Drag and drop audio file or click to browse
- Wait for upload (progress bar shown)
- File validated automatically
Tips:
- Use high-quality recording equipment
- Record in quiet environment
- Avoid background noise and echo
- Include varied speech patterns
- Speak naturally at normal pace
- Include different sentence types (questions, statements)
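A local pre-check can catch format and size problems before the upload starts. This sketch assumes the format is judged by file extension and that 1 MB means 1024 * 1024 bytes; the platform's automatic validation may work differently, and `check_audio_upload` is a hypothetical helper, not part of any product API.

```python
import os

SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".m4a", ".flac"}
MAX_AUDIO_BYTES = 32 * 1024 * 1024   # assumes 1 MB = 1024 * 1024 bytes
RECOMMENDED_SECONDS = (30, 90)       # recommended, not enforced


def check_audio_upload(filename, size_bytes, duration_seconds=None):
    """Return (errors, warnings) for a candidate audio upload."""
    errors, warnings = [], []

    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_AUDIO_EXTENSIONS:
        errors.append(f"Unsupported format: {extension or 'no extension'}")

    if size_bytes > MAX_AUDIO_BYTES:
        errors.append("File exceeds 32 MB")

    # Duration is a recommendation, so it only produces a warning.
    if duration_seconds is not None:
        low, high = RECOMMENDED_SECONDS
        if not low <= duration_seconds <= high:
            warnings.append(f"Duration outside the recommended {low}-{high} seconds")

    return errors, warnings
```

Returning errors and warnings separately keeps the hard limits (format, size) distinct from the duration recommendation.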
Option B: Record Voice
Record directly in the browser.
Requirements:
- Duration: 3-9 seconds
- Format: WAV (automatic)
- Microphone: Required
Steps:
- Click Record tab
- Grant microphone permission
- Click Start Recording
- Speak naturally for 3-9 seconds
- Click Stop Recording
- Review recording
- Re-record if needed
Tips:
- Use good quality microphone
- Minimize background noise
- Speak at natural pace
- Include varied inflection
- Say 2-3 complete sentences
- Don’t rush or speak too slowly
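If you save a recording as a WAV file, its length can be checked against the 3-9 second window with Python's standard `wave` module. This is an illustrative sketch: the function names are hypothetical, and `make_silent_wav` exists only to produce demonstration input.

```python
import io
import wave

MIN_SECONDS, MAX_SECONDS = 3.0, 9.0   # direct-recording window


def wav_duration_seconds(wav_file):
    """Duration of a WAV file (path or binary file object), in seconds."""
    with wave.open(wav_file, "rb") as reader:
        return reader.getnframes() / reader.getframerate()


def recording_length_ok(wav_file):
    """True when the recording falls inside the 3-9 second window."""
    return MIN_SECONDS <= wav_duration_seconds(wav_file) <= MAX_SECONDS


def make_silent_wav(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)          # 2 bytes per sample = 16-bit audio
        writer.setframerate(rate)
        writer.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(recording_length_ok(make_silent_wav(5)))   # inside the window
    print(recording_length_ok(make_silent_wav(2)))   # too short
```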
Step 3: Generate TTS Preview
Before finalizing, preview how your custom voice sounds.
Preview Process:
- Enter sample text (minimum 5 words)
- Click Generate Preview
- System processes voice sample
- Audio preview generates (10-30 seconds)
- Click play to listen
- Regenerate with different text if needed
Text requirements:
- Text must contain at least 5 words
- Word count displayed in real-time
- “✓ Valid” indicator when requirements met
Preview status:
- Generating Preview… - Processing voice sample
- Preview Ready - Audio ready to play
- Preview Failed - Error occurred, try again
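The 5-word minimum can be approximated by counting whitespace-separated words. How the interface actually counts words is not documented here, so treat the rule below as an assumption; the function names are hypothetical.

```python
MIN_PREVIEW_WORDS = 5


def count_words(text):
    """Count whitespace-separated words (an assumed counting rule)."""
    return len(text.split())


def preview_text_status(text):
    """Mimic a live word counter with a validity indicator."""
    words = count_words(text)
    if words >= MIN_PREVIEW_WORDS:
        return f"{words} words ✓ Valid"
    return f"{words} words (at least {MIN_PREVIEW_WORDS} required)"
```

Under this rule, "How can I help you today?" counts as 6 words and passes, while "Hello there" counts as 2 and does not.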
Step 4: Create Voice
Once satisfied with the preview:
- Click Create button
- Voice processes (usually 30-60 seconds)
- Success message appears
- Voice added to “My Voices” library
- Available immediately in agent voice selection
Voice created successfully! You can now use it in your agents.
Audio Quality Requirements
Recording Environment
Ideal:
- Quiet room with minimal echo
- Sound-dampening materials (curtains, furniture)
- Closed windows and doors
- No HVAC or fan noise
Avoid:
- Outdoor recordings
- Rooms with hard surfaces (echo)
- Areas with background conversations
- Near computers or electronics (buzz/hum)
Microphone Selection
Good Options:
- USB condenser microphone
- Headset with noise cancellation
- Dedicated podcasting microphone
- Laptop built-in mic (in quiet environment)
Avoid:
- Phone speakerphone
- Far-field microphones
- Low-quality earbuds
- Heavily compressed audio sources
Audio Sample Content
Include variety:
- Questions (“How can I help you today?”)
- Statements (“Your order has shipped”)
- Different emotions (friendly, professional, reassuring)
- Various sentence lengths
- Natural pauses and inflection
Avoid:
- Monotone speech
- Reading lists or numbers only
- Repetitive phrases
- Shouting or whispering
- Background music or effects
Managing Custom Voices
Viewing Custom Voices
- Navigate to Voices in sidebar
- Click My Voices tab
- All custom voices display here
- Same features as library voices (preview, favorite, etc.)
Using Custom Voices
Custom voices work identically to library voices.
In Single Prompt Agents:
- Open agent settings
- Navigate to Voice Settings
- Click Select Voice
- Go to “My Voices” tab
- Select your custom voice
- Available in global voice settings
- Can be used in node-level voice overrides
- Appears in all voice selectors
Deleting Custom Voices
Before deleting:
- Remove voice from all agents using it
- Export/save audio sample if you want to recreate later
- Consider deactivating instead of deleting
To delete:
- Find voice in “My Voices” tab
- Click voice actions menu (⋮)
- Select Delete Voice
- Confirm deletion
After deletion:
- Voice removed from library
- Agents using deleted voice will show error
- Must select new voice for affected agents
- Previous calls with that voice remain in history
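Because agents that still reference a deleted voice show an error, it helps to list the affected agents first. The sketch below works on a hypothetical in-memory list of agent records. The `voice_id` and `node_voice_overrides` field names are assumptions made for illustration and do not describe an actual export format or API.

```python
def agents_using_voice(agents, voice_id):
    """Return names of agents that reference voice_id globally or in a node override."""
    names = []
    for agent in agents:
        # Node-level overrides are optional, so default to an empty mapping.
        node_voices = agent.get("node_voice_overrides", {}).values()
        if agent.get("voice_id") == voice_id or voice_id in node_voices:
            names.append(agent["name"])
    return names


if __name__ == "__main__":
    sample_agents = [
        {"name": "Support Bot", "voice_id": "v1"},
        {"name": "Sales Flow", "voice_id": "v2",
         "node_voice_overrides": {"greeting": "v1"}},
        {"name": "Billing Bot", "voice_id": "v3"},
    ]
    print(agents_using_voice(sample_agents, "v1"))
```

Any agent the check returns needs a new voice selected before the deletion.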
Voice Cloning Best Practices
Sample Selection
For customer service voices:
- Friendly, helpful tone
- Clear enunciation
- Moderate pace
- Warm inflection
- Confident, enthusiastic
- Engaging energy
- Natural variation
- Professional but personable
- Clear, methodical pace
- Patient tone
- Reassuring demeanor
- Precise pronunciation
Multi-Voice Strategy
Create voice variations for different scenarios.
Example: Customer Service Department
Language and Dialect Matching
For Arabic markets:
- Egyptian: Broad Middle East appeal
- Gulf (Saudi, UAE): GCC business markets
- Levantine: Jordan, Syria, Lebanon regions
- Use the dialect that matches your target customer base
For English markets:
- Clear, neutral accent for international audiences
- Regional accents for local businesses
- Professional pronunciation for all markets
Testing Custom Voices
Before deploying:
- Preview Testing - Generate multiple TTS previews with different scripts
- Agent Testing - Use in test agent with actual conversation flow
- Team Review - Have colleagues listen and provide feedback
- A/B Testing - Compare with library voices
- Live Testing - Deploy to small percentage of calls first
Evaluation checklist:
- Pronunciation is clear and natural
- Pace is appropriate for use case
- Tone matches brand personality
- No robotic or artificial sound
- Handles varied sentence types well
- Emotional range is appropriate
- Consistent quality across different texts
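For the live-testing step, one common way to send a small share of calls to the new voice is deterministic hash bucketing. This is a general technique sketched under assumptions, not a built-in platform feature: the `call_id` values and the `use_custom_voice` function are hypothetical.

```python
import hashlib


def use_custom_voice(call_id, rollout_percent):
    """Deterministically route roughly rollout_percent% of calls to the custom voice."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    # Map the hash to a stable bucket in 0..99; the same call_id always
    # lands in the same bucket, so repeated checks agree.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent


if __name__ == "__main__":
    calls = [f"call-{n}" for n in range(1000)]
    selected = sum(use_custom_voice(call, 10) for call in calls)
    print(f"{selected} of {len(calls)} calls would use the custom voice")
```

Hashing instead of random sampling keeps the assignment reproducible, which makes it easier to compare feedback between the custom voice and a library voice.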
Common Issues
“Recording too short” error
Problem: Recording is less than 3 seconds
Solution:
- Record longer sample (5-7 seconds recommended)
- Speak 2-3 complete sentences
- Don’t rush through the recording
“Audio file too large” error
Problem: File exceeds 32 MB
Solution:
- Compress audio file
- Use MP3 format with lower bitrate
- Trim unnecessary silence
- Use online audio compression tool
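To judge how much compression or trimming is needed, estimate file size from bitrate and duration: size in bytes is roughly bitrate in bits per second, divided by 8, times duration in seconds. The sketch below assumes constant-bitrate audio, ignores container overhead, and treats 32 MB as 32 * 1024 * 1024 bytes; the function names are hypothetical.

```python
MAX_UPLOAD_BYTES = 32 * 1024 * 1024   # assumes 1 MB = 1024 * 1024 bytes


def estimated_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate size of constant-bitrate audio, ignoring container overhead."""
    return bitrate_kbps * 1000 / 8 * duration_seconds


def fits_upload_limit(bitrate_kbps, duration_seconds):
    """True when the estimated size is within the upload limit."""
    return estimated_size_bytes(bitrate_kbps, duration_seconds) <= MAX_UPLOAD_BYTES


if __name__ == "__main__":
    # 128 kbps MP3, 90 seconds: about 1.44 million bytes, well under the limit.
    print(estimated_size_bytes(128, 90), fits_upload_limit(128, 90))
    # Uncompressed CD-quality stereo WAV (about 1411.2 kbps), 5 minutes: too large.
    print(estimated_size_bytes(1411.2, 300), fits_upload_limit(1411.2, 300))
```

By this estimate, a sample trimmed to the recommended 30-90 seconds fits within the limit even as uncompressed CD-quality WAV, so trimming is often enough on its own.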
“Preview generation failed”
Problem: TTS preview won’t generate
Possible causes:
- Audio quality too low
- Audio sample too short/long
- Server processing issue
Solutions:
- Try uploading different audio sample
- Ensure clean, clear recording
- Check file format is supported
- Try again (temporary issue)
Voice sounds robotic or unnatural
Problem: Generated voice doesn’t sound natural
Causes:
- Low-quality audio sample
- Background noise in recording
- Insufficient audio variation
- Overly monotone source
Solutions:
- Re-record in quieter environment
- Use better microphone
- Include more natural speech variation
- Speak with natural inflection
Can’t find custom voice in agent
Problem: Created voice doesn’t appear in voice selector
Solutions:
- Check “My Voices” tab specifically
- Refresh browser page
- Verify voice creation completed successfully
- Check project selection is correct
Voice Cloning Limits
Per Account:
- Unlimited custom voices
- 32 MB max file size per upload
- 3-9 seconds for direct recording
Processing time:
- Voice creation: 30-60 seconds
- TTS preview generation: 10-30 seconds
Storage:
- Custom voices stored permanently
- Cover images: 5 MB max each
Advanced Features
Instant Voice (Beta)
Premium feature for voice isolation and enhancement.
Features:
- Removes background noise from samples
- Enhances voice clarity
- Improves consistency
- Better quality with less-than-perfect recordings
Instant Voice is a premium beta feature. Contact sales for access.
Voice Versioning
Create multiple versions of the same voice.
Use case: Update a voice without losing the original
- Create new voice with same base audio
- Use different tags or names to distinguish
- Test new version before switching agents
- Keep old version as backup