Overview
Voice Cloning allows you to create custom AI voices that match specific tones, accents, or brand identities. Upload audio samples or record your voice, and our system generates a unique AI voice you can use in your agents.
Voice Cloning Features:
- Create unlimited custom voices
- Upload audio files or record directly
- Support for multiple languages and dialects
- Preview voices before finalizing
- Full integration with agent voice selection
How Voice Cloning Works
- Provide Audio Sample - Upload or record voice audio
- Configure Voice Details - Name, tags, language, dialect
- Processing - System analyzes and creates AI voice model
- Preview & Test - Generate TTS preview to hear result
- Use in Agents - Select custom voice like any library voice
Voice cloning quality depends on audio sample quality. Clear, noise-free recordings produce the best results.
Creating a Custom Voice
Step 1: Voice Details
Basic Information
Name (Required)
- Descriptive name for your voice
- Example: “Customer Service - Sarah”, “Sales - Professional Male”
- Max 100 characters
- Helps identify voice in library
Description
- Additional context about the voice
- Use case or characteristics
- Not visible to end users
Language
- English - For English-speaking markets
- Arabic - For Arabic-speaking markets
Dialect Selection
For Arabic Voices: Choose specific regional dialect:
- Egyptian (EG)
- Jordanian (JO)
- Saudi Arabian (SA)
- UAE (AE)
- Gulf
- Levantine
- North African
Selecting the correct dialect ensures natural pronunciation of regional expressions and accents.
Voice Tags (Required)
Select exactly 2 tags, one from each category.
Gender:
- Male - Masculine voice
- Female - Feminine voice
Speaking style:
- Conversational - Natural, friendly tone for dialogues
- Narrator - Clear, articulate tone for announcements
Tags:
- Help organize your custom voices
- Enable filtering in voice library
- Communicate voice characteristics to team
- Match voices to use cases
Cover Image (Optional)
Upload a visual representation:
- Formats: JPG, JPEG, PNG
- Max Size: 5 MB
- Recommended: Professional headshot or brand logo
- Usage: Displays in voice cards
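The name and tag rules in Step 1 can be checked before submitting the form. The following is a minimal sketch, not the product's own validation: the function name `validate_voice_details`, the error messages, and the constant name `STYLE_TAGS` (a label chosen here for the Conversational/Narrator category) are all assumptions made for illustration.

```python
MAX_NAME_LENGTH = 100                        # "Max 100 characters"
GENDER_TAGS = {"Male", "Female"}
STYLE_TAGS = {"Conversational", "Narrator"}  # category name is our own label


def validate_voice_details(name: str, tags: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the details look valid."""
    errors = []

    # Name is required and limited to 100 characters.
    if not name.strip():
        errors.append("Name is required")
    elif len(name) > MAX_NAME_LENGTH:
        errors.append(f"Name must be at most {MAX_NAME_LENGTH} characters")

    # Exactly 2 tags: one from the gender category, one from the other category.
    genders = [t for t in tags if t in GENDER_TAGS]
    styles = [t for t in tags if t in STYLE_TAGS]
    if len(tags) != 2 or len(genders) != 1 or len(styles) != 1:
        errors.append("Select exactly 2 tags, one from each category")

    return errors
```

For example, `validate_voice_details("Customer Service - Sarah", ["Female", "Conversational"])` returns an empty list, while passing two gender tags reports a tag error.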
Step 2: Input Audio
Choose how to provide your voice sample:
Option A: Upload Audio File
Supported Formats:
- MP3
- WAV
- WebM
- OGG
- AAC
- M4A
- FLAC
Requirements:
- Max File Size: 32 MB
- Quality: Clear, noise-free audio
- Duration: Recommended 30-90 seconds
- Content: Natural speech, varied sentences
Steps:
- Click Upload tab
- Drag and drop audio file or click to browse
- Wait for upload (progress bar shown)
- File validated automatically
Tips:
- Use high-quality recording equipment
- Record in quiet environment
- Avoid background noise and echo
- Include varied speech patterns
- Speak naturally at normal pace
- Include different sentence types (questions, statements)
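If you prepare samples in bulk, a quick local check against the upload limits can save a failed upload. The sketch below is illustrative only: `check_upload` is a hypothetical helper, it judges format by file extension alone, and it assumes 1 MB means 1,048,576 bytes, which this page does not specify.

```python
from pathlib import Path

# Extensions for the formats listed under "Supported Formats".
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".m4a", ".flac"}
MAX_UPLOAD_BYTES = 32 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("File exceeds 32 MB")
    return problems
```

For a file on disk, the size can be read with `Path(filename).stat().st_size` and passed as `size_bytes`.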
Option B: Record Voice
Record directly in the browser.
Requirements:
- Duration: 3-9 seconds
- Format: WAV (automatic)
- Microphone: Required
Steps:
- Click Record tab
- Grant microphone permission
- Click Start Recording
- Speak naturally for 3-9 seconds
- Click Stop Recording
- Review recording
- Re-record if needed
Tips:
- Use good quality microphone
- Minimize background noise
- Speak at natural pace
- Include varied inflection
- Say 2-3 complete sentences
- Don’t rush or speak too slowly
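To see whether a WAV clip recorded outside the browser falls in the same 3-9 second window, you can measure it with Python's standard `wave` module. This is a rough sketch under stated assumptions: the helper names are hypothetical, both bounds are treated as inclusive, and only uncompressed PCM WAV files are handled.

```python
import wave

MIN_SECONDS = 3.0  # shorter clips trigger the "Recording too short" error
MAX_SECONDS = 9.0  # assumption: the upper bound is inclusive


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV file; `source` is a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def recording_length_ok(seconds: float) -> bool:
    """True when the clip length is within the 3-9 second recording window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

A typical call is `recording_length_ok(wav_duration_seconds("sample.wav"))`, where `sample.wav` is a placeholder path.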
Step 3: Generate TTS Preview
Before finalizing, preview how your custom voice sounds.
Preview Process:
- Enter sample text (minimum 5 words)
- Click Generate Preview
- System processes voice sample
- Audio preview generates (10-30 seconds)
- Click play to listen
- Regenerate with different text if needed
Text Requirements:
- Text must contain at least 5 words
- Word count displayed in real-time
- “✓ Valid” indicator when requirements met
Preview Status:
- Generating Preview… - Processing voice sample
- Preview Ready - Audio ready to play
- Preview Failed - Error occurred, try again
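The 5-word minimum is easy to check when preview scripts are prepared ahead of time. The sketch below assumes words are separated by whitespace; how the product itself counts words is not documented here, so treat the result as an approximation. The function names are hypothetical.

```python
MIN_PREVIEW_WORDS = 5  # "Text must contain at least 5 words"


def count_words(text: str) -> int:
    """Count whitespace-separated words (an assumed definition of a word)."""
    return len(text.split())


def preview_text_is_valid(text: str) -> bool:
    """True when the text meets the minimum word count for a TTS preview."""
    return count_words(text) >= MIN_PREVIEW_WORDS
```

Under this definition, "How can I help you today?" has 6 words and passes, while "Your order has shipped" has 4 and does not.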
Step 4: Create Voice
Once satisfied with the preview:
- Click Create button
- Voice processes (usually 30-60 seconds)
- Success message appears
- Voice added to “My Voices” library
- Available immediately in agent voice selection
Voice created successfully! You can now use it in your agents.
Audio Quality Requirements
Recording Environment
Ideal:
- Quiet room with minimal echo
- Sound-dampening materials (curtains, furniture)
- Closed windows and doors
- No HVAC or fan noise
Avoid:
- Outdoor recordings
- Rooms with hard surfaces (echo)
- Areas with background conversations
- Near computers or electronics (buzz/hum)
Microphone Selection
Good Options:
- USB condenser microphone
- Headset with noise cancellation
- Dedicated podcasting microphone
- Laptop built-in mic (in quiet environment)
Avoid:
- Phone speakerphone
- Far-field microphones
- Low-quality earbuds
- Heavily compressed audio sources
Audio Sample Content
Include variety:
- Questions (“How can I help you today?”)
- Statements (“Your order has shipped”)
- Different emotions (friendly, professional, reassuring)
- Various sentence lengths
- Natural pauses and inflection
Avoid:
- Monotone speech
- Reading lists or numbers only
- Repetitive phrases
- Shouting or whispering
- Background music or effects
Managing Custom Voices
Viewing Custom Voices
- Navigate to Voices in sidebar
- Click My Voices tab
- All custom voices display here
- Same features as library voices (preview, favorite, etc.)
Using Custom Voices
Custom voices work identically to library voices.
In Single Prompt Agents:
- Open agent settings
- Navigate to Voice Settings
- Click Select Voice
- Go to “My Voices” tab
- Select your custom voice
In Flow Agents:
- Available in global voice settings
- Can be used in node-level voice overrides
- Appears in all voice selectors
Deleting Custom Voices
Before deleting:
- Remove voice from all agents using it
- Export/save audio sample if you want to recreate later
- Consider deactivating instead of deleting
To delete:
- Find voice in “My Voices” tab
- Click voice actions menu (⋮)
- Select Delete Voice
- Confirm deletion
After deletion:
- Voice removed from library
- Agents using deleted voice will show error
- Must select new voice for affected agents
- Previous calls with that voice remain in history
Voice Cloning Best Practices
Sample Selection
For customer service voices:
- Friendly, helpful tone
- Clear enunciation
- Moderate pace
- Warm inflection
- Confident, enthusiastic
- Engaging energy
- Natural variation
- Professional but personable
- Clear, methodical pace
- Patient tone
- Reassuring demeanor
- Precise pronunciation
Multi-Voice Strategy
Create voice variations for different scenarios:
Example: Customer Service Department
Language and Dialect Matching
For Arabic markets:
- Egyptian: Broad Middle East appeal
- Gulf (Saudi, UAE): GCC business markets
- Levantine: Jordan, Syria, Lebanon regions
- Use the dialect that matches your target customer base
For English markets:
- Clear, neutral accent for international
- Regional accents for local businesses
- Professional pronunciation for all markets
Testing Custom Voices
Before deploying:
- Preview Testing - Generate multiple TTS previews with different scripts
- Agent Testing - Use in test agent with actual conversation flow
- Team Review - Have colleagues listen and provide feedback
- A/B Testing - Compare with library voices
- Live Testing - Deploy to small percentage of calls first
Quality checklist:
- Pronunciation is clear and natural
- Pace is appropriate for use case
- Tone matches brand personality
- No robotic or artificial sound
- Handles varied sentence types well
- Emotional range is appropriate
- Consistent quality across different texts
Common Issues
“Recording too short” error
Problem: Recording is less than 3 seconds
Solution:
- Record longer sample (5-7 seconds recommended)
- Speak 2-3 complete sentences
- Don’t rush through the recording
“Audio file too large” error
Problem: File exceeds 32 MB
Solution:
- Compress audio file
- Use MP3 format with lower bitrate
- Trim unnecessary silence
- Use online audio compression tool
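To judge how much a lower bitrate helps, you can estimate the compressed size from duration and bitrate: size in bytes ≈ duration in seconds × bitrate in bits per second ÷ 8. The sketch below is a back-of-the-envelope calculation with hypothetical helper names; it assumes constant-bitrate encoding, ignores container overhead, and takes 1 MB as 1,048,576 bytes.

```python
MAX_UPLOAD_BYTES = 32 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def estimated_size_bytes(duration_seconds: float, bitrate_kbps: int) -> int:
    """Approximate size of a constant-bitrate file (1 kbps = 1000 bits/s)."""
    return int(duration_seconds * bitrate_kbps * 1000 / 8)


def fits_upload_limit(duration_seconds: float, bitrate_kbps: int) -> bool:
    """True when the estimated size is within the 32 MB upload limit."""
    return estimated_size_bytes(duration_seconds, bitrate_kbps) <= MAX_UPLOAD_BYTES
```

By this estimate, a 90-second sample at 128 kbps is about 1.44 MB, well under the limit. A file that still exceeds 32 MB is usually far longer than the recommended 30-90 seconds, so trimming it is often the more effective fix.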
“Preview generation failed”
Problem: TTS preview won’t generate
Possible causes:
- Audio quality too low
- Audio sample too short/long
- Server processing issue
Solutions:
- Try uploading different audio sample
- Ensure clean, clear recording
- Check file format is supported
- Try again (temporary issue)
Voice sounds robotic or unnatural
Problem: Generated voice doesn’t sound natural
Causes:
- Low-quality audio sample
- Background noise in recording
- Insufficient audio variation
- Overly monotone source
Solutions:
- Re-record in quieter environment
- Use better microphone
- Include more natural speech variation
- Speak with natural inflection
Can’t find custom voice in agent
Problem: Created voice doesn’t appear in voice selector
Solutions:
- Check “My Voices” tab specifically
- Refresh browser page
- Verify voice creation completed successfully
- Check project selection is correct
Voice Cloning Limits
Per Account:
- Unlimited custom voices
- 32 MB max file size per upload
- 3-9 seconds for direct recording
- Voice creation: 30-60 seconds
- TTS preview generation: 10-30 seconds
- Custom voices stored permanently
- Cover images: 5 MB max each
Advanced Features
Instant Voice (Beta)
Premium feature for voice isolation and enhancement.
Features:
- Removes background noise from samples
- Enhances voice clarity
- Improves consistency
- Better quality with less-than-perfect recordings
Instant Voice is a premium beta feature. Contact sales for access.
Voice Versioning
Create multiple versions of the same voice.
Use case: Update voice without losing original
- Create new voice with same base audio
- Use different tags or names to distinguish
- Test new version before switching agents
- Keep old version as backup
Related Documentation
Voice Library
Browse and select from pre-built AI voices
Single Prompt Voice Settings
Configure voice in Single Prompt Agents
Flow Agent Voice Settings
Set up voices in Flow Agents
Testing Agents
Test your custom voice in real calls