Overview
Voice Cloning lets you create custom AI voices that match your brand identity, tone requirements, or specific vocal characteristics. Upload audio samples or record directly in the browser to generate an AI voice model that can be used across all your agents.
Voice Cloning Features:
- Custom Voice Creation - Build unique voices from audio samples
- Upload or Record - Flexible input options (upload files or record in-browser)
- Multi-Language Support - Create voices in English or Arabic
- Dialect Options - Specify regional accents for Arabic voices
- Instant Preview - Test your voice before finalizing
- Unlimited Voices - Create as many custom voices as you need
Why Use Voice Cloning
Custom voice cloning enables several use cases.
Brand Consistency:
- Create a signature voice that represents your company
- Ensure consistent voice across all customer interactions
- Stand out from competitors using generic AI voices
- Clone authorized voices (CEO, founder, brand ambassador)
- Maintain authentic regional accents
- Preserve specific vocal characteristics
- Match specific tone and style requirements
- Create industry-specific voices (medical, legal, technical)
- Control exact pronunciation and cadence
- Create multiple voices for different departments or scenarios
- Test different voice personalities
- Update and refine voices as your brand evolves
Voice cloning creates an AI model from your audio samples. The quality of your input audio directly impacts the quality of the generated voice.
How Voice Cloning Works
The voice cloning process follows five steps:
- Provide Audio Sample - Upload an audio file or record directly
- Configure Details - Set name, language, dialect, tags, and optional cover image
- Generate Preview - Test how your voice sounds with sample text
- Create Voice - Finalize and add to your voice library
- Use in Agents - Select your custom voice like any library voice
Processing Time
- Voice Creation: 30-60 seconds
- Preview Generation: 10-30 seconds
- Availability: Immediate after creation
Creating a Custom Voice
Step 1: Voice Details
Name (Required)
- Give your voice a descriptive name
- Examples: “Customer Service - Sarah”, “Sales - Professional Male”
- Max 100 characters
- Helps identify voice in library
Description
- Add context about the voice
- Note use cases or characteristics
- Internal reference only (not visible to customers)
Language
- English - For English-speaking markets
- Arabic - For Arabic-speaking markets
Dialect (Arabic voices)
- Egyptian (EG)
- Jordanian (JO)
- Saudi Arabian (SA)
- UAE (AE)
- Gulf
- Levantine
- North African
Tags
- Male - Masculine voice
- Female - Feminine voice
- Conversational - Natural, friendly tone
- Narrator - Clear, articulate tone
Cover Image (Optional)
- Upload a visual representation (JPG, PNG)
- Max size: 5 MB
- Displays on voice card in library
- Professional headshot or brand logo recommended
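The Step 1 limits can be checked before the form is submitted. The sketch below is only an illustration: the `VoiceDetails` class and `validate_voice_details` function are hypothetical names, not part of the product. It assumes that 1 MB means 1024 * 1024 bytes, that `.jpeg` is accepted alongside `.jpg`, and that dialects apply to Arabic voices only.

```python
import os
from dataclasses import dataclass
from typing import List, Optional

MAX_NAME_LENGTH = 100                     # "Max 100 characters"
MAX_COVER_IMAGE_BYTES = 5 * 1024 * 1024   # "Max size: 5 MB" (assumed binary megabytes)
LANGUAGES = {"English", "Arabic"}
COVER_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # ".jpeg" is an assumption


@dataclass
class VoiceDetails:
    """Hypothetical container for the Step 1 form fields."""
    name: str
    language: str
    dialect: Optional[str] = None
    cover_image_name: Optional[str] = None
    cover_image_bytes: int = 0


def validate_voice_details(details: VoiceDetails) -> List[str]:
    """Return a list of problems; an empty list means the details look valid."""
    problems = []
    name = details.name.strip()
    if not name:
        problems.append("name is required")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append("name exceeds 100 characters")
    if details.language not in LANGUAGES:
        problems.append("language must be English or Arabic")
    if details.dialect and details.language != "Arabic":
        problems.append("dialects apply to Arabic voices only")
    if details.cover_image_name:
        extension = os.path.splitext(details.cover_image_name)[1].lower()
        if extension not in COVER_IMAGE_EXTENSIONS:
            problems.append("cover image must be JPG or PNG")
        if details.cover_image_bytes > MAX_COVER_IMAGE_BYTES:
            problems.append("cover image exceeds 5 MB")
    return problems
```

For example, `validate_voice_details(VoiceDetails(name="Customer Service - Sarah", language="Arabic", dialect="Egyptian (EG)"))` returns an empty list.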
Step 2: Input Audio
You have two options for providing audio:
Option A: Upload Audio File
Supported Formats:
- MP3, WAV, WebM, OGG, AAC, M4A, FLAC
- Max file size: 32 MB
- Recommended duration: 30-90 seconds
- Clear, noise-free audio
- Natural speech with varied sentences
Steps:
- Click “Upload” tab
- Drag and drop file or click to browse
- Wait for upload completion
- File validated automatically
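If you prepare many samples, a local pre-check can catch obvious problems before uploading. This is a minimal sketch, not the platform's own validation: `check_upload` is a hypothetical helper, the format is judged by file extension only, and 32 MB is assumed to mean 32 * 1024 * 1024 bytes.

```python
import os

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".m4a", ".flac"}
MAX_UPLOAD_BYTES = 32 * 1024 * 1024  # "Max file size: 32 MB" (assumed binary megabytes)


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems found from the file name and size alone."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 32 MB")
    return problems
```

For a file on disk, call it as `check_upload(path, os.path.getsize(path))`. An extension check cannot detect a corrupted or mislabeled file, so the automatic validation after upload still applies.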
Option B: Record Voice
Record directly in your browser:
Requirements:
- Duration: 3-9 seconds
- Browser microphone access required
- Format: WAV (automatic)
Steps:
- Click “Record” tab
- Grant microphone permission
- Click “Start Recording”
- Speak naturally (2-3 sentences)
- Click “Stop Recording”
- Review recording
- Re-record if needed
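If you save a recording as a WAV file for reference, its length can be checked against the 3-9 second window with Python's standard `wave` module. This is a sketch under assumptions: the helper names are hypothetical, both limits are treated as inclusive, and `wave` reads uncompressed PCM WAV only.

```python
import wave

MIN_SECONDS = 3.0  # lower bound for in-browser recordings
MAX_SECONDS = 9.0  # upper bound for in-browser recordings


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def recording_length_ok(source) -> bool:
    """True when the recording falls inside the 3-9 second window (inclusive)."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS
```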
Step 3: Generate Preview
Test your voice before finalizing:
- Enter sample text (minimum 5 words)
- Click “Generate Preview”
- Wait for processing (10-30 seconds)
- Listen to audio preview
- Regenerate with different text if needed
Step 4: Create Voice
Once satisfied with the preview:
- Click “Create” button
- Wait for processing (30-60 seconds)
- Voice added to “My Voices” library
- Available immediately in all agents
Custom voice created successfully! Find it in the “My Voices” tab.
Audio Quality Guidelines
Recording Environment
Ideal Environment:
- Quiet room with minimal echo
- Closed windows and doors
- No background noise (HVAC, fans, traffic)
- Sound-dampening materials (curtains, furniture)
Avoid:
- Outdoor locations
- Rooms with hard surfaces (echo)
- Areas with background conversations
- Near computers or electronics
Microphone Selection
Good Options:
- USB condenser microphone
- Noise-canceling headset
- Dedicated podcasting microphone
- Quality laptop built-in mic (in quiet space)
Avoid:
- Phone speakerphone
- Low-quality earbuds
- Far-field microphones
- Heavily compressed audio sources
Audio Content
Include Variety:
- Questions and statements
- Different emotions (friendly, professional, reassuring)
- Various sentence lengths
- Natural pauses and inflection
- Varied pronunciation patterns
Avoid:
- Monotone speech
- Reading lists or numbers only
- Repetitive phrases
- Shouting or whispering
- Background music or sound effects
The quality and variety of your audio sample directly determines the naturalness and versatility of your cloned voice.
Managing Custom Voices
Viewing Custom Voices
- Navigate to “Voices” in sidebar
- Click “My Voices” tab
- All custom voices display here
- Same features as library voices (preview, favorite, filter)
Using Custom Voices
Custom voices work identically to library voices.
In Single Prompt Agents:
- Open agent Voice Settings
- Click “Select Voice”
- Navigate to “My Voices” tab
- Select your custom voice
- Available in global voice settings
- Can be used in node-level overrides
- Appears in all voice selection menus
Editing Voice Details
Update voice information:
- Change voice name
- Update description
- Modify tags
- Replace cover image
Editing voice details does not require re-processing. Changes are instant.
Deleting Custom Voices
Before Deleting:
- Remove voice from all agents using it
- Save/export audio sample if you want to recreate later
- Confirm no other team members are using it
To Delete:
- Find voice in “My Voices” tab
- Click voice actions menu (⋮)
- Select “Delete Voice”
- Confirm deletion
After Deletion:
- Voice removed permanently
- Agents using deleted voice will show error
- Must select new voice for affected agents
- Previous call recordings remain accessible
Voice Cloning Best Practices
Sample Selection Strategy
For Customer Service:
- Friendly, helpful tone
- Clear enunciation
- Moderate, comfortable pace
- Warm, welcoming inflection
- Confident, enthusiastic energy
- Engaging and personable
- Natural variation in pace
- Professional but approachable
- Clear, methodical pace
- Patient, reassuring tone
- Precise pronunciation
- Calm demeanor
- Authoritative, clear delivery
- Professional tone
- Consistent pacing
- Formal style
Multi-Voice Strategy
Create specialized voices for different scenarios, for example within a customer service department.
Testing Custom Voices
Comprehensive Testing Process:
- Preview Testing - Generate multiple TTS previews with varied scripts
- Agent Integration - Create test agent with your voice
- Script Testing - Test with actual conversation flows
- Team Review - Get feedback from colleagues
- A/B Testing - Compare with library voices
- Live Pilot - Deploy to small percentage of calls first
- Customer Feedback - Monitor customer reactions
Quality Checklist:
- Pronunciation is clear and natural
- Pace is appropriate for use case
- Tone matches brand personality
- No robotic or artificial qualities
- Handles varied sentence types well
- Emotional range is appropriate
- Consistent quality across different texts
- Regional pronunciation is accurate (if applicable)
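For the Live Pilot step, one common way to send a small, stable share of calls to the new voice is to hash a call identifier into a bucket. The sketch below illustrates that general technique only. The function name and the idea of a string `call_id` are assumptions, and nothing here reflects how the platform itself routes calls.

```python
import hashlib


def use_pilot_voice(call_id: str, pilot_percent: float) -> bool:
    """Deterministically decide whether a call belongs to the pilot group.

    The same call_id always gets the same answer, so a retried or resumed
    call does not switch voices midway through the pilot.
    """
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < pilot_percent * 100  # 5 percent -> buckets 0..499
```

For example, `use_pilot_voice(call_id, 5)` selects roughly 5% of calls. Raising the percentage later keeps every previously selected call in the pilot group.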
Common Issues and Solutions
“Recording too short” Error
Problem: Recording is shorter than 3 seconds
Solutions:
- Record longer sample (5-7 seconds recommended)
- Speak 2-3 complete sentences
- Don’t rush through the recording
- Include natural pauses
“Audio file too large” Error
Problem: File exceeds 32 MB limit
Solutions:
- Compress audio file using audio editor
- Convert to MP3 format with reasonable bitrate
- Trim unnecessary silence at beginning/end
- Use online audio compression tools
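To see whether converting to MP3 will bring a file under the limit, the size of a constant-bitrate file can be estimated as duration times bitrate divided by 8. The sketch below does that arithmetic and builds a conversion command for the ffmpeg command-line tool. It assumes ffmpeg is installed with the `libmp3lame` encoder. The helper names and the 128 kbps default are illustrative choices, not platform recommendations.

```python
MAX_UPLOAD_BYTES = 32 * 1024 * 1024  # 32 MB upload limit (assumed binary megabytes)


def estimated_mp3_bytes(duration_seconds: float, bitrate_kbps: int) -> int:
    """Approximate size of a constant-bitrate MP3, ignoring metadata overhead."""
    return int(duration_seconds * bitrate_kbps * 1000 / 8)


def ffmpeg_mp3_command(source: str, target: str, bitrate_kbps: int = 128) -> list:
    """Build (but do not run) an ffmpeg command that re-encodes audio as MP3."""
    return [
        "ffmpeg",
        "-i", source,               # input file
        "-codec:a", "libmp3lame",   # MP3 encoder
        "-b:a", f"{bitrate_kbps}k", # target audio bitrate
        target,
    ]
```

A 90-second sample at 128 kbps comes to about 1.44 MB (90 * 128,000 / 8 bytes), well under the limit. To run the conversion, pass the list to `subprocess.run(command, check=True)`.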
“Preview generation failed” Error
Problem: TTS preview won’t generate
Possible Causes:
- Audio quality too low
- Audio sample too short or too long
- Excessive background noise
- Temporary server processing issue
Solutions:
- Upload different, higher-quality audio sample
- Ensure recording environment is quiet
- Check file format is supported
- Verify file isn’t corrupted
- Try again (may be temporary issue)
Voice Sounds Robotic or Unnatural
Problem: Generated voice lacks natural quality
Common Causes:
- Poor audio sample quality
- Background noise in recording
- Insufficient vocal variation
- Overly monotone source audio
- Very short sample duration
Solutions:
- Re-record in quieter environment
- Use better quality microphone
- Include more natural speech variation
- Speak with authentic inflection and emotion
- Provide longer audio sample (if using upload)
Can’t Find Custom Voice
Problem: Created voice doesn’t appear in agent settings
Solutions:
- Check specifically in “My Voices” tab
- Refresh browser page
- Verify voice creation completed successfully
- Confirm you’re in correct project
- Check if voice was accidentally deleted
Use Cases and Examples
Brand Voice Consistency
Scenario: National retail chain
Goal: Consistent voice across all locations
Solution:
- Clone authorized brand representative’s voice
- Create custom voice with approved characteristics
- Use across all agent instances
- Keep the same brand voice at every location
Regional Market Targeting
Scenario: Middle East e-commerce
Goal: Connect authentically with GCC customers
Solution:
- Clone native Gulf Arabic speaker
- Select UAE or Saudi dialect
- Ensure regional pronunciation patterns
- Build trust through authentic accent
Multi-Department Strategy
Scenario: Large enterprise
Goal: Different voices for different departments
Solution:
- Sales: Energetic, engaging voice
- Support: Calm, helpful voice
- Billing: Professional, clear voice
- Executive: Authoritative, trustworthy voice
Legacy Voice Preservation
Scenario: Replacing voice actor
Goal: Maintain consistency after personnel change
Solution:
- Clone original voice actor (with permission)
- Create AI voice model
- Transition seamlessly to AI
- Preserve customer familiarity
Technical Specifications
File Specifications
Upload:
- Max size: 32 MB
- Formats: MP3, WAV, WebM, OGG, AAC, M4A, FLAC
- Recommended duration: 30-90 seconds
- Sample rate: 16kHz or higher recommended
Recording:
- Duration: 3-9 seconds
- Format: WAV (automatic)
- Sample rate: Browser default
- Bitrate: Automatic
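For WAV uploads, the recommended sample rate can be confirmed locally with Python's standard `wave` module. This is a minimal sketch with hypothetical helper names. It only handles uncompressed PCM WAV files, so MP3, AAC, and the other supported formats would need a different tool.

```python
import wave

RECOMMENDED_MIN_SAMPLE_RATE = 16_000  # "16kHz or higher recommended"


def wav_sample_rate(source) -> int:
    """Return the sample rate in Hz of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def meets_recommended_sample_rate(source) -> bool:
    """True when the WAV file is sampled at 16 kHz or higher."""
    return wav_sample_rate(source) >= RECOMMENDED_MIN_SAMPLE_RATE
```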
Processing Specifications
- Voice creation time: 30-60 seconds
- Preview generation: 10-30 seconds
- Storage: Permanent (until manually deleted)
- Usage: Unlimited across all agents
Limitations
Per Account:
- Unlimited custom voices
- 32 MB max file size per upload
- 3-9 seconds for direct recording
- 5 MB max cover image size
Advanced Features
Instant Voice Enhancement (Beta)
Premium feature for improved voice quality:
Features:
- Automatic background noise removal
- Voice clarity enhancement
- Consistency optimization
- Better results with imperfect recordings
Instant Voice Enhancement is a premium beta feature. Contact sales for access.
Voice Versioning
Maintain multiple versions of the same voice:
Use Case: Test improvements without losing original
Process:
- Create new voice with updated audio sample
- Use naming convention (e.g., “Sales Voice v2”)
- Test new version in parallel
- Switch agents when ready
- Keep old version as backup
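The naming convention above is easy to apply consistently with a small helper. This sketch assumes the "Name vN" pattern shown in the example ("Sales Voice v2") and treats an unsuffixed name as version 1. The function name is hypothetical, and you would supply the list of existing voice names yourself.

```python
import re


def next_version_name(base_name: str, existing_names) -> str:
    """Suggest the next versioned name, e.g. "Sales Voice" -> "Sales Voice v2"."""
    pattern = re.compile(rf"^{re.escape(base_name)}(?: v(\d+))?$")
    highest = 0
    for name in existing_names:
        match = pattern.match(name)
        if match:
            # A name without a suffix counts as version 1.
            highest = max(highest, int(match.group(1) or 1))
    if highest == 0:
        return base_name  # no voice with this base name exists yet
    return f"{base_name} v{highest + 1}"
```

With existing voices "Sales Voice" and "Sales Voice v2", the helper suggests "Sales Voice v3", leaving the earlier versions in place as backups.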