Overview

Voice Cloning lets you create custom AI voices that match your brand identity, tone requirements, or specific vocal characteristics. Upload audio samples or record directly in the browser to generate an AI voice model that you can use across all your agents.
Voice Cloning Features:
  • Custom Voice Creation - Build unique voices from audio samples
  • Upload or Record - Flexible input options (upload files or record in-browser)
  • Multi-Language Support - Create voices in English or Arabic
  • Dialect Options - Specify regional accents for Arabic voices
  • Instant Preview - Test your voice before finalizing
  • Unlimited Voices - Create as many custom voices as you need

Why Use Voice Cloning

Custom voice cloning enables powerful use cases:
Brand Consistency:
  • Create a signature voice that represents your company
  • Ensure consistent voice across all customer interactions
  • Stand out from competitors using generic AI voices
Authenticity:
  • Clone authorized voices (CEO, founder, brand ambassador)
  • Maintain authentic regional accents
  • Preserve specific vocal characteristics
Professional Quality:
  • Match specific tone and style requirements
  • Create industry-specific voices (medical, legal, technical)
  • Control exact pronunciation and cadence
Flexibility:
  • Create multiple voices for different departments or scenarios
  • Test different voice personalities
  • Update and refine voices as your brand evolves
Voice cloning creates an AI model from your audio samples. The quality of your input audio directly impacts the quality of the generated voice.

How Voice Cloning Works

The voice cloning process is straightforward:
  1. Provide Audio Sample - Upload an audio file or record directly
  2. Configure Details - Set name, language, dialect, tags, and optional cover image
  3. Generate Preview - Test how your voice sounds with sample text
  4. Create Voice - Finalize and add to your voice library
  5. Use in Agents - Select your custom voice like any library voice

Processing Time

  • Voice Creation: 30-60 seconds
  • Preview Generation: 10-30 seconds
  • Availability: Immediate after creation

Creating a Custom Voice

Step 1: Voice Details

Name (Required)
  • Give your voice a descriptive name
  • Examples: “Customer Service - Sarah”, “Sales - Professional Male”
  • Max 100 characters
  • Helps identify voice in library
Description (Optional)
  • Add context about the voice
  • Note use cases or characteristics
  • Internal reference only (not visible to customers)
Language (Required)
Choose the primary language:
  • English - For English-speaking markets
  • Arabic - For Arabic-speaking markets
Select the language that matches your actual use case. This affects pronunciation, natural speech patterns, and overall voice quality.
Dialect (Required for Arabic)
For Arabic voices, select a specific regional dialect:
  • Egyptian (EG)
  • Jordanian (JO)
  • Saudi Arabian (SA)
  • UAE (AE)
  • Gulf
  • Levantine
  • North African
Choosing the correct dialect ensures natural pronunciation and helps your agent connect authentically with your target audience.
Voice Tags (Required)
Select exactly 2 tags, one from each category:
Gender:
  • Male - Masculine voice
  • Female - Feminine voice
Style:
  • Conversational - Natural, friendly tone
  • Narrator - Clear, articulate tone
Cover Image (Optional)
  • Upload a visual representation (JPG, PNG)
  • Max size: 5 MB
  • Displays on voice card in library
  • Professional headshot or brand logo recommended
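The field rules in Step 1 can be checked before you open the form. The sketch below is illustrative only: the class, field names, and error messages are hypothetical and are not part of any platform API. Only the limits (100-character name, two languages, dialect required for Arabic, one tag per category) come from the rules above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values taken from the lists above; all identifiers are hypothetical.
LANGUAGES = {"English", "Arabic"}
ARABIC_DIALECTS = {
    "Egyptian", "Jordanian", "Saudi Arabian", "UAE",
    "Gulf", "Levantine", "North African",
}
GENDER_TAGS = {"Male", "Female"}
STYLE_TAGS = {"Conversational", "Narrator"}
MAX_NAME_LENGTH = 100


@dataclass
class VoiceDetails:
    name: str
    language: str
    tags: List[str]
    dialect: Optional[str] = None
    description: str = ""


def validate_details(details: VoiceDetails) -> List[str]:
    """Return a list of problems; an empty list means the details look valid."""
    errors = []
    if not details.name.strip():
        errors.append("name is required")
    elif len(details.name) > MAX_NAME_LENGTH:
        errors.append("name exceeds 100 characters")
    if details.language not in LANGUAGES:
        errors.append("language must be English or Arabic")
    if details.language == "Arabic" and details.dialect not in ARABIC_DIALECTS:
        errors.append("Arabic voices require a dialect")
    genders = [t for t in details.tags if t in GENDER_TAGS]
    styles = [t for t in details.tags if t in STYLE_TAGS]
    if len(details.tags) != 2 or len(genders) != 1 or len(styles) != 1:
        errors.append("select exactly one gender tag and one style tag")
    return errors
```

For example, `validate_details(VoiceDetails("Customer Service - Sarah", "English", ["Female", "Conversational"]))` returns an empty list, while an Arabic voice without a dialect returns one error.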

Step 2: Input Audio

You have two options for providing audio:

Option A: Upload Audio File

Supported Formats:
  • MP3, WAV, WebM, OGG, AAC, M4A, FLAC
Requirements:
  • Max file size: 32 MB
  • Recommended duration: 30-90 seconds
  • Clear, noise-free audio
  • Natural speech with varied sentences
Upload Process:
  1. Click “Upload” tab
  2. Drag and drop file or click to browse
  3. Wait for upload completion
  4. File validated automatically
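If you prepare many samples, a local pre-check can catch format and size problems before upload. This is a minimal sketch, not the platform's own validation: the function name is hypothetical, it checks the file extension only (not the actual encoding), and it assumes 1 MB means 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import List

# Extensions for the supported formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".m4a", ".flac"}
# Assumption: the 32 MB limit is counted in units of 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 32 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 32 MB")
    return problems
```

To check a file on disk, pass `Path(p).stat().st_size` as `size_bytes`.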

Option B: Record Voice

Record directly in your browser.
Requirements:
  • Duration: 3-9 seconds
  • Browser microphone access required
  • Format: WAV (automatic)
Recording Process:
  1. Click “Record” tab
  2. Grant microphone permission
  3. Click “Start Recording”
  4. Speak naturally (2-3 sentences)
  5. Click “Stop Recording”
  6. Review recording
  7. Re-record if needed
Recording Length Requirements:
  • Minimum: 3 seconds (validation error if shorter)
  • Maximum: 9 seconds (recording stops automatically)
  • Recommended: 5-7 seconds for best results
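The length limits above can be checked on a saved WAV file with Python's standard `wave` module. This is a sketch under stated assumptions: the function names and the returned labels are hypothetical, and the in-browser recorder applies its own checks independently.

```python
import wave

MIN_SECONDS = 3
MAX_SECONDS = 9


def wav_duration_seconds(source) -> float:
    """Duration of a WAV file; source is a path or a binary file-like object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def classify_recording(seconds: float) -> str:
    """Map a duration onto the recording limits described above."""
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds > MAX_SECONDS:
        return "too long"
    if 5 <= seconds <= 7:
        return "recommended"
    return "acceptable"
```

For a file on disk, `classify_recording(wav_duration_seconds("sample.wav"))` returns one of the four labels; `sample.wav` is a placeholder path.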

Step 3: Generate Preview

Test your voice before finalizing:
  1. Enter sample text (minimum 5 words)
  2. Click “Generate Preview”
  3. Wait for processing (10-30 seconds)
  4. Listen to audio preview
  5. Regenerate with different text if needed
Preview Text Examples:
"Hello, thank you for calling. How can I help you today?"

"Welcome to Acme Corporation. I'm here to assist with any questions."

"Your order has been confirmed and will ship within two business days."
Use text that matches your actual agent scripts to hear how the voice will sound in real conversations.
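When preview scripts are stored in a file or spreadsheet, the 5-word minimum can be checked in bulk. The helper below is a hypothetical sketch that counts whitespace-separated words; how the platform itself counts words is not documented here, so treat borderline cases with care.

```python
MIN_PREVIEW_WORDS = 5


def has_enough_words(text: str, minimum: int = MIN_PREVIEW_WORDS) -> bool:
    """True if text has at least `minimum` whitespace-separated words."""
    return len(text.split()) >= minimum
```

All three example scripts above pass this check, while a short greeting such as "Hello there" does not.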

Step 4: Create Voice

Once satisfied with the preview:
  1. Click “Create” button
  2. Wait for processing (30-60 seconds)
  3. Voice added to “My Voices” library
  4. Available immediately in all agents
Custom voice created successfully! Find it in the “My Voices” tab.

Audio Quality Guidelines

Recording Environment

Ideal Environment:
  • Quiet room with minimal echo
  • Closed windows and doors
  • No background noise (HVAC, fans, traffic)
  • Sound-dampening materials (curtains, furniture)
Avoid:
  • Outdoor locations
  • Rooms with hard surfaces (echo)
  • Areas with background conversations
  • Near computers or electronics

Microphone Selection

Good Options:
  • USB condenser microphone
  • Noise-canceling headset
  • Dedicated podcasting microphone
  • Quality laptop built-in mic (in quiet space)
Poor Options:
  • Phone speakerphone
  • Low-quality earbuds
  • Far-field microphones
  • Heavily compressed audio sources

Audio Content

Include Variety:
  • Questions and statements
  • Different emotions (friendly, professional, reassuring)
  • Various sentence lengths
  • Natural pauses and inflection
  • Varied pronunciation patterns
Avoid:
  • Monotone speech
  • Reading lists or numbers only
  • Repetitive phrases
  • Shouting or whispering
  • Background music or sound effects
The quality and variety of your audio sample directly determines the naturalness and versatility of your cloned voice.

Managing Custom Voices

Viewing Custom Voices

  1. Navigate to “Voices” in sidebar
  2. Click “My Voices” tab
  3. All custom voices display here
  4. Same features as library voices (preview, favorite, filter)

Using Custom Voices

Custom voices work identically to library voices.
In Single Prompt Agents:
  1. Open agent Voice Settings
  2. Click “Select Voice”
  3. Navigate to “My Voices” tab
  4. Select your custom voice
In Flow Agents:
  • Available in global voice settings
  • Can be used in node-level overrides
  • Appears in all voice selection menus

Editing Voice Details

Update voice information:
  • Change voice name
  • Update description
  • Modify tags
  • Replace cover image
Editing voice details does not require re-processing. Changes are instant.

Deleting Custom Voices

Deleting a custom voice is permanent and cannot be undone.
Before Deleting:
  • Remove voice from all agents using it
  • Save/export audio sample if you want to recreate later
  • Confirm no other team members are using it
Deletion Process:
  1. Find voice in “My Voices” tab
  2. Click voice actions menu (⋮)
  3. Select “Delete Voice”
  4. Confirm deletion
  5. Voice removed permanently
Impact on Agents:
  • Agents using deleted voice will show error
  • Must select new voice for affected agents
  • Previous call recordings remain accessible
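Because agents that still reference a deleted voice will show an error, it helps to list them first. The sketch below assumes you keep your own inventory of agents as plain dictionaries; the keys `name`, `voice_id`, and `node_voice_overrides` are hypothetical and do not describe an export format or API of the platform.

```python
from typing import Dict, List


def agents_using_voice(agents: List[Dict], voice_id: str) -> List[str]:
    """Names of agents that use voice_id globally or in a node-level override."""
    affected = []
    for agent in agents:
        # Hypothetical shape: node id -> voice id for Flow Agent overrides.
        overrides = agent.get("node_voice_overrides", {})
        if agent.get("voice_id") == voice_id or voice_id in overrides.values():
            affected.append(agent["name"])
    return affected


# Hypothetical inventory used only for illustration.
inventory = [
    {"name": "Support Bot", "voice_id": "custom-1"},
    {"name": "Sales Flow", "voice_id": "library-7",
     "node_voice_overrides": {"greeting": "custom-1"}},
    {"name": "Billing Bot", "voice_id": "library-3"},
]
```

With this inventory, `agents_using_voice(inventory, "custom-1")` returns the Support Bot and the Sales Flow agent, which both need a new voice before deletion.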

Voice Cloning Best Practices

Sample Selection Strategy

For Customer Service:
  • Friendly, helpful tone
  • Clear enunciation
  • Moderate, comfortable pace
  • Warm, welcoming inflection
For Sales:
  • Confident, enthusiastic energy
  • Engaging and personable
  • Natural variation in pace
  • Professional but approachable
For Technical Support:
  • Clear, methodical pace
  • Patient, reassuring tone
  • Precise pronunciation
  • Calm demeanor
For Announcements:
  • Authoritative, clear delivery
  • Professional tone
  • Consistent pacing
  • Formal style

Multi-Voice Strategy

Create specialized voices for different scenarios.
Example: Customer Service Department
Voice 1: "Customer Service - Friendly Female"
- Tags: Female, Conversational
- Use: General inquiries, warm greetings

Voice 2: "Customer Service - Professional Male"
- Tags: Male, Narrator
- Use: Account information, formal communications

Voice 3: "Customer Service - Calm Female"
- Tags: Female, Conversational
- Use: Complaint handling, de-escalation

Testing Custom Voices

Comprehensive Testing Process:
  1. Preview Testing - Generate multiple TTS previews with varied scripts
  2. Agent Integration - Create test agent with your voice
  3. Script Testing - Test with actual conversation flows
  4. Team Review - Get feedback from colleagues
  5. A/B Testing - Compare with library voices
  6. Live Pilot - Deploy to small percentage of calls first
  7. Customer Feedback - Monitor customer reactions
Quality Checklist:
  • Pronunciation is clear and natural
  • Pace is appropriate for use case
  • Tone matches brand personality
  • No robotic or artificial qualities
  • Handles varied sentence types well
  • Emotional range is appropriate
  • Consistent quality across different texts
  • Regional pronunciation is accurate (if applicable)
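For the Live Pilot step, one common way to send a small, stable share of calls to the new voice is hash-based bucketing. This is a generic technique sketched under assumptions, not a feature of the platform: `call_id` stands for whatever stable identifier your own routing layer has, and the function name is hypothetical.

```python
import hashlib


def in_pilot(call_id: str, percent: float) -> bool:
    """Deterministically place about `percent` % of call ids in the pilot group."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto 10,000 buckets (0.01 % resolution).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100
```

Because the result depends only on the identifier, the same call or caller always lands in the same group, which keeps A/B comparisons against library voices consistent.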

Common Issues and Solutions

“Recording too short” Error

Problem: Recording is less than 3 seconds
Solutions:
  • Record longer sample (5-7 seconds recommended)
  • Speak 2-3 complete sentences
  • Don’t rush through the recording
  • Include natural pauses

“Audio file too large” Error

Problem: File exceeds 32 MB limit
Solutions:
  • Compress audio file using audio editor
  • Convert to MP3 format with reasonable bitrate
  • Trim unnecessary silence at beginning/end
  • Use online audio compression tools
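A quick size estimate shows whether converting or trimming will bring a file under the limit. The sketch below uses standard arithmetic (bitrate × duration for compressed audio, sample rate × channels × bytes per sample × duration for uncompressed PCM WAV). It ignores container headers and metadata, and it assumes 1 MB means 1024 × 1024 bytes; the function names are illustrative.

```python
# Assumption: the 32 MB limit is counted in units of 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 32 * 1024 * 1024


def compressed_size_bytes(bitrate_kbps: int, seconds: float) -> int:
    """Approximate size of constant-bitrate audio such as MP3."""
    return int(bitrate_kbps * 1000 / 8 * seconds)


def pcm_wav_size_bytes(sample_rate: int, channels: int,
                       bytes_per_sample: int, seconds: float) -> int:
    """Approximate size of uncompressed PCM WAV audio data."""
    return int(sample_rate * channels * bytes_per_sample * seconds)


def fits_upload_limit(size_bytes: int) -> bool:
    return size_bytes <= MAX_UPLOAD_BYTES
```

At the recommended maximum of 90 seconds, a 128 kbps MP3 is about 1.44 million bytes and a 44.1 kHz, 16-bit stereo WAV is about 15.9 million bytes, so both fit. A file over the limit is usually much longer than recommended, and trimming it to 30-90 seconds is often the simplest fix.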

“Preview generation failed” Error

Problem: TTS preview won’t generate
Possible Causes:
  • Audio quality too low
  • Audio sample too short or too long
  • Excessive background noise
  • Temporary server processing issue
Solutions:
  • Upload different, higher-quality audio sample
  • Ensure recording environment is quiet
  • Check file format is supported
  • Verify file isn’t corrupted
  • Try again (may be temporary issue)

Voice Sounds Robotic or Unnatural

Problem: Generated voice lacks natural quality
Common Causes:
  • Poor audio sample quality
  • Background noise in recording
  • Insufficient vocal variation
  • Overly monotone source audio
  • Very short sample duration
Solutions:
  • Re-record in quieter environment
  • Use better quality microphone
  • Include more natural speech variation
  • Speak with authentic inflection and emotion
  • Provide longer audio sample (if using upload)

Can’t Find Custom Voice

Problem: Created voice doesn’t appear in agent settings
Solutions:
  • Check specifically in “My Voices” tab
  • Refresh browser page
  • Verify voice creation completed successfully
  • Confirm you’re in correct project
  • Check if voice was accidentally deleted

Use Cases and Examples

Brand Voice Consistency

Scenario: National retail chain
Goal: Consistent voice across all locations
Solution:
  1. Clone authorized brand representative’s voice
  2. Create custom voice with approved characteristics
  3. Use across all agent instances
  4. Keep the same voice on every customer interaction

Regional Market Targeting

Scenario: Middle East e-commerce
Goal: Connect authentically with GCC customers
Solution:
  1. Clone native Gulf Arabic speaker
  2. Select UAE or Saudi dialect
  3. Ensure regional pronunciation patterns
  4. Build trust through authentic accent

Multi-Department Strategy

Scenario: Large enterprise
Goal: Different voices for different departments
Solution:
  1. Sales: Energetic, engaging voice
  2. Support: Calm, helpful voice
  3. Billing: Professional, clear voice
  4. Executive: Authoritative, trustworthy voice

Legacy Voice Preservation

Scenario: Replacing voice actor
Goal: Maintain consistency after personnel change
Solution:
  1. Clone original voice actor (with permission)
  2. Create AI voice model
  3. Transition seamlessly to AI
  4. Preserve customer familiarity

Technical Specifications

File Specifications

Upload:
  • Max size: 32 MB
  • Formats: MP3, WAV, WebM, OGG, AAC, M4A, FLAC
  • Recommended duration: 30-90 seconds
  • Sample rate: 16kHz or higher recommended
Recording:
  • Duration: 3-9 seconds
  • Format: WAV (automatic)
  • Sample rate: Browser default
  • Bitrate: Automatic

Processing Specifications

  • Voice creation time: 30-60 seconds
  • Preview generation: 10-30 seconds
  • Storage: Permanent (until manually deleted)
  • Usage: Unlimited across all agents

Limitations

Per Account:
  • Unlimited custom voices
  • 32 MB max file size per upload
  • 3-9 seconds for direct recording
  • 5 MB max cover image size

Advanced Features

Instant Voice Enhancement (Beta)

Premium feature for improved voice quality.
Features:
  • Automatic background noise removal
  • Voice clarity enhancement
  • Consistency optimization
  • Better results with imperfect recordings
Instant Voice Enhancement is a premium beta feature. Contact sales for access.

Voice Versioning

Maintain multiple versions of the same voice.
Use Case: Test improvements without losing original
Process:
  1. Create new voice with updated audio sample
  2. Use naming convention (e.g., “Sales Voice v2”)
  3. Test new version in parallel
  4. Switch agents when ready
  5. Keep old version as backup
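The naming convention in the process above can be automated so version suffixes stay consistent. This helper is a hypothetical sketch: it assumes names end in a space followed by "v" and a number, as in the “Sales Voice v2” example, and it enforces the 100-character name limit from Step 1.

```python
import re

MAX_NAME_LENGTH = 100  # name limit from Step 1


def next_version_name(name: str) -> str:
    """Return the name for the next version, e.g. 'Sales Voice' -> 'Sales Voice v2'."""
    match = re.fullmatch(r"(.*?)\s+v(\d+)", name)
    if match:
        candidate = f"{match.group(1)} v{int(match.group(2)) + 1}"
    else:
        # A name with no version suffix is treated as version 1.
        candidate = f"{name} v2"
    if len(candidate) > MAX_NAME_LENGTH:
        raise ValueError("versioned name exceeds 100 characters")
    return candidate
```

Calling it on “Sales Voice” gives “Sales Voice v2”, and calling it again on that result gives “Sales Voice v3”.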