Voice Cloning - Hamsa API

Overview

Voice Cloning lets you create custom AI voices that match your brand identity, tone requirements, or specific vocal characteristics. Upload audio samples or record directly in the browser to generate an AI voice model that you can use across all your agents.

Voice Cloning Features:

Custom Voice Creation - Build unique voices from audio samples
Upload or Record - Flexible input options (upload files or record in-browser)
Multi-Language Support - Create voices in English or Arabic
Dialect Options - Specify regional accents for Arabic voices
Instant Preview - Test your voice before finalizing
Unlimited Voices - Create as many custom voices as you need

Why Use Voice Cloning

Custom voice cloning enables powerful use cases:

Brand Consistency:

Create a signature voice that represents your company
Ensure consistent voice across all customer interactions
Stand out from competitors using generic AI voices

Authenticity:

Clone authorized voices (CEO, founder, brand ambassador)
Maintain authentic regional accents
Preserve specific vocal characteristics

Professional Quality:

Match specific tone and style requirements
Create industry-specific voices (medical, legal, technical)
Control exact pronunciation and cadence

Flexibility:

Create multiple voices for different departments or scenarios
Test different voice personalities
Update and refine voices as your brand evolves

Voice cloning creates an AI model from your audio samples. The quality of your input audio directly impacts the quality of the generated voice.

How Voice Cloning Works

The voice cloning process is straightforward:

Provide Audio Sample - Upload an audio file or record directly
Configure Details - Set name, language, dialect, tags, and optional cover image
Generate Preview - Test how your voice sounds with sample text
Create Voice - Finalize and add to your voice library
Use in Agents - Select your custom voice like any library voice

Processing Time

Voice Creation: 30-60 seconds
Preview Generation: 10-30 seconds
Availability: Immediate after creation

Creating a Custom Voice

Step 1: Voice Details

Name (Required)

Give your voice a descriptive name
Examples: “Customer Service - Sarah”, “Sales - Professional Male”
Max 100 characters
Helps identify voice in library

Description (Optional)

Add context about the voice
Note use cases or characteristics
Internal reference only (not visible to customers)

Language (Required)

Choose the primary language:

English - For English-speaking markets
Arabic - For Arabic-speaking markets

Select the language that matches your actual use case. This affects pronunciation, natural speech patterns, and overall voice quality.

Dialect (Required for Arabic)

For Arabic voices, select a specific regional dialect:

Egyptian (EG)
Jordanian (JO)
Saudi Arabian (SA)
UAE (AE)
Gulf
Levantine
North African

Choosing the correct dialect ensures natural pronunciation and helps your agent connect authentically with your target audience.

Voice Tags (Required)

Select exactly 2 tags, one from each category:

Gender:

Male - Masculine voice
Female - Feminine voice

Style:

Conversational - Natural, friendly tone
Narrator - Clear, articulate tone

Cover Image (Optional)

Upload a visual representation (JPG, PNG)
Max size: 5 MB
Displays on voice card in library
Professional headshot or brand logo recommended
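The field rules in Step 1 can be checked locally before you open the creation form. The following is a minimal sketch, not part of the Hamsa API: `VoiceDetails` and `validate_voice_details` are hypothetical names, and the allowed values are copied from the lists above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values copied from the Step 1 field descriptions.
LANGUAGES = {"English", "Arabic"}
ARABIC_DIALECTS = {
    "Egyptian", "Jordanian", "Saudi Arabian", "UAE",
    "Gulf", "Levantine", "North African",
}
GENDER_TAGS = {"Male", "Female"}
STYLE_TAGS = {"Conversational", "Narrator"}
MAX_NAME_LENGTH = 100


@dataclass
class VoiceDetails:
    """Hypothetical container for the Step 1 fields."""
    name: str
    language: str
    tags: List[str]
    dialect: Optional[str] = None
    description: str = ""


def validate_voice_details(details: VoiceDetails) -> List[str]:
    """Return a list of problems; an empty list means the details look valid."""
    problems = []
    if not details.name.strip():
        problems.append("Name is required.")
    elif len(details.name) > MAX_NAME_LENGTH:
        problems.append(f"Name exceeds {MAX_NAME_LENGTH} characters.")
    if details.language not in LANGUAGES:
        problems.append("Language must be English or Arabic.")
    if details.language == "Arabic" and details.dialect not in ARABIC_DIALECTS:
        problems.append("Arabic voices require a supported dialect.")
    # Exactly two tags: one gender tag and one style tag.
    genders = [tag for tag in details.tags if tag in GENDER_TAGS]
    styles = [tag for tag in details.tags if tag in STYLE_TAGS]
    if len(details.tags) != 2 or len(genders) != 1 or len(styles) != 1:
        problems.append("Select exactly one gender tag and one style tag.")
    return problems
```

For example, `validate_voice_details(VoiceDetails("Customer Service - Sarah", "English", ["Female", "Conversational"]))` returns an empty list.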

Step 2: Input Audio

You have two options for providing audio:

Option A: Upload Audio File

Supported Formats:

MP3, WAV, WebM, OGG, AAC, M4A, FLAC

Requirements:

Max file size: 32 MB
Recommended duration: 30-90 seconds
Clear, noise-free audio
Natural speech with varied sentences

Upload Process:

Click “Upload” tab
Drag and drop file or click to browse
Wait for upload completion
File validated automatically
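If you prepare many samples, a local pre-check can catch format and size problems before upload. This sketch is an illustration only: `check_upload` and `check_upload_path` are hypothetical helpers, the format check looks at the file extension rather than the audio content, and "32 MB" is assumed to mean 32 × 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import List

# Extensions corresponding to the supported formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".m4a", ".flac"}
# Assumption: "32 MB" is interpreted as 32 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 32 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems for a candidate upload (empty if none)."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"File is {size_bytes} bytes; the limit is {MAX_UPLOAD_BYTES} bytes."
        )
    return problems


def check_upload_path(path: str) -> List[str]:
    """Convenience wrapper that reads the size from a file on disk."""
    return check_upload(path, Path(path).stat().st_size)
```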

Option B: Record Voice

Record directly in your browser.

Requirements:

Duration: 3-9 seconds
Browser microphone access required
Format: WAV (automatic)

Recording Process:

Click “Record” tab
Grant microphone permission
Click “Start Recording”
Speak naturally (2-3 sentences)
Click “Stop Recording”
Review recording
Re-record if needed

Recording Length Requirements:

Minimum: 3 seconds (validation error if shorter)
Maximum: 9 seconds (recording stops automatically)
Recommended: 5-7 seconds for best results
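The length rules above can be expressed as a small classifier. This is a sketch under stated assumptions: the function names and the returned labels are hypothetical, and the duration helper uses Python's standard `wave` module, so it only reads uncompressed WAV files.

```python
import wave

MIN_SECONDS = 3.0                 # shorter recordings fail validation
MAX_SECONDS = 9.0                 # in-browser recording stops at this point
RECOMMENDED_RANGE = (5.0, 7.0)    # range recommended for best results


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed as frame count / frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_recording(seconds: float) -> str:
    """Map a duration to one of four hypothetical labels."""
    if seconds < MIN_SECONDS:
        return "too_short"
    if seconds > MAX_SECONDS:
        return "too_long"
    if RECOMMENDED_RANGE[0] <= seconds <= RECOMMENDED_RANGE[1]:
        return "recommended"
    return "acceptable"
```

Because the in-browser recorder stops automatically at 9 seconds, the `too_long` label only matters when you measure a clip recorded with another tool.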

Step 3: Generate Preview

Test your voice before finalizing:

Enter sample text (minimum 5 words)
Click “Generate Preview”
Wait for processing (10-30 seconds)
Listen to audio preview
Regenerate with different text if needed

Preview Text Examples:

"Hello, thank you for calling. How can I help you today?"

"Welcome to Acme Corporation. I'm here to assist with any questions."

"Your order has been confirmed and will ship within two business days."

Use text that matches your actual agent scripts to hear how the voice will sound in real conversations.
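When you keep a list of candidate preview scripts, the five-word minimum is easy to check in advance. In this sketch, `is_valid_preview_text` is a hypothetical helper, and a "word" is assumed to be any whitespace-separated token; the product may count words differently.

```python
MIN_PREVIEW_WORDS = 5


def is_valid_preview_text(text: str) -> bool:
    """True if the text has at least five whitespace-separated tokens."""
    return len(text.split()) >= MIN_PREVIEW_WORDS


scripts = [
    "Hello, thank you for calling. How can I help you today?",
    "Hello there",
]
# Keep only the scripts that are long enough to use as preview text.
usable = [script for script in scripts if is_valid_preview_text(script)]
```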

Step 4: Create Voice

Once satisfied with the preview:

Click “Create” button
Wait for processing (30-60 seconds)
Voice added to “My Voices” library
Available immediately in all agents

Custom voice created successfully! Find it in the “My Voices” tab.

Audio Quality Guidelines

Recording Environment

Ideal Environment:

Quiet room with minimal echo
Closed windows and doors
No background noise (HVAC, fans, traffic)
Sound-dampening materials (curtains, furniture)

Avoid:

Outdoor locations
Rooms with hard surfaces (echo)
Areas with background conversations
Near computers or electronics

Microphone Selection

Good Options:

USB condenser microphone
Noise-canceling headset
Dedicated podcasting microphone
Quality laptop built-in mic (in quiet space)

Poor Options:

Phone speakerphone
Low-quality earbuds
Far-field microphones
Heavily compressed audio sources

Audio Content

Include Variety:

Questions and statements
Different emotions (friendly, professional, reassuring)
Various sentence lengths
Natural pauses and inflection
Varied pronunciation patterns

Avoid:

Monotone speech
Reading lists or numbers only
Repetitive phrases
Shouting or whispering
Background music or sound effects

The quality and variety of your audio sample directly determines the naturalness and versatility of your cloned voice.

Managing Custom Voices

Viewing Custom Voices

Navigate to “Voices” in sidebar
Click “My Voices” tab
All custom voices display here
Same features as library voices (preview, favorite, filter)

Using Custom Voices

Custom voices work identically to library voices.

In Single Prompt Agents:

Open agent Voice Settings
Click “Select Voice”
Navigate to “My Voices” tab
Select your custom voice

In Flow Agents:

Available in global voice settings
Can be used in node-level overrides
Appears in all voice selection menus

Editing Voice Details

Update voice information:

Change voice name
Update description
Modify tags
Replace cover image

Editing voice details does not require re-processing. Changes are instant.

Deleting Custom Voices

Deleting a custom voice is permanent and cannot be undone.

Before Deleting:

Remove voice from all agents using it
Save/export audio sample if you want to recreate later
Confirm no other team members are using it

Deletion Process:

Find voice in “My Voices” tab
Click voice actions menu (⋮)
Select “Delete Voice”
Confirm deletion
Voice removed permanently

Impact on Agents:

Agents using deleted voice will show error
Must select new voice for affected agents
Previous call recordings remain accessible

Voice Cloning Best Practices

Sample Selection Strategy

For Customer Service:

Friendly, helpful tone
Clear enunciation
Moderate, comfortable pace
Warm, welcoming inflection

For Sales:

Confident, enthusiastic energy
Engaging and personable
Natural variation in pace
Professional but approachable

For Technical Support:

Clear, methodical pace
Patient, reassuring tone
Precise pronunciation
Calm demeanor

For Announcements:

Authoritative, clear delivery
Professional tone
Consistent pacing
Formal style

Multi-Voice Strategy

Create specialized voices for different scenarios.

Example: Customer Service Department

Voice 1: "Customer Service - Friendly Female"
- Tags: Female, Conversational
- Use: General inquiries, warm greetings

Voice 2: "Customer Service - Professional Male"
- Tags: Male, Narrator
- Use: Account information, formal communications

Voice 3: "Customer Service - Calm Female"
- Tags: Female, Conversational
- Use: Complaint handling, de-escalation

Testing Custom Voices

Comprehensive Testing Process:

Preview Testing - Generate multiple TTS previews with varied scripts
Agent Integration - Create test agent with your voice
Script Testing - Test with actual conversation flows
Team Review - Get feedback from colleagues
A/B Testing - Compare with library voices
Live Pilot - Deploy to small percentage of calls first
Customer Feedback - Monitor customer reactions
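One common way to send a small, stable percentage of calls to a new voice during a live pilot is hash-based bucketing. The sketch below illustrates that general technique and is not a Hamsa feature: `use_custom_voice`, the `call_id` string, and the percentage parameter are all assumptions, and the routing decision would live in your own integration code.

```python
import hashlib


def use_custom_voice(call_id: str, pilot_percent: int) -> bool:
    """Deterministically assign a call to the pilot group.

    The call ID is hashed into one of 100 buckets; calls whose bucket
    number is below pilot_percent use the custom voice.
    """
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < pilot_percent
```

Hashing the identifier, rather than drawing a random number per call, means the same call ID always gets the same answer, which keeps pilot results reproducible.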

Quality Checklist:

Pronunciation is clear and natural
Pace is appropriate for use case
Tone matches brand personality
No robotic or artificial qualities
Handles varied sentence types well
Emotional range is appropriate
Consistent quality across different texts
Regional pronunciation is accurate (if applicable)

Common Issues and Solutions

“Recording too short” Error

Problem: Recording is less than 3 seconds

Solutions:

Record longer sample (5-7 seconds recommended)
Speak 2-3 complete sentences
Don’t rush through the recording
Include natural pauses

“Audio file too large” Error

Problem: File exceeds 32 MB limit

Solutions:

Compress audio file using audio editor
Convert to MP3 format with reasonable bitrate
Trim unnecessary silence at beginning/end
Use online audio compression tools
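To judge whether an uncompressed recording will fit under the limit, you can estimate its size from duration, sample rate, channel count, and bit depth. This is a rough sketch: the function names are hypothetical, the small WAV header is ignored, and "32 MB" is assumed to mean 32 × 1024 × 1024 bytes.

```python
# Assumption: "32 MB" is interpreted as 32 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 32 * 1024 * 1024


def pcm_bytes_per_second(sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate * channels * (bits_per_sample // 8)


def pcm_wav_bytes(seconds: int, sample_rate: int = 44_100,
                  channels: int = 2, bits_per_sample: int = 16) -> int:
    """Approximate size of an uncompressed WAV file (header ignored)."""
    return seconds * pcm_bytes_per_second(sample_rate, channels, bits_per_sample)


def max_seconds_under_limit(sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """Longest whole-second duration that stays within the upload limit."""
    return MAX_UPLOAD_BYTES // pcm_bytes_per_second(sample_rate, channels, bits_per_sample)


# 90 seconds of 44.1 kHz, 16-bit stereo: 90 * 44100 * 2 * 2 = 15,876,000 bytes.
cd_quality = pcm_wav_bytes(90)
# 90 seconds of 96 kHz, 24-bit stereo: 90 * 96000 * 2 * 3 = 51,840,000 bytes.
studio_quality = pcm_wav_bytes(90, sample_rate=96_000, channels=2, bits_per_sample=24)
```

Under these assumptions, a 90-second sample at 44.1 kHz, 16-bit stereo fits comfortably, while the same duration at 96 kHz, 24-bit stereo exceeds the limit and would need a lower sample rate, mono audio, or a compressed format.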

“Preview generation failed” Error

Problem: TTS preview won’t generate

Possible Causes:

Audio quality too low
Audio sample too short or too long
Excessive background noise
Temporary server processing issue

Solutions:

Upload different, higher-quality audio sample
Ensure recording environment is quiet
Check file format is supported
Verify file isn’t corrupted
Try again (may be temporary issue)

Voice Sounds Robotic or Unnatural

Problem: Generated voice lacks natural quality

Common Causes:

Poor audio sample quality
Background noise in recording
Insufficient vocal variation
Overly monotone source audio
Very short sample duration

Solutions:

Re-record in quieter environment
Use better quality microphone
Include more natural speech variation
Speak with authentic inflection and emotion
Provide longer audio sample (if using upload)

Can’t Find Custom Voice

Problem: Created voice doesn’t appear in agent settings

Solutions:

Check specifically in “My Voices” tab
Refresh browser page
Verify voice creation completed successfully
Confirm you’re in correct project
Check if voice was accidentally deleted

Use Cases and Examples

Brand Voice Consistency

Scenario: National retail chain
Goal: Consistent voice across all locations

Solution:

Clone authorized brand representative’s voice
Create custom voice with approved characteristics
Use across all agent instances
Ensure 100% brand consistency

Regional Market Targeting

Scenario: Middle East e-commerce
Goal: Connect authentically with GCC customers

Solution:

Clone native Gulf Arabic speaker
Select UAE or Saudi dialect
Ensure regional pronunciation patterns
Build trust through authentic accent

Multi-Department Strategy

Scenario: Large enterprise
Goal: Different voices for different departments

Solution:

Sales: Energetic, engaging voice
Support: Calm, helpful voice
Billing: Professional, clear voice
Executive: Authoritative, trustworthy voice

Legacy Voice Preservation

Scenario: Replacing voice actor
Goal: Maintain consistency after personnel change

Solution:

Clone original voice actor (with permission)
Create AI voice model
Transition seamlessly to AI
Preserve customer familiarity

Technical Specifications

File Specifications

Upload:

Max size: 32 MB
Formats: MP3, WAV, WebM, OGG, AAC, M4A, FLAC
Recommended duration: 30-90 seconds
Sample rate: 16kHz or higher recommended

Recording:

Duration: 3-9 seconds
Format: WAV (automatic)
Sample rate: Browser default
Bitrate: Automatic

Processing Specifications

Voice creation time: 30-60 seconds
Preview generation: 10-30 seconds
Storage: Permanent (until manually deleted)
Usage: Unlimited across all agents

Limitations

Per Account:

Unlimited custom voices
32 MB max file size per upload
3-9 seconds for direct recording
5 MB max cover image size

Advanced Features

Instant Voice Enhancement (Beta)

Premium feature for improved voice quality.

Features:

Automatic background noise removal
Voice clarity enhancement
Consistency optimization
Better results with imperfect recordings

Instant Voice Enhancement is a premium beta feature. Contact sales for access.

Voice Versioning

Maintain multiple versions of the same voice.

Use Case: Test improvements without losing the original

Process:

Create new voice with updated audio sample
Use naming convention (e.g., “Sales Voice v2”)
Test new version in parallel
Switch agents when ready
Keep old version as backup
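The naming convention above can be automated with a small helper. This is a minimal sketch: `next_version_name` is a hypothetical function, and it assumes version suffixes always take the form " vN" at the end of the name, as in “Sales Voice v2”.

```python
import re

# Matches names that end in a space followed by "v" and a number.
_VERSION_SUFFIX = re.compile(r"^(?P<base>.*?)\s+v(?P<num>\d+)$")


def next_version_name(name: str) -> str:
    """Return the name for the next version of a voice.

    An unversioned name is treated as version 1, so its successor is "v2".
    """
    match = _VERSION_SUFFIX.match(name)
    if match:
        return f"{match.group('base')} v{int(match.group('num')) + 1}"
    return f"{name} v2"
```

With this helper, "Sales Voice" becomes "Sales Voice v2", and "Sales Voice v2" becomes "Sales Voice v3". The result is not checked against the 100-character name limit, so long names may need shortening.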

Related Documentation

Voice Library

Browse pre-built AI voices from our library

Single Prompt Voice Settings

Configure voice in Single Prompt Agents

Flow Agent Voice Settings

Set up voices in Flow Agents

Testing Your Agent

Test your custom voice in real call scenarios
