Voice Cloning - Hamsa API

Overview

Voice Cloning allows you to create custom AI voices that match specific tones, accents, or brand identities. Upload audio samples or record your voice, and our system generates a unique AI voice you can use in your agents.

Voice Cloning Features:

Create unlimited custom voices
Upload audio files or record directly
Support for multiple languages and dialects
Preview voices before finalizing
Full integration with agent voice selection

How Voice Cloning Works

Provide Audio Sample - Upload or record voice audio
Configure Voice Details - Name, tags, language, dialect
Processing - System analyzes and creates AI voice model
Preview & Test - Generate TTS preview to hear result
Use in Agents - Select custom voice like any library voice

Voice cloning quality depends on audio sample quality. Clear, noise-free recordings produce the best results.

Creating a Custom Voice

Step 1: Voice Details

Basic Information

Name (Required)

Descriptive name for your voice
Example: “Customer Service - Sarah”, “Sales - Professional Male”
Max 100 characters
Helps identify voice in library

Description (Optional)

Additional context about the voice
Use case or characteristics
Not visible to end users

Language (Required)

Choose the primary language:

English - For English-speaking markets
Arabic - For Arabic-speaking markets

Select the language that matches how the voice will be used. This affects pronunciation and natural speech patterns.

Dialect Selection

For Arabic Voices: Choose a specific regional dialect:

Egyptian (EG)
Jordanian (JO)
Saudi Arabian (SA)
UAE (AE)
Gulf
Levantine
North African

For English Voices: Dialect not required - English voices adapt to general pronunciation.

Selecting the correct dialect ensures natural pronunciation of regional expressions and accents.

Voice Tags (Required)

Select exactly 2 tags, one from each category:

Gender:

Male - Masculine voice
Female - Feminine voice

Style:

Conversational - Natural, friendly tone for dialogues
Narrator - Clear, articulate tone for announcements

Why tags matter:

Help organize your custom voices
Enable filtering in voice library
Communicate voice characteristics to team
Match voices to use cases
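The Step 1 rules can be sketched as a small validation routine. This is only an illustration of the constraints listed above: the `VoiceDetails` class, its field names, and `validate_voice_details` are hypothetical and are not part of any Hamsa API. It also assumes that a dialect is mandatory for Arabic voices, which the page implies but does not state outright.

```python
from dataclasses import dataclass
from typing import Optional

# Values copied from the form described above. All names here are
# hypothetical; they illustrate the rules, not a real Hamsa API.
LANGUAGES = {"English", "Arabic"}
ARABIC_DIALECTS = {
    "Egyptian", "Jordanian", "Saudi Arabian", "UAE",
    "Gulf", "Levantine", "North African",
}
GENDER_TAGS = {"Male", "Female"}
STYLE_TAGS = {"Conversational", "Narrator"}
MAX_NAME_LENGTH = 100


@dataclass
class VoiceDetails:
    name: str
    language: str
    tags: list[str]
    dialect: Optional[str] = None
    description: str = ""


def validate_voice_details(details: VoiceDetails) -> list[str]:
    """Return a list of problems; an empty list means the details look valid."""
    problems = []
    name = details.name.strip()
    if not name:
        problems.append("name is required")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if details.language not in LANGUAGES:
        problems.append("language must be English or Arabic")
    # Assumption: Arabic voices must specify one of the listed dialects.
    if details.language == "Arabic" and details.dialect not in ARABIC_DIALECTS:
        problems.append("Arabic voices need a supported dialect")
    genders = [t for t in details.tags if t in GENDER_TAGS]
    styles = [t for t in details.tags if t in STYLE_TAGS]
    if len(details.tags) != 2 or len(genders) != 1 or len(styles) != 1:
        problems.append("select exactly one gender tag and one style tag")
    return problems
```

Returning every problem at once, rather than stopping at the first, mirrors how a form would flag all invalid fields together.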

Cover Image (Optional)

Upload a visual representation:

Formats: JPG, JPEG, PNG
Max Size: 5 MB
Recommended: Professional headshot or brand logo
Usage: Displays in voice cards

Use consistent cover images across custom voices for easy brand recognition.

Step 2: Input Audio

Choose how to provide your voice sample:

Option A: Upload Audio File

Supported Formats:

MP3
WAV
WebM
OGG
AAC
M4A
FLAC

Requirements:

Max File Size: 32 MB
Quality: Clear, noise-free audio
Duration: Recommended 30-90 seconds
Content: Natural speech, varied sentences
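If you prepare samples in bulk, a local check before uploading can catch format and size problems early. The sketch below is a minimal example, not the platform's own validation: `check_audio_file` is a hypothetical helper, it judges the format by file extension only, and it assumes that 1 MB means 1024 * 1024 bytes.

```python
from pathlib import Path

# Formats and size limit as listed above. Treating 1 MB as 1024 * 1024 bytes
# is an assumption; the page does not say how the limit is measured.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".aac", ".m4a", ".flac"}
MAX_AUDIO_BYTES = 32 * 1024 * 1024


def check_audio_file(path: str) -> list[str]:
    """Return problems found with a local audio file before uploading it."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} does not exist"]
    problems = []
    # Extension check only; the actual upload may inspect the file contents.
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {file.suffix or '(none)'}")
    if file.stat().st_size > MAX_AUDIO_BYTES:
        problems.append("file exceeds 32 MB")
    return problems
```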

Upload Process:

Click Upload tab
Drag and drop audio file or click to browse
Wait for upload (progress bar shown)
File validated automatically

Best Practices for Uploaded Audio:

Use high-quality recording equipment
Record in quiet environment
Avoid background noise and echo
Include varied speech patterns
Speak naturally at normal pace
Include different sentence types (questions, statements)

Option B: Record Voice

Record directly in the browser:

Requirements:

Duration: 3-9 seconds
Format: WAV (automatic)
Microphone: Required

Recording Process:

Click Record tab
Grant microphone permission
Click Start Recording
Speak naturally for 3-9 seconds
Click Stop Recording
Review recording
Re-record if needed

Recording Tips:

Use good quality microphone
Minimize background noise
Speak at natural pace
Include varied inflection
Say 2-3 complete sentences
Don’t rush or speak too slowly

Recording Length:

Minimum: 3 seconds (validation error if shorter)
Maximum: 9 seconds (recording stops automatically)
Sweet spot: 5-7 seconds for best results
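To check a WAV recording against the 3-9 second window outside the browser, the duration can be computed from the frame count and sample rate. This is a sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files; the function names are hypothetical and the in-browser recorder performs its own checks.

```python
import wave

# Limits from the recording requirements above.
MIN_SECONDS = 3.0
MAX_SECONDS = 9.0


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed as frames / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_recording_length(path: str) -> str:
    """Classify a recording as 'too short', 'too long', or 'ok'."""
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        return "too short"
    if duration > MAX_SECONDS:
        return "too long"
    return "ok"
```

The "too long" branch only matters for files recorded elsewhere, since the in-browser recorder stops automatically at 9 seconds.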

Step 3: Generate TTS Preview

Before finalizing, preview how your custom voice sounds:

Preview Process:

Enter sample text (minimum 5 words)
Click Generate Preview
System processes voice sample
Audio preview generates (10-30 seconds)
Click play to listen
Regenerate with different text if needed

Preview Text Suggestions:

"Hello, thank you for calling. How can I help you today?"

"Welcome to Acme Corporation. I'm here to assist with any questions you may have."

"Your order has been confirmed and will ship within two business days."

Use text that matches your actual agent scripts to hear how the voice will sound in real conversations.

Preview Validation:

Text must contain at least 5 words
Word count displayed in real-time
“✓ Valid” indicator when requirements met
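The 5-word minimum can be mirrored with a simple word count. This sketch assumes words are separated by whitespace; the page does not say how the interface counts words, so treat the function names and the counting rule as illustrative.

```python
MIN_PREVIEW_WORDS = 5


def count_words(text: str) -> int:
    # Assumption: a "word" is any run of non-whitespace characters.
    return len(text.split())


def is_valid_preview_text(text: str) -> bool:
    """True when the text meets the minimum word count for a preview."""
    return count_words(text) >= MIN_PREVIEW_WORDS
```

For example, "Your order has shipped" has four words and would fail, while the first suggested preview text above has eleven and would pass.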

Preview Status:

Generating Preview… - Processing voice sample
Preview Ready - Audio ready to play
Preview Failed - Error occurred, try again

Step 4: Create Voice

Once satisfied with the preview:

Click Create button
Voice processes (usually 30-60 seconds)
Success message appears
Voice added to “My Voices” library
Available immediately in agent voice selection

Voice created successfully! You can now use it in your agents.

Audio Quality Requirements

Recording Environment

Ideal:

Quiet room with minimal echo
Sound-dampening materials (curtains, furniture)
Closed windows and doors
No HVAC or fan noise

Avoid:

Outdoor recordings
Rooms with hard surfaces (echo)
Areas with background conversations
Near computers or electronics (buzz/hum)

Microphone Selection

Good Options:

USB condenser microphone
Headset with noise cancellation
Dedicated podcasting microphone
Laptop built-in mic (in quiet environment)

Poor Options:

Phone speakerphone
Far-field microphones
Low-quality earbuds
Heavily compressed audio sources

Audio Sample Content

Include variety:

Questions (“How can I help you today?”)
Statements (“Your order has shipped”)
Different emotions (friendly, professional, reassuring)
Various sentence lengths
Natural pauses and inflection

Avoid:

Monotone speech
Reading lists or numbers only
Repetitive phrases
Shouting or whispering
Background music or effects

Managing Custom Voices

Viewing Custom Voices

Navigate to Voices in sidebar
Click My Voices tab
All custom voices display here
Same features as library voices (preview, favorite, etc.)

Using Custom Voices

Custom voices work identically to library voices:

In Single Prompt Agents:

Open agent settings
Navigate to Voice Settings
Click Select Voice
Go to “My Voices” tab
Select your custom voice

In Flow Agents:

Available in global voice settings
Can be used in node-level voice overrides
Appears in all voice selectors

Deleting Custom Voices

Deleting a custom voice is permanent and cannot be undone.

Before deleting:

Remove voice from all agents using it
Export/save audio sample if you want to recreate later
Consider deactivating instead of deleting
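Before deleting, it helps to list every agent that still references the voice, including node-level overrides in Flow Agents. The sketch below works on an in-memory list; the `Agent` class, its fields, and the voice IDs are hypothetical stand-ins for however you track your agents, not a Hamsa data model.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical record of an agent's voice configuration.
    name: str
    voice_id: str  # the agent's main (global) voice
    node_voice_ids: list[str] = field(default_factory=list)  # node-level overrides


def agents_using_voice(agents: list[Agent], voice_id: str) -> list[str]:
    """Names of agents that reference the voice globally or in any node override."""
    return [
        agent.name
        for agent in agents
        if agent.voice_id == voice_id or voice_id in agent.node_voice_ids
    ]
```

An empty result suggests the voice is no longer referenced by any agent in the list.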

Delete Process:

Find voice in “My Voices” tab
Click voice actions menu (⋮)
Select Delete Voice
Confirm deletion
Voice removed from library

What happens to agents:

Agents using deleted voice will show error
Must select new voice for affected agents
Previous calls with that voice remain in history

Voice Cloning Best Practices

Sample Selection

For customer service voices:

Friendly, helpful tone
Clear enunciation
Moderate pace
Warm inflection

For sales voices:

Confident, enthusiastic
Engaging energy
Natural variation
Professional but personable

For technical support:

Clear, methodical pace
Patient tone
Reassuring demeanor
Precise pronunciation

Multi-Voice Strategy

Create voice variations for different scenarios:

Example: Customer Service Department

Voice 1: "Customer Service - Friendly Female"
- Tag: Female, Conversational
- Use: General inquiries, warm greeting

Voice 2: "Customer Service - Professional Male"
- Tag: Male, Narrator
- Use: Account information, formal communications

Voice 3: "Customer Service - Calm Female"
- Tag: Female, Conversational
- Use: Complaint handling, de-escalation

Language and Dialect Matching

For Arabic markets:

Egyptian: Broad Middle East appeal
Gulf (Saudi, UAE): GCC business markets
Levantine: Jordan, Syria, Lebanon regions
Use the dialect that matches your target customer base

For English markets:

Clear, neutral accent for international audiences
Regional accents for local businesses
Professional pronunciation for all markets

Testing Custom Voices

Before deploying:

Preview Testing - Generate multiple TTS previews with different scripts
Agent Testing - Use in test agent with actual conversation flow
Team Review - Have colleagues listen and provide feedback
A/B Testing - Compare with library voices
Live Testing - Deploy to small percentage of calls first
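For the live-testing step, one common way to send a small percentage of calls to the custom voice is deterministic bucketing on a call identifier. The sketch below is a generic technique, not a Hamsa feature: `use_custom_voice` and the idea of a string `call_id` are assumptions made for illustration.

```python
import hashlib


def use_custom_voice(call_id: str, rollout_percent: int) -> bool:
    """Decide whether a call falls inside the rollout percentage.

    The call ID is hashed into a bucket from 0 to 99, so the same call
    always gets the same answer and no state needs to be stored.
    """
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```

With `rollout_percent=10`, roughly one call in ten would use the custom voice; raising the value widens the rollout without reshuffling calls that were already included.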

Quality Checklist:

Pronunciation is clear and natural
Pace is appropriate for use case
Tone matches brand personality
No robotic or artificial sound
Handles varied sentence types well
Emotional range is appropriate
Consistent quality across different texts

Common Issues

“Recording too short” error

Problem: Recording is less than 3 seconds

Solution:

Record longer sample (5-7 seconds recommended)
Speak 2-3 complete sentences
Don’t rush through the recording

“Audio file too large” error

Problem: File exceeds 32 MB

Solution:

Compress audio file
Use MP3 format with lower bitrate
Trim unnecessary silence
Use online audio compression tool
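To judge how much compression is needed, the size of constant-bitrate audio can be estimated as bitrate × duration ÷ 8. The helpers below are a rough sketch: they ignore container overhead and variable-bitrate encoding, and they assume 1 MB means 1024 * 1024 bytes.

```python
def estimated_size_mb(duration_seconds: float, bitrate_kbps: float) -> float:
    """Approximate size of constant-bitrate audio, ignoring container overhead."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / (1024 * 1024)


def max_bitrate_kbps(duration_seconds: float, limit_mb: float = 32) -> int:
    """Highest constant bitrate (kbps) that keeps a clip under the size limit."""
    return int(limit_mb * 1024 * 1024 * 8 / duration_seconds / 1000)
```

By this estimate, a 90-second MP3 at 128 kbps comes to roughly 1.4 MB, far below the limit. A file that exceeds 32 MB is therefore usually either much longer than the recommended 30-90 seconds or stored uncompressed at a high sample rate, so trimming is often the more effective fix.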

“Preview generation failed”

Problem: TTS preview won’t generate

Possible causes:

Audio quality too low
Audio sample too short/long
Server processing issue

Solutions:

Try uploading different audio sample
Ensure clean, clear recording
Check file format is supported
Try again (temporary issue)

Voice sounds robotic or unnatural

Problem: Generated voice doesn’t sound natural

Causes:

Low-quality audio sample
Background noise in recording
Insufficient audio variation
Overly monotone source

Solutions:

Re-record in quieter environment
Use better microphone
Include more natural speech variation
Speak with natural inflection

Can’t find custom voice in agent

Problem: Created voice doesn’t appear in voice selector

Solutions:

Check “My Voices” tab specifically
Refresh browser page
Verify voice creation completed successfully
Check project selection is correct

Voice Cloning Limits

Per Account:

Unlimited custom voices
32 MB max file size per upload
3-9 seconds for direct recording

Processing Time:

Voice creation: 30-60 seconds
TTS preview generation: 10-30 seconds

Storage:

Custom voices stored permanently
Cover images: 5 MB max each

Advanced Features

Instant Voice (Beta)

Premium feature for voice isolation and enhancement:

Features:

Removes background noise from samples
Enhances voice clarity
Improves consistency
Better quality with less-than-perfect recordings

Instant Voice is a premium beta feature. Contact sales for access.

Voice Versioning

Create multiple versions of the same voice:

Use case: Update voice without losing original

Create new voice with same base audio
Use different tags or names to distinguish
Test new version before switching agents
Keep old version as backup

Related Documentation

Voice Library

Browse and select from pre-built AI voices

Single Prompt Voice Settings

Configure voice in Single Prompt Agents

Flow Agent Voice Settings

Set up voices in Flow Agents

Testing Agents

Test your custom voice in real calls
