Hamsa Text to Speech (TTS) converts written text into natural-sounding audio with proper Arabic pronunciation, intonation, and support for multiple dialects. Whether you’re creating media content, building voice applications, or making content accessible, Hamsa TTS delivers high-quality Arabic speech synthesis.

Overview

Key features

Arabic dialect support

Our TTS models are specifically optimized for Arabic speech:
  • Multiple dialects: Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic
  • Natural pronunciation: Proper handling of Arabic phonetics and pronunciation rules
  • Code-switching: Seamless handling of mixed Arabic-English text
  • Diacritical marks: Support for tashkeel (Arabic diacritics) to guide correct pronunciation

High-quality voices

  • Pre-built Arabic voices optimized for different dialects
  • Custom voice cloning for brand consistency
  • Voice customization options
  • Gender and age variety

Flexible integration

  • REST API for programmatic access
  • Web interface via Media Platform
  • Real-time streaming support
  • Multiple audio format options (MP3, WAV, PCM)

Use cases

Media & content creation

Generate high-quality Arabic voiceovers for:
  • Video content and advertisements
  • Podcasts and audio content
  • Social media content
  • Marketing materials

Accessibility

Make Arabic content accessible:
  • Audio versions of written content
  • Screen readers for Arabic websites
  • Educational materials
  • Documentation narration

Voice agents & IVR

Power conversational applications:
  • AI voice agents for customer service
  • IVR systems with natural Arabic speech
  • Automated phone systems
  • Voice assistants

E-learning

Create engaging educational content:
  • Course narration in Arabic
  • Language learning applications
  • Training materials
  • Interactive lessons

Models

Hamsa offers two TTS models optimized for different use cases:

TTS Standard

Best for quality. High-quality Arabic speech synthesis with natural pronunciation and intonation.
  • 10,000 character limit
  • ~300-500ms latency
  • Multiple Arabic dialects
  • Highest quality output

TTS Realtime

Best for speed. Ultra-fast model optimized for real-time voice agents and interactive applications.
  • 5,000 character limit
  • ~150-200ms latency
  • Arabic dialects + English
  • Conversational AI optimized
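
The trade-off between the two models can be expressed as a small selection helper. This is a sketch, not part of any Hamsa SDK. The model labels are assumptions. So is the rule of switching to TTS Realtime when the latency budget falls below Standard's ~500 ms upper bound. It also assumes the character limit counts Unicode code points, which is what Python's `len` returns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TTSModel:
    name: str                    # model name as shown on this page
    max_chars: int               # per-request character limit
    latency_ms: tuple[int, int]  # approximate typical latency range


# Figures copied from the model descriptions above. These are descriptive
# labels, not confirmed API model identifiers (assumption).
TTS_STANDARD = TTSModel("TTS Standard", 10_000, (300, 500))
TTS_REALTIME = TTSModel("TTS Realtime", 5_000, (150, 200))


def choose_model(text: str, latency_budget_ms: int | None = None) -> TTSModel:
    """Prefer TTS Standard; switch to TTS Realtime under a tight latency budget.

    A budget below Standard's upper typical latency selects Realtime.
    Raises ValueError when the text exceeds the chosen model's limit,
    which means it must be split into several requests.
    """
    if latency_budget_ms is not None and latency_budget_ms < TTS_STANDARD.latency_ms[1]:
        model = TTS_REALTIME
    else:
        model = TTS_STANDARD
    # Assumption: the limit counts Unicode code points, which is what len() returns.
    if len(text) > model.max_chars:
        raise ValueError(
            f"{len(text)} characters exceeds the {model.name} limit of {model.max_chars}"
        )
    return model
```

For example, `choose_model(text, latency_budget_ms=250)` selects TTS Realtime for a voice agent turn. Called without a budget, it keeps the default of TTS Standard.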

Supported languages

Arabic Dialects:
  • Egyptian Arabic (arz)
  • Gulf Arabic (afb) - Saudi, UAE, Kuwait, Bahrain, Qatar, Oman
  • Levantine Arabic (apc) - Syrian, Lebanese, Jordanian, Palestinian
  • North African Arabic (arq/ary) - Moroccan, Algerian, Tunisian, Libyan
  • Modern Standard Arabic (arb)
Other Languages:
  • English (US) (eng)
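
To route content to a dialect, you can keep the codes and country groupings above in a lookup table. The helper below is hypothetical, and it assumes the API accepts exactly these code strings. It follows the FAQ's advice to use Modern Standard Arabic for formal content, and it also falls back to MSA for any country not listed. Tunisia and Libya are omitted because this page groups them under arq/ary without saying which code applies.

```python
# Codes as listed on this page. Whether the API accepts exactly these
# strings is an assumption; confirm against the API reference.
SUPPORTED_LANGUAGES = {
    "arz": "Egyptian Arabic",
    "afb": "Gulf Arabic",
    "apc": "Levantine Arabic",
    "arq": "North African Arabic (Algerian)",
    "ary": "North African Arabic (Moroccan)",
    "arb": "Modern Standard Arabic",
    "eng": "English (US)",
}

# Country groupings from the list above (hypothetical helper). Tunisia and
# Libya are omitted: the page lists them under arq/ary without saying which.
COUNTRY_TO_DIALECT = {
    "Egypt": "arz",
    **dict.fromkeys(
        ["Saudi Arabia", "UAE", "Kuwait", "Bahrain", "Qatar", "Oman"], "afb"
    ),
    **dict.fromkeys(["Syria", "Lebanon", "Jordan", "Palestine"], "apc"),
    "Algeria": "arq",
    "Morocco": "ary",
}


def dialect_for_country(country: str, formal: bool = False) -> str:
    """Return a language code for an audience.

    Formal content, and any country not in the table, falls back to
    Modern Standard Arabic ("arb").
    """
    if formal:
        return "arb"
    return COUNTRY_TO_DIALECT.get(country, "arb")
```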

Audio formats

Hamsa TTS supports multiple output formats:
  • MP3: Standard compressed audio (22.05–44.1 kHz)
  • WAV: Uncompressed audio (16–48 kHz)
  • PCM: Raw audio data (16-bit, 16–48 kHz)
  • μ-law: Telephony-optimized (8 kHz)
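
PCM output is bare samples with no header, so most players need it wrapped in a container first. The sketch below uses Python's standard `wave` module. It assumes signed 16-bit little-endian samples at a sample rate you pass in. This page does not specify Hamsa's exact PCM sample layout, so confirm it before relying on this. For scale, 16-bit mono audio at 16 kHz is 2 bytes × 16,000 samples = 32,000 bytes per second.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 16_000, channels: int = 1) -> bytes:
    """Wrap raw signed 16-bit little-endian PCM in a WAV (RIFF) container.

    Assumption: the PCM stream is interleaved 16-bit samples at sample_rate.
    """
    frame_size = 2 * channels  # bytes per frame: 2 bytes per 16-bit sample
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```

You can write the returned bytes directly to a `.wav` file.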

Getting started

1. Choose your integration

Use the Media Platform for a web interface, or the REST API for programmatic access.

2. Select a voice

Choose a voice that matches your target dialect and use case.

3. Select a model

Use TTS Standard for quality or TTS Realtime for low-latency applications.

4. Prepare your text

Encode your Arabic text as UTF-8 and punctuate it properly.

5. Generate audio

Call the API or use the web interface to generate speech.
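
The steps above can be sketched as a single request builder. Everything API-specific here is hypothetical. The URL is a placeholder, and the JSON field names (`text`, `voiceId`, `model`, `format`) and Bearer authentication are assumptions; replace them with the values in the Hamsa API reference. Two parts carry over regardless of the real API: the UTF-8 JSON body, and Unicode NFC normalization. NFC is an added precaution that makes canonically equivalent Arabic spellings identical before sending, for example alef followed by a combining hamza versus the precomposed أ.

```python
import json
import unicodedata
import urllib.request

# HYPOTHETICAL: placeholder endpoint; consult the Hamsa API reference.
API_URL = "https://api.example.com/v1/tts"


def build_tts_request(text: str, voice_id: str, model: str, api_key: str,
                      audio_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for speech synthesis."""
    # NFC turns canonically equivalent sequences (e.g. U+0627 U+0654)
    # into a single form (U+0623) and orders combining marks consistently.
    text = unicodedata.normalize("NFC", text)
    # HYPOTHETICAL field names.
    payload = {"text": text, "voiceId": voice_id, "model": model, "format": audio_format}
    # ensure_ascii=False keeps Arabic as raw UTF-8 instead of \u escapes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
        },
    )
```

With the placeholders replaced by real values, `urllib.request.urlopen(req)` would send the request.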


FAQ

Which model should I use?
Use TTS Standard for high-quality media content, voiceovers, and whenever audio quality is paramount. Use TTS Realtime for voice agents, interactive applications, and whenever low latency is critical.

Can I mix Arabic and English in the same text?
Yes, our models handle code-switching between Arabic and English natively. The model automatically detects each language and pronounces it correctly.

Which dialect should I choose?
Choose the dialect that matches your target audience. Egyptian Arabic is widely understood across the Arab world, Gulf Arabic is preferred in GCC countries, and Levantine Arabic is common in the Levant. For formal content, use Modern Standard Arabic (MSA).

Can I create a custom voice?
Yes, Hamsa supports custom voice cloning. See our voice cloning guide for more information.

How much text can I send in one request?
TTS Standard supports up to 10,000 characters per request, and TTS Realtime supports up to 5,000. For longer content, split the text into multiple requests.
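
When you split long content into several requests, cutting at sentence boundaries makes each request likely to end on a natural pause. The sketch below assumes the limit is counted in Unicode code points, as Python's `len` does. Pass 10,000 for TTS Standard or 5,000 for TTS Realtime. Sentence ends are detected with a simple regular expression on `.`, `!`, `?`, and the Arabic question mark `؟`. This is a heuristic rather than full sentence segmentation.

```python
import re


def split_for_tts(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence breaks.

    Sentences longer than max_chars are split at the last space before the
    limit, or hard-cut if they contain no space.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pieces: list[str] = []
    # Split after sentence-ending punctuation (including Arabic '؟').
    for sentence in re.split(r"(?<=[.!?؟])\s+", text.strip()):
        # Break oversized sentences into pieces that fit on their own.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no usable space: hard cut
            pieces.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        if sentence:
            pieces.append(sentence)
    # Greedily pack pieces back together, joined by a single space.
    chunks: list[str] = []
    for piece in pieces:
        if chunks and len(chunks[-1]) + 1 + len(piece) <= max_chars:
            chunks[-1] += " " + piece
        else:
            chunks.append(piece)
    return chunks
```

Send each chunk as its own request, then play or join the resulting audio segments in order.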
