Models

Model ID	Description	Languages
`hamsa-tts-standard`	High-quality Arabic and English speech synthesis	Arabic dialects (Egyptian, Gulf, Levantine, North African), English (US)
`hamsa-tts-realtime`	Ultra-fast TTS optimized for real-time applications	Arabic dialects, English (US)
`hamsa-stt-standard`	High-accuracy Arabic speech recognition	Arabic dialects (Egyptian, Gulf, Levantine, North African), English (US)
`hamsa-stt-realtime`	Real-time speech recognition for live conversations	Arabic dialects, English (US)

Hamsa TTS Standard

Hamsa TTS Standard is our high-quality speech synthesis model optimized for Arabic dialects and English. It produces natural, lifelike speech with proper pronunciation of Arabic text across multiple dialects. This model works well in the following scenarios:

Content Creation: Perfect for generating Arabic audio content, podcasts, and videos
Accessibility: Generate audio versions of written Arabic content
E-Learning: Create educational content in Arabic with natural pronunciation
Media Production: Professional-quality voiceovers for Arabic media

Supported languages

The Hamsa TTS Standard model supports the following languages.

Arabic Dialects:

Egyptian Arabic (arz)
Gulf Arabic (afb) - Saudi, UAE, Kuwait, etc.
Levantine Arabic (apc) - Syrian, Lebanese, Jordanian, Palestinian
North African Arabic (arq/ary) - Moroccan, Algerian, Tunisian, Libyan
Modern Standard Arabic (arb)

Other Languages:

English (US) (eng)

Hamsa TTS Realtime

Hamsa TTS Realtime is our fastest speech synthesis model, designed for real-time applications and the Voice Agents Platform. It delivers high-quality Arabic speech with ultra-low latency. This model is particularly well-suited for:

Voice Agents Platform: Perfect for real-time voice agents and phone calls
Interactive Applications: Ideal for chatbots requiring immediate voice response
Live Conversations: Real-time TTS for conversational AI applications

With its low latency (approximately 150-200 ms) and optimized processing, TTS Realtime is the recommended choice for fast, reliable speech synthesis in conversational applications.

Hamsa STT Standard

Hamsa STT Standard is our high-accuracy speech recognition model for transcribing Arabic speech across multiple dialects. It provides word-level timestamps and speaker diarization. This model excels in scenarios where transcription accuracy matters most:

Transcription Services: Perfect for converting Arabic audio/video content to text
Meeting Documentation: Ideal for capturing and documenting Arabic conversations
Content Analysis: Well-suited for Arabic audio content processing and analysis
Media Subtitling: Generate accurate subtitles for Arabic media content

Key features:

Accurate transcription with word-level timestamps
Speaker diarization for multi-speaker audio
Support for multiple Arabic dialects
Punctuation and formatting

Hamsa STT Realtime

Hamsa STT Realtime is our real-time speech recognition model, delivering accurate transcription of Arabic speech with ultra-low latency for live conversations. This model excels in conversational use cases:

Live call transcription: Perfect for real-time Arabic call transcription
AI Voice Agents: Ideal for live conversations in Arabic
Live Meetings: Real-time transcription for Arabic meetings and conferences

Key features:

Ultra-low latency: Get transcriptions in real time
Streaming support: Send audio in chunks while receiving transcripts
Multiple audio formats: Support for various audio encodings
Dialect recognition: Automatic recognition of Arabic dialects

Model selection guide

Requirements

Quality

Use `hamsa-tts-standard` or `hamsa-stt-standard`. Best for high-quality audio output and accurate transcription.

Low-latency

Use `hamsa-tts-realtime` or `hamsa-stt-realtime`. Optimized for real-time applications.

Arabic Dialects

All models support multiple Arabic dialects. Choose based on latency vs quality requirements.

Balanced

Use the Standard models for a good balance between quality and performance.

Use case

Content creation

Use `hamsa-tts-standard`. Ideal for professional Arabic content, media, and video narration.

Voice Agents Platform

Use `hamsa-tts-realtime` and `hamsa-stt-realtime`. Well suited to real-time conversational applications in Arabic.

Transcription

Use `hamsa-stt-standard`. Best accuracy for batch Arabic transcription.
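
The selection guide above reduces to two questions: is the task speech synthesis or speech recognition, and does it need real-time latency? A minimal sketch of that decision follows. The helper name `choose_model` and its parameters are hypothetical and not part of any Hamsa SDK; only the model IDs come from this page.

```python
def choose_model(task: str, realtime: bool = False) -> str:
    """Return a model ID from the table on this page.

    task: "tts" for text to speech, "stt" for speech to text.
    realtime: True for live conversations and voice agents,
              False for batch or quality-first workloads.
    """
    if task not in ("tts", "stt"):
        raise ValueError(f"unknown task {task!r}; expected 'tts' or 'stt'")
    tier = "realtime" if realtime else "standard"
    return f"hamsa-{task}-{tier}"


# Content creation: quality-first speech synthesis.
narration_model = choose_model("tts")
# Voice agent: both directions need low latency.
agent_models = (choose_model("tts", realtime=True), choose_model("stt", realtime=True))
```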

Character limits

The maximum number of characters supported in a single text-to-speech request varies by model.

Model ID	Character limit	Approximate audio duration
`hamsa-tts-standard`	10,000	~10 minutes
`hamsa-tts-realtime`	5,000	~5 minutes

For longer content, consider splitting the input into multiple requests.
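
One way to split longer input is to break it at sentence boundaries so that each request stays under the model's character limit. The sketch below is illustrative only: `split_text` is a hypothetical helper, the limits are copied from the table above, and it assumes the limit counts Unicode code points as Python's `len` does, which this page does not specify.

```python
import re

# Character limits from the table above.
CHAR_LIMITS = {"hamsa-tts-standard": 10_000, "hamsa-tts-realtime": 5_000}


def split_text(text: str, model_id: str) -> list[str]:
    """Split text into chunks no longer than the model's character limit."""
    limit = CHAR_LIMITS[model_id]
    # Break after sentence-ending punctuation (Latin or Arabic question mark)
    # that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?؟])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own text-to-speech request and the resulting audio concatenated in order.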

Audio duration limits

The maximum audio duration supported for speech-to-text varies by model.

Model ID	Audio duration limit	File size limit
`hamsa-stt-standard`	60 minutes	500 MB
`hamsa-stt-realtime`	Streaming (unlimited)	N/A
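
A client-side check before upload can catch files that exceed the `hamsa-stt-standard` limits. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it treats 1 MB as 1,000,000 bytes, which the table does not specify.

```python
# Limits for hamsa-stt-standard, taken from the table above.
MAX_DURATION_SECONDS = 60 * 60
MAX_FILE_BYTES = 500 * 1000 * 1000  # assumption: decimal megabytes


def check_stt_standard_upload(duration_seconds: float, size_bytes: int) -> list[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 60 minutes")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 500 MB")
    return problems
```

Audio that fails the check could be split into shorter segments, or streamed to `hamsa-stt-realtime` if live transcription fits the use case.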

Plans and Usage Limits

Your subscription plan determines your monthly usage limits and concurrent call capacity.

Plan Comparison

Plan	Price	Credits	Voice Agent	Speech to Text	Text to Speech	Concurrency	KB Storage
Free	$0/mo	50	9 min	50 min	25 min	1	1 MB
Starter	$5/mo	100	17 min	100 min	50 min	1	5 MB
Creator	$15/mo	500	84 min	500 min	250 min	2	10 MB
Pro	$100/mo	5,000	834 min	5,000 min	2,500 min	5	50 MB
Business	$320/mo	20,000	3,334 min	20,000 min	10,000 min	10	100 MB
Enterprise	Custom	Custom	Unlimited	Unlimited	Unlimited	Unlimited	300 MB

Plan Features

Feature	Free	Starter	Creator	Pro	Business	Enterprise
Access to All Models	✓	✓	✓	✓	✓	✓
Fine-tuned AI Models	-	-	-	-	✓	✓
Basic Cloud Support	-	-	-	✓	-	-
Full Cloud Support	-	-	-	-	✓	✓
On-Premise Solution	-	-	-	-	-	✓

To increase your usage limits and concurrent calls, upgrade your subscription plan. Enterprise customers can request custom limits by contacting sales.

API requests per minute vs concurrent requests

API requests per minute and concurrent requests are different metrics. The same request rate can produce very different concurrency, depending on how long each request takes and how the requests are batched.

Example 1: Spaced requests. If you send 60 requests per minute, each taking 1 second to complete and each sent 1 second apart, the maximum number of concurrent requests is 1 and the average is 1.

Example 2: Batched requests. If you send 60 requests per minute, each taking 3 seconds to complete but all fired at once, the maximum number of concurrent requests is 60 and the average is 3.

Since our system limits concurrency, requests per minute matter less than how long each request takes and the pattern in which requests are sent.
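
The two examples can be reproduced with a short calculation. The sketch below is not how the platform measures concurrency; it is a generic sweep over request start and end times, with a hypothetical function name, that yields the peak number of overlapping requests and the average concurrency over a one-minute window.

```python
def concurrency_stats(requests, window_seconds=60.0):
    """Compute (peak, average) concurrency.

    requests: list of (start_time, duration) pairs in seconds.
    Average concurrency is total busy time divided by the window length.
    """
    events = []
    busy_seconds = 0.0
    for start, duration in requests:
        events.append((start, 1))               # request begins
        events.append((start + duration, -1))   # request ends
        busy_seconds += duration
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back requests are not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak, busy_seconds / window_seconds


# Example 1: 60 one-second requests sent 1 second apart.
spaced = concurrency_stats([(second, 1) for second in range(60)])
# Example 2: 60 three-second requests all fired at once.
batched = concurrency_stats([(0, 3)] * 60)
print(spaced, batched)  # (1, 1.0) (60, 3.0)
```

Both patterns are 60 requests per minute, but only the spaced pattern fits within a concurrency limit of 1.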
