Learn about the models that power the Hamsa API.
Flagship models
Text to Speech
Jobs API
Async TTS via /v1/jobs/text-to-speech
Natural-sounding output optimized for Arabic dialects
Multiple Arabic dialects + English
Async job-based — result delivered via webhook
Realtime API
Sync TTS via /v1/realtime/tts
Low latency: returns WAV audio directly
Arabic dialects + English
Optimized for conversational AI and voice agents
Speech to Text
Batch API
Async STT via /v1/jobs/transcribe
High accuracy transcription for Arabic dialects
Word-level timestamps
Speaker diarization support
Async job-based — result delivered via webhook
Realtime API
Sync STT via /v1/realtime/stt
Arabic dialects + English
Base64-encoded audio input
Returns transcription directly
End-of-speech detection
Models overview
The Hamsa API offers audio processing optimized for the Arabic language, with support for multiple dialects and English.
| Endpoint | Description | Languages |
|---|---|---|
| /v1/jobs/text-to-speech | Async TTS: job-based with webhook delivery | Arabic dialects, English |
| /v1/realtime/tts | Sync TTS: returns WAV audio directly | Arabic dialects, English |
| /v1/jobs/transcribe | Async STT: job-based with webhook delivery | Arabic, English |
| /v1/realtime/stt | Sync STT: returns transcription directly | Arabic, English |
Hamsa TTS — Jobs API
The Jobs API (/v1/jobs/text-to-speech) is an async TTS endpoint. It creates a job and delivers the audio result via webhook. Best for batch processing and media content generation.
Use cases:
- Content Creation: Generate Arabic audio content, podcasts, and videos
- Accessibility: Audio versions of written Arabic content
- E-Learning: Educational content in Arabic with natural pronunciation
- Media Production: Professional-quality voiceovers
Key parameters: text, voiceId, webhookUrl, webhookAuth
→ See the TTS Quickstart for examples.
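As a rough illustration of the job-based flow described above, the sketch below builds (but does not send) a request for /v1/jobs/text-to-speech using the four parameters listed. The host in BASE_URL is a placeholder, and the Authorization header scheme and the helper name build_tts_job_request are assumptions, not details confirmed on this page; check the API reference for the real values.

```python
import json
import urllib.request

# Placeholder host: substitute the real API base URL from the API reference.
BASE_URL = "https://api.example.com"


def build_tts_job_request(api_key, text, voice_id, webhook_url, webhook_auth=None):
    """Build (without sending) a POST request for /v1/jobs/text-to-speech."""
    payload = {"text": text, "voiceId": voice_id, "webhookUrl": webhook_url}
    if webhook_auth is not None:
        # The shape of webhookAuth is not described on this page,
        # so the caller's value is passed through unchanged.
        payload["webhookAuth"] = webhook_auth
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/jobs/text-to-speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed header scheme; confirm against the authentication docs.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_job_request(
    "YOUR_API_KEY", "مرحبا بالعالم", "VOICE_ID", "https://example.com/hooks/tts"
)
# Passing req to urllib.request.urlopen would submit the job; the audio
# result is then delivered to webhookUrl rather than in this response.
```

Because the endpoint is asynchronous, the caller also needs a reachable webhook receiver; that part is outside this sketch.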
Hamsa TTS — Realtime API
The Realtime API (/v1/realtime/tts) returns WAV audio directly in the response. Designed for real-time applications and voice agents.
Use cases:
- Voice Agents: Real-time voice agents and phone calls
- Interactive Applications: Chatbots requiring immediate voice response
- Live Conversations: Conversational AI applications
Key parameters: text, speaker, dialect, mulaw
Supported dialects
| Code | Dialect | Example voices |
|---|---|---|
| pls | Palestinian | Amjad, Layan |
| egy | Egyptian | Mariam, Samir |
| syr | Syrian | Dalal, Mais |
| irq | Iraqi | Lyali, Fatma |
| jor | Jordanian | Lana, Jasem |
| leb | Lebanese | Carla, Majd |
| ksa | Saudi | Hiba, Fahd |
| uae | Emirati | Salma, Dima |
| bah | Bahraini | Mazen, Ruba |
| qat | Qatari | Deema, Faisal |
| kuw | Kuwaiti | Mai, Hatem |
| oma | Omani | Aisha, Jaber |
| msa | Modern Standard Arabic | Salem, Tamim |
| ar-sa | Arabic – Gulf | Khalid, Rahma |
| en | English | Emma, James |
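One way to use the dialect codes from the table is to validate them client-side before calling /v1/realtime/tts. The sketch below is a minimal example under stated assumptions: the helper name build_realtime_tts_payload is hypothetical, speaker is assumed to take a voice name such as those in the table, and mulaw is assumed to be a boolean flag, since this page lists the parameter names only.

```python
# Dialect codes copied from the "Supported dialects" table.
SUPPORTED_DIALECTS = {
    "pls", "egy", "syr", "irq", "jor", "leb", "ksa", "uae",
    "bah", "qat", "kuw", "oma", "msa", "ar-sa", "en",
}


def build_realtime_tts_payload(text, speaker, dialect, mulaw=False):
    """Return a JSON-ready body for /v1/realtime/tts, rejecting unknown dialect codes."""
    if dialect not in SUPPORTED_DIALECTS:
        raise ValueError(f"unsupported dialect code: {dialect!r}")
    # Assumption: mulaw is a boolean toggle for mu-law encoded output.
    return {"text": text, "speaker": speaker, "dialect": dialect, "mulaw": mulaw}


payload = build_realtime_tts_payload("أهلا وسهلا", "Lana", "jor")
```

Rejecting an unknown code locally avoids a network round trip for a request that would presumably fail anyway.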
Hamsa STT — Batch API
The Batch API (/v1/jobs/transcribe) is an async STT endpoint. Submit a media URL and receive the transcription via webhook or polling. Choose from two models:
| Model ID | Best for |
|---|---|
| Hamsa-General-V2.0 | General-purpose: media, podcasts, pre-recorded content |
| Hamsa-Conversational-V1.0 | Conversational audio: meetings, calls, dialogues |
Use cases:
- Transcription Services: Convert Arabic audio/video content to text
- Meeting Documentation: Capture and document Arabic conversations with speaker identification
- Media Subtitling: Generate SRT subtitles for Arabic media content
- Content Analysis: Process and index Arabic audio content
Features:
- Word-level timestamps for each transcribed segment
- Speaker diarization for multi-speaker audio
- Automatic Arabic dialect detection (set language to ar)
- SRT subtitle export with configurable formatting
- Automatic punctuation and formatting
Key parameters: mediaUrl, model, language, webhookUrl, returnSrtFormat, srtOptions
→ See the STT Quickstart for examples.
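To make the model choice concrete, here is a small sketch that assembles a request body for /v1/jobs/transcribe from the parameters listed above. The helper name, the "general"/"conversational" shorthand, and the assumption that returnSrtFormat is a boolean are illustrative only; the contents of srtOptions are not described on this page, so the value is passed through untouched.

```python
# Model IDs as listed in the Batch API model table.
MODELS = {
    "general": "Hamsa-General-V2.0",
    "conversational": "Hamsa-Conversational-V1.0",
}


def build_transcribe_payload(media_url, kind="general", language="ar",
                             webhook_url=None, return_srt=False, srt_options=None):
    """Return a JSON-ready body for /v1/jobs/transcribe."""
    if kind not in MODELS:
        raise ValueError(f"unknown model kind: {kind!r}")
    payload = {"mediaUrl": media_url, "model": MODELS[kind], "language": language}
    if webhook_url is not None:
        payload["webhookUrl"] = webhook_url
    if return_srt:
        # Assumption: returnSrtFormat is a boolean switch.
        payload["returnSrtFormat"] = True
        if srt_options is not None:
            payload["srtOptions"] = srt_options  # opaque, caller-supplied
    return payload


meeting = build_transcribe_payload(
    "https://example.com/media/standup.mp3",
    kind="conversational",
    webhook_url="https://example.com/hooks/stt",
)
```

Leaving webhook_url unset corresponds to the polling option mentioned above; how polling works is not covered here.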
Hamsa STT — Realtime API
The Realtime API (/v1/realtime/stt) accepts base64-encoded audio and returns the transcription directly. For streaming, use the WebSocket API.
Use cases:
- Voice Agents: Real-time speech recognition for conversational AI
- Live Call Transcription: Transcribe Arabic calls in real time
- Interactive Applications: Immediate transcription for chatbots and voice interfaces
Features:
- Synchronous: returns transcription in the response
- End-of-speech detection with configurable threshold
- Arabic and English language support
Key parameters: audioBase64, language, isEosEnabled, eosThreshold
→ See the STT Quickstart for examples.
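Since /v1/realtime/stt takes base64-encoded audio, the main client-side step is encoding the raw bytes. The following sketch shows that step with Python's standard base64 module. The helper name is hypothetical, and the 0.5 threshold is an arbitrary illustrative value: this page does not state the unit or valid range of eosThreshold.

```python
import base64


def build_realtime_stt_payload(audio_bytes, language="ar",
                               eos_enabled=False, eos_threshold=None):
    """Return a JSON-ready body for /v1/realtime/stt from raw audio bytes."""
    payload = {
        # Base64 output is plain ASCII, so it can be embedded in JSON directly.
        "audioBase64": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "isEosEnabled": eos_enabled,
    }
    if eos_enabled and eos_threshold is not None:
        payload["eosThreshold"] = eos_threshold
    return payload


# Four dummy bytes stand in for real audio data read from a file or stream.
payload = build_realtime_stt_payload(b"\x00\x01\x02\x03", eos_enabled=True, eos_threshold=0.5)
```

Base64 inflates the data by roughly a third, which is worth keeping in mind when sizing requests.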
Model selection guide
Requirements
Batch / media content
Use the Jobs API (/v1/jobs/text-to-speech) for async processing with webhook delivery.
Real-time / voice agents
Use the Realtime API (/v1/realtime/tts) or WebSocket for low-latency streaming.
Arabic Dialects
Both TTS endpoints support multiple Arabic dialects plus English (see Supported dialects). Choose between them based on latency requirements.
Use case
Content creation
Use the Jobs API for professional Arabic content, media, and video narration.
Voice Agents
Use the Realtime API / WebSocket for real-time conversational applications.
Transcription
Use the Batch API (/v1/jobs/transcribe) with Hamsa-General-V2.0 for media transcription or Hamsa-Conversational-V1.0 for conversational audio.
Character limits
| Endpoint | Character limit |
|---|---|
| WebSocket TTS | 2,000 characters per message |
For longer content, consider splitting the input into multiple requests.
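A simple way to split long input is to cut at the last whitespace before the limit, falling back to a hard cut when a run of text has no whitespace. The sketch below does this with Python string lengths (Unicode code points); whether the API counts characters the same way is an assumption, and split_text is a hypothetical helper name.

```python
MAX_CHARS = 2000  # WebSocket TTS limit per message, from the table above


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring whitespace boundaries."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space or newline within the first `limit` characters (index <= limit).
        cut = max(remaining.rfind(ws, 0, limit + 1) for ws in (" ", "\n"))
        if cut <= 0:
            cut = limit  # no whitespace found: fall back to a hard split
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks


chunks = split_text("word " * 1000)  # about 5,000 characters of input
```

Each chunk can then be sent as its own message. Splitting on sentence boundaries instead of arbitrary whitespace would likely give more natural prosody, at the cost of a slightly more involved splitter.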
Audio duration limits
| Endpoint | Audio duration limit | File size limit |
|---|---|---|
| Batch API (/v1/jobs/transcribe) | 60 minutes | 500 MB |
| Realtime API (/v1/realtime/stt) | Per-request | N/A |
| WebSocket (/v1/realtime/ws) | Streaming | N/A |
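The Batch API limits in the table can be checked before submitting a job. This sketch assumes the caller already knows the file size and duration (for example from a media probe tool) and that 1 MB means 1024 × 1024 bytes; the page does not say which definition applies, and the helper name is hypothetical.

```python
MAX_BYTES = 500 * 1024 * 1024  # 500 MB, assuming binary megabytes
MAX_SECONDS = 60 * 60          # 60 minutes


def batch_limit_problems(size_bytes, duration_seconds):
    """Return a list of human-readable limit violations (empty if the file fits)."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the 500 MB limit")
    if duration_seconds > MAX_SECONDS:
        problems.append(f"audio is {duration_seconds} s long, above the 60 minute limit")
    return problems


# A 90-minute, 120 MB recording exceeds the duration limit only.
issues = batch_limit_problems(120 * 1024 * 1024, 90 * 60)
```

A file that fails the duration check would need to be split into shorter segments before submission.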
Plans and Usage Limits
Your subscription plan determines your monthly usage limits and concurrent call capacity.
Plan Comparison
| Plan | Price | Credits | Voice Agent | Speech to Text | Text to Speech | Concurrency | KB Storage |
|---|---|---|---|---|---|---|---|
| Free | $0/mo | 50 | 9 min | 50 min | 25 min | 1 | 1 MB |
| Starter | $5/mo | 100 | 17 min | 100 min | 50 min | 1 | 5 MB |
| Creator | $15/mo | 500 | 84 min | 500 min | 250 min | 2 | 10 MB |
| Pro | $100/mo | 5,000 | 834 min | 5,000 min | 2,500 min | 5 | 50 MB |
| Business | $320/mo | 20,000 | 3,334 min | 20,000 min | 10,000 min | 10 | 100 MB |
| Enterprise | Custom | Custom | Unlimited | Unlimited | Unlimited | Unlimited | 300 MB |
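The minute columns in the table appear to follow fixed ratios to the credit count: about 1 credit per Speech to Text minute, 2 credits per Text to Speech minute, and roughly 6 credits per Voice Agent minute, rounded up. These ratios are inferred from the table rows, not taken from an official pricing formula, so treat the sketch below as a consistency check rather than a billing rule.

```python
def included_minutes(credits):
    """Estimate per-product minutes from a credit balance, using ratios inferred from the plan table."""
    return {
        "speech_to_text": credits,        # 1 credit per minute
        "text_to_speech": credits // 2,   # 2 credits per minute
        "voice_agent": -(-credits // 6),  # about 6 credits per minute, rounded up
    }


# Credit amounts for Free, Starter, Creator, Pro, and Business.
table = {credits: included_minutes(credits) for credits in (50, 100, 500, 5000, 20000)}
```

For the Creator plan, for instance, 500 credits work out to 500 Speech to Text minutes, 250 Text to Speech minutes, and 84 Voice Agent minutes, matching the row above.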
Plan Features
| Feature | Free | Starter | Creator | Pro | Business | Enterprise |
|---|---|---|---|---|---|---|
| Access to All Models | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Fine-tuned AI Models | - | - | - | - | ✓ | ✓ |
| Basic Cloud Support | - | - | - | ✓ | - | - |
| Full Cloud Support | - | - | - | - | ✓ | ✓ |
| On-Premise Solution | - | - | - | - | - | ✓ |
To increase your usage limits and concurrent calls, upgrade your subscription plan.
Enterprise customers can request custom limits by contacting sales.