Learn about the models that power the Hamsa API.

Models overview

The Hamsa API offers a range of audio models optimized for Arabic language processing, with support for multiple dialects and English.
| Model ID | Description | Languages |
| --- | --- | --- |
| hamsa-tts-standard | High-quality Arabic and English speech synthesis | Arabic dialects (Egyptian, Gulf, Levantine, North African), English (US) |
| hamsa-tts-realtime | Ultra-fast TTS optimized for real-time applications | Arabic dialects, English (US) |
| hamsa-stt-standard | High-accuracy Arabic speech recognition | Arabic dialects (Egyptian, Gulf, Levantine, North African), English (US) |
| hamsa-stt-realtime | Real-time speech recognition for live conversations | Arabic dialects, English (US) |

Hamsa TTS Standard

Hamsa TTS Standard is our high-quality speech synthesis model optimized for Arabic dialects and English. It produces natural, lifelike speech with proper pronunciation of Arabic text across multiple dialects. This model works well in the following scenarios:
  • Content Creation: Perfect for generating Arabic audio content, podcasts, and videos
  • Accessibility: Generate audio versions of written Arabic content
  • E-Learning: Create educational content in Arabic with natural pronunciation
  • Media Production: Professional-quality voiceovers for Arabic media

Supported languages

The Hamsa TTS Standard model supports the following languages and dialects.
Arabic Dialects:
  • Egyptian Arabic (arz)
  • Gulf Arabic (afb) - Saudi, UAE, Kuwait, etc.
  • Levantine Arabic (apc) - Syrian, Lebanese, Jordanian, Palestinian
  • North African Arabic (arq/ary) - Moroccan, Algerian, Tunisian, Libyan
  • Modern Standard Arabic (arb)
Other Languages:
  • English (US) (eng)

Hamsa TTS Realtime

Hamsa TTS Realtime is our fastest speech synthesis model, designed for real-time applications and the Voice Agents Platform. It delivers high-quality Arabic speech with ultra-low latency. This model is particularly well suited for:
  • Voice Agents Platform: Perfect for real-time voice agents and phone calls
  • Interactive Applications: Ideal for chatbots requiring immediate voice response
  • Live Conversations: Real-time TTS for conversational AI applications
With its lower latency (approximately 150 to 200 ms) and optimized processing, TTS Realtime is the right choice when you need fast, reliable speech synthesis for conversational applications.

Hamsa STT Standard

Hamsa STT Standard is our high-accuracy speech recognition model for transcribing Arabic speech across multiple dialects. It provides precise word-level timestamps and speaker diarization. This model excels in scenarios that require accurate speech-to-text conversion:
  • Transcription Services: Perfect for converting Arabic audio/video content to text
  • Meeting Documentation: Ideal for capturing and documenting Arabic conversations
  • Content Analysis: Well-suited for Arabic audio content processing and analysis
  • Media Subtitling: Generate accurate subtitles for Arabic media content
Key features:
  • Accurate transcription with word-level timestamps
  • Speaker diarization for multi-speaker audio
  • Support for multiple Arabic dialects
  • Punctuation and formatting
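As an illustration of the subtitling use case, word-level timestamps can be grouped into SRT subtitle cues. This is a minimal sketch: the input shape, a list of `(word, start_seconds, end_seconds)` tuples, is an assumption rather than the documented response format, and `words_to_srt` and `max_words` are hypothetical names. Check the actual transcription response before adapting it.

```python
from typing import List, Tuple

# Hypothetical input shape: one (word, start_seconds, end_seconds) tuple per word.
Word = Tuple[str, float, float]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0][1], group[-1][2]
        text = " ".join(w for w, _, _ in group)
        index = len(cues) + 1
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues, as SRT expects.
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [("مرحبا", 0.0, 0.6), ("بكم", 0.6, 1.1), ("في", 1.2, 1.4), ("هامسا", 1.4, 2.0)]
    print(words_to_srt(sample, max_words=2))
```

Grouping by a fixed word count keeps the sketch short; a production subtitler would more likely also cap cue duration and line length.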

Hamsa STT Realtime

Hamsa STT Realtime is our real-time speech recognition model, delivering accurate transcription of Arabic speech with ultra-low latency for live conversations. This model excels in conversational use cases:
  • Live call transcription: Perfect for real-time Arabic call transcription
  • AI Voice Agents: Ideal for live conversations in Arabic
  • Live Meetings: Real-time transcription for Arabic meetings and conferences
Key features:
  • Ultra-low latency: Get transcriptions in real time
  • Streaming support: Send audio in chunks while receiving transcripts
  • Multiple audio formats: Support for various audio encodings
  • Dialect recognition: Automatic recognition of Arabic dialects
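Streaming means the client sends audio in small pieces instead of one complete file. The sketch below shows only the client-side chunking, assuming raw 16-bit mono PCM at 16 kHz split into 100 ms chunks. The sample rate, sample width, and chunk duration are illustrative assumptions, not documented Hamsa requirements, and the transport that carries each chunk is left out because it is not described here.

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16_000, sample_width: int = 2,
               channels: int = 1, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw PCM audio.

    Chunks are cut on whole frames (sample_width * channels bytes), so no
    sample is split across two chunks. The final chunk may be shorter.
    """
    bytes_per_frame = sample_width * channels
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_size = frames_per_chunk * bytes_per_frame
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    for offset in range(0, len(pcm), chunk_size):
        yield pcm[offset:offset + chunk_size]


if __name__ == "__main__":
    one_second_of_silence = bytes(16_000 * 2)  # 16 kHz, 2 bytes per sample
    chunks = list(pcm_chunks(one_second_of_silence))
    print(len(chunks), len(chunks[0]))  # 10 3200
```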

Model selection guide

  • Quality: Use hamsa-tts-standard or hamsa-stt-standard. Best for high-quality audio output and accurate transcription.
  • Low latency: Use hamsa-tts-realtime or hamsa-stt-realtime. Optimized for real-time applications.
  • Arabic dialects: All models support multiple Arabic dialects. Choose based on latency versus quality requirements.
  • Balanced: Use the Standard models. Good balance between quality and performance.
  • Content creation: Use hamsa-tts-standard. Ideal for professional Arabic content, media, and video narration.
  • Voice Agents Platform: Use hamsa-tts-realtime and hamsa-stt-realtime. Perfect for real-time conversational applications in Arabic.
  • Transcription: Use hamsa-stt-standard. Best accuracy for batch Arabic transcription.
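The selection guide's realtime-versus-standard split can be condensed into a small lookup. This is a minimal sketch: the model IDs come from the models overview, but the use-case labels (`voice_agent`, `live_call`, and so on) and the `pick_model` function are hypothetical names for illustration, not part of the Hamsa API.

```python
# Use cases that the selection guide routes to the realtime models.
# These labels are hypothetical; any latency-sensitive, conversational
# workload belongs here.
REALTIME_USE_CASES = {"voice_agent", "live_call", "live_meeting", "chatbot"}


def pick_model(task: str, use_case: str) -> str:
    """Return a Hamsa model ID for task "tts" or "stt" and a use-case label.

    Latency-sensitive use cases get the realtime model; everything else
    (content creation, batch transcription, subtitling) gets the standard one.
    """
    if task not in ("tts", "stt"):
        raise ValueError(f"unknown task: {task!r}")
    tier = "realtime" if use_case in REALTIME_USE_CASES else "standard"
    return f"hamsa-{task}-{tier}"


if __name__ == "__main__":
    print(pick_model("tts", "voice_agent"))    # hamsa-tts-realtime
    print(pick_model("stt", "transcription"))  # hamsa-stt-standard
```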

Character limits

The maximum number of characters supported in a single text-to-speech request varies by model.
| Model ID | Character limit | Approximate audio duration |
| --- | --- | --- |
| hamsa-tts-standard | 10,000 | ~10 minutes |
| hamsa-tts-realtime | 5,000 | ~5 minutes |
For longer content, consider splitting the input into multiple requests.
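One way to split longer content is to break the text at sentence boundaries so that each request stays within the model's character limit. This is a minimal sketch. It assumes the limit is counted in Python `str` characters (Unicode code points), which the table does not specify. `split_for_tts` is a hypothetical helper, and a single sentence longer than the limit is cut mid-sentence as a fallback.

```python
import re
from typing import List

# Per-request character limits from the character limits table.
CHAR_LIMITS = {"hamsa-tts-standard": 10_000, "hamsa-tts-realtime": 5_000}

# Split on whitespace that follows sentence-ending punctuation,
# including the Arabic question mark (؟).
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def split_for_tts(text: str, model_id: str) -> List[str]:
    """Pack whole sentences into chunks no longer than the model's limit."""
    limit = CHAR_LIMITS[model_id]
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        # Fallback: hard-split a sentence that alone exceeds the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request, and the resulting audio segments can be concatenated in order.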

Audio duration limits

The maximum audio duration supported for speech-to-text varies by model.
| Model ID | Audio duration limit | File size limit |
| --- | --- | --- |
| hamsa-stt-standard | 60 minutes | 500 MB |
| hamsa-stt-realtime | Streaming (unlimited) | N/A |
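A pre-upload check against the hamsa-stt-standard limits might look like the following sketch. It handles only WAV files, using Python's standard `wave` module to read the duration. It also assumes "MB" means 10**6 bytes, which the table does not specify. `check_stt_standard_limits` is a hypothetical helper name.

```python
import os
import wave
from typing import List

# hamsa-stt-standard limits from the audio duration limits table.
# Assumption: "MB" means 10**6 bytes (the table does not say).
MAX_DURATION_S = 60 * 60
MAX_BYTES = 500 * 10**6


def check_stt_standard_limits(path: str) -> List[str]:
    """Return a list of limit violations for a WAV file (empty if it fits)."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration > MAX_DURATION_S:
        problems.append(f"audio is {duration:.0f} s, over the {MAX_DURATION_S} s limit")
    return problems
```

Files that fail the duration check could be cut into segments of under 60 minutes and sent separately, or streamed to hamsa-stt-realtime instead.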

Plans and Usage Limits

Your subscription plan determines your monthly usage limits and concurrent call capacity.

Plan Comparison

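| Plan | Price | Credits | Voice Agent | Speech to Text | Text to Speech | Concurrency | KB Storage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Free | $0/mo | 50 | 9 min | 50 min | 25 min | 1 | 1 MB |
| Starter | $5/mo | 100 | 17 min | 100 min | 50 min | 1 | 5 MB |
| Creator | $15/mo | 500 | 84 min | 500 min | 250 min | 2 | 10 MB |
| Pro | $100/mo | 5,000 | 834 min | 5,000 min | 2,500 min | 5 | 50 MB |
| Business | $320/mo | 20,000 | 3,334 min | 20,000 min | 10,000 min | 10 | 100 MB |
| Enterprise | Custom | Custom | Unlimited | Unlimited | Unlimited | Unlimited | 300 MB |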
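To find the cheapest listed plan that covers a given workload, the plan comparison data can be scanned in price order. This is a minimal sketch using only the concurrency and Speech to Text columns. `cheapest_plan_for` is a hypothetical helper, and Enterprise is left out because its price and limits are custom.

```python
from typing import List, Optional, Tuple

# (plan, monthly price in USD, Speech to Text minutes, concurrency),
# copied from the plan comparison table. Enterprise is omitted (custom).
PLANS: List[Tuple[str, int, int, int]] = [
    ("Free", 0, 50, 1),
    ("Starter", 5, 100, 1),
    ("Creator", 15, 500, 2),
    ("Pro", 100, 5_000, 5),
    ("Business", 320, 20_000, 10),
]


def cheapest_plan_for(concurrency: int, stt_minutes: int = 0) -> Optional[str]:
    """Return the cheapest listed plan meeting both requirements,
    or None if only a custom Enterprise plan would fit."""
    for name, _price, minutes, limit in sorted(PLANS, key=lambda p: p[1]):
        if limit >= concurrency and minutes >= stt_minutes:
            return name
    return None
```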

Plan Features

FeatureFreeStarterCreatorProBusinessEnterprise
Access to All Models
Fine-tuned AI Models----
Basic Cloud Support-----
Full Cloud Support----
On-Premise Solution-----
To increase your usage limits and concurrent calls, upgrade your subscription plan. Enterprise customers can request custom limits by contacting sales.
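To keep a batch of jobs within your plan's concurrency, you can cap the number of in-flight requests on the client. The sketch below uses `asyncio.Semaphore` for this. `call_hamsa` is a hypothetical stand-in that only sleeps and is not a real Hamsa client function. `PLAN_CONCURRENCY = 2` matches the Creator plan's concurrency in the plan comparison table.

```python
import asyncio
from typing import List, Tuple

PLAN_CONCURRENCY = 2  # e.g. the Creator plan's concurrency


async def call_hamsa(job_id: int) -> int:
    """Hypothetical stand-in for a real API request; it only sleeps."""
    await asyncio.sleep(0.05)
    return job_id


async def run_all(job_ids: List[int], limit: int = PLAN_CONCURRENCY) -> Tuple[List[int], int]:
    """Run every job, never letting more than `limit` run at once.

    Returns the results (in input order) and the peak observed concurrency.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def limited(job_id: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:  # at most `limit` coroutines get past here at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_hamsa(job_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(limited(j) for j in job_ids))
    return list(results), peak


if __name__ == "__main__":
    print(asyncio.run(run_all(list(range(6)))))
```

Excess jobs wait on the semaphore instead of being sent, so the client never exceeds the plan's concurrent call capacity.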

API requests per minute vs concurrent requests

API requests per minute and concurrent requests are different metrics. How one relates to the other depends on your usage pattern: how long each request takes and how the requests are batched.

Example 1: Spaced requests. Suppose you send 60 requests per minute, each takes 1 second to complete, and they are sent 1 second apart. The maximum number of concurrent requests is 1, and the average is 1.

Example 2: Batched requests. Now suppose you send 60 requests per minute, each takes 3 seconds to complete, and all of them are fired at once. The maximum number of concurrent requests is 60, and the average over the minute is 3.

Because limits apply to concurrency, how long each request takes and when requests are sent matter more than the raw number of requests per minute.
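The two examples can be checked with a short simulation that sweeps over request start and end events. In this sketch, each request is a hypothetical `(start_seconds, duration_seconds)` pair. The average is total request-seconds divided by the window length, which assumes every request starts and finishes inside the window.

```python
from typing import List, Tuple

Request = Tuple[float, float]  # (start_seconds, duration_seconds)


def concurrency_stats(requests: List[Request], window_s: float = 60.0) -> Tuple[int, float]:
    """Return (max concurrent requests, average concurrent requests over window_s)."""
    events = []
    for start, duration in requests:
        events.append((start, 1))              # request begins
        events.append((start + duration, -1))  # request ends
    # Ends (-1) sort before starts (+1) at the same instant, so a request that
    # finishes exactly when the next one begins is not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    # Average concurrency = busy request-seconds / window length.
    average = sum(d for _, d in requests) / window_s
    return peak, average


if __name__ == "__main__":
    spaced = [(float(i), 1.0) for i in range(60)]  # Example 1
    batched = [(0.0, 3.0)] * 60                    # Example 2
    print(concurrency_stats(spaced))   # (1, 1.0)
    print(concurrency_stats(batched))  # (60, 3.0)
```

Both patterns send 60 requests per minute, yet their peak concurrency differs by a factor of 60. This is why the request pattern matters more than the raw request rate.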