Hamsa Speech to Text (STT) accurately transcribes Arabic speech across multiple dialects into text with word-level timestamps and speaker identification.

What you can do

  • Transcribe Arabic media content, podcasts, and videos
  • Generate subtitles for Arabic video content
  • Create searchable text from Arabic audio recordings
  • Enable real-time transcription for voice agents and live calls
  • Document Arabic meetings and interviews

Models

Model         Best for                             Latency
STT Standard  Batch transcription, high accuracy   Optimized for quality
STT Realtime  Live calls, voice agents, streaming  ~150-250 ms

View all models

Compare models and see detailed specifications

Key features

  • Dialect recognition: Automatic detection and transcription of Arabic dialects
  • Word-level timestamps: Precise timing for each transcribed word
  • Speaker diarization: Identify different speakers in multi-speaker audio
  • Code-switching: Handle mixed Arabic-English speech naturally
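
As a rough illustration of how word-level timestamps and speaker labels can be combined, the sketch below groups words into subtitle cues and renders them in SRT format. The `Word` shape (`text`, `start`, `end`, `speaker`) and the function names are assumptions made for this example and are not taken from the Hamsa API; the actual response fields may differ, so map them accordingly.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape for one transcribed word; the real response
    # fields returned by the service may be named differently.
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    speaker: str  # speaker label from diarization


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_cue_seconds: float = 4.0) -> str:
    """Group words into cues and render them as SRT text.

    A new cue starts when the speaker changes or when adding the word
    would stretch the cue beyond max_cue_seconds.
    """
    cues: list[list[Word]] = []
    for w in words:
        cur = cues[-1] if cues else None
        same_speaker = cur is not None and cur[0].speaker == w.speaker
        if same_speaker and w.end - cur[0].start <= max_cue_seconds:
            cur.append(w)
        else:
            cues.append([w])

    blocks = []
    for i, cue in enumerate(cues, 1):
        text = " ".join(w.text for w in cue)
        timing = f"{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}"
        blocks.append(f"{i}\n{timing}\n{cue[0].speaker}: {text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        Word("مرحبا", 0.0, 0.5, "S1"),
        Word("بكم", 0.6, 1.0, "S1"),
        Word("hello", 1.2, 1.6, "S2"),
    ]
    print(words_to_srt(sample))
```

The 4-second cue limit is an arbitrary default chosen for readability; subtitle guidelines vary, so tune it to your content.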

Supported languages

  • Arabic dialects: Egyptian, Gulf, Levantine, North African, Iraqi, Yemeni, Modern Standard Arabic
  • English: US English

Get started