Hamsa Speech to Text (STT) accurately transcribes Arabic speech across multiple dialects into text with word-level timestamps and speaker identification.

What you can do

  • Transcribe Arabic media content, podcasts, and videos
  • Generate subtitles for Arabic video content
  • Create searchable text from Arabic audio recordings
  • Enable real-time transcription for voice agents and live calls
  • Document Arabic meetings and interviews

Models

Model        | Best For                            | Latency
STT Standard | Batch transcription, high accuracy  | Optimized for quality
STT Realtime | Live calls, voice agents, streaming | ~150-250 ms

View all models: Compare models and see detailed specifications

Key features

  • Dialect recognition: Automatic detection and transcription of Arabic dialects
  • Word-level timestamps: Precise timing for each transcribed word
  • Speaker diarization: Identify different speakers in multi-speaker audio
  • Code-switching: Handle mixed Arabic-English speech naturally
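
Word-level timestamps and speaker labels are the inputs needed to build subtitles. The sketch below shows one way to group timed words into SRT cues. It does not call the Hamsa API: the `Word` record, its field names, and the `max_words` and `max_gap` thresholds are assumptions made for illustration, so map them to the fields the actual transcription response returns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical shape for illustration; not the actual Hamsa response schema.
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7, max_gap: float = 0.8) -> str:
    """Group words into cues, starting a new cue on a speaker change,
    a pause longer than max_gap seconds, or when a cue reaches max_words."""
    cues: List[List[Word]] = []
    for w in words:
        current = cues[-1] if cues else None
        if (
            current is None
            or len(current) >= max_words
            or w.speaker != current[-1].speaker
            or w.start - current[-1].end > max_gap
        ):
            cues.append([w])
        else:
            current.append(w)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(
            f"{i}\n"
            f"{format_srt_time(cue[0].start)} --> {format_srt_time(cue[-1].end)}\n"
            f"[{cue[0].speaker}] {text}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


words = [
    Word("مرحبا", 0.0, 0.4, "S1"),
    Word("بكم", 0.5, 0.9, "S1"),
    Word("hello", 1.0, 1.3, "S2"),
]
srt = words_to_srt(words)
print(srt)
```

Breaking cues on speaker changes keeps each subtitle attributed to a single speaker, and the pause threshold keeps a cue from staying on screen through a silence. Both thresholds are starting points to tune per content type.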

Supported languages

  • Arabic dialects: Egyptian, Gulf, Levantine, North African, Iraqi, Yemeni, Modern Standard Arabic
  • English: US English

Get started

  • STT Documentation: Complete guide to Speech to Text features and integration
  • Quickstart: Get started with STT in minutes
  • Media Platform: Use STT through the web interface
  • API Reference: Technical API documentation