Speech to Text - Hamsa API

Hamsa Speech to Text (STT) accurately transcribes Arabic speech across multiple dialects into text with word-level timestamps and speaker identification.

What you can do

Transcribe Arabic media content, podcasts, and videos
Generate subtitles for Arabic video content
Create searchable text from Arabic audio recordings
Enable real-time transcription for voice agents and live calls
Document Arabic meetings and interviews
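As a rough illustration of the subtitle use case, the sketch below groups word-level timestamps (word text plus start/end times in seconds) into SRT subtitle cues. The `Word` shape, the `words_to_srt` helper, and the `max_words`/`max_duration` limits are hypothetical choices for this example, not part of the Hamsa API; map them from the actual transcription response described in the API Reference.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape: map from the real API response fields
    text: str
    start: float  # seconds
    end: float    # seconds


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7, max_duration: float = 4.0) -> str:
    """Group consecutive words into subtitle cues and render them as SRT text."""
    cues: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        # Close the current cue if it is full or adding this word would make it too long
        if current and (len(current) >= max_words or word.end - current[0].start > max_duration):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        # Each cue spans from its first word's start to its last word's end
        timing = f"{format_srt_time(cue[0].start)} --> {format_srt_time(cue[-1].end)}"
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)
```

For example, `words_to_srt([Word("مرحبا", 0.0, 0.42), Word("بكم", 0.45, 0.8)])` produces a single cue covering 0.0 to 0.8 seconds. The limits are a simple heuristic; a production pipeline might also break cues at pauses or speaker changes.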

Models

Model	Best For	Latency
STT Standard	Batch transcription, high accuracy	Optimized for quality
STT Realtime	Live calls, voice agents, streaming	~150-250ms

View all models: Compare models and see detailed specifications

Key features

Dialect recognition: Automatic detection and transcription of Arabic dialects
Word-level timestamps: Start and end times for every transcribed word, returned in each transcript segment
Word highlight during playback: In the Media Platform, the word being spoken is highlighted in sync with the audio or video, and clicking a word jumps to that point in the media
Speaker diarization: Identify different speakers in multi-speaker audio
Code-switching: Handle mixed Arabic-English speech naturally
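Speaker diarization output is typically consumed by collapsing consecutive segments from the same speaker into turns, for example when documenting meetings or interviews. The sketch below assumes each segment carries a speaker label, start/end times in seconds, and text; the `Segment` shape and the `merge_speaker_turns` and `speaking_time` helpers are hypothetical and not taken from the Hamsa API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape: map from the real API response fields
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_speaker_turns(segments: list[Segment]) -> list[Segment]:
    """Merge consecutive segments from the same speaker into a single turn."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Extend the previous turn instead of starting a new one
            turns[-1] = Segment(last.speaker, last.start, max(last.end, seg.end), f"{last.text} {seg.text}")
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Total seconds of speech attributed to each speaker label."""
    totals: dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals
```

Merging turns before display keeps a transcript readable when the model emits several short segments for one speaker; the original segments are still available if finer timing is needed.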

Word highlight during playback

The transcription API returns word-level data in each transcript segment: each word includes its text plus start and end timestamps (in seconds). This enables two things:

In the Media Platform: When you open a transcription and play the audio or video, the word currently being spoken is highlighted in sync with playback. You can also click any word in the transcript to seek the media to that position.
Via the API: Your application receives the same word-level timestamps in the transcript/segment response, so you can build karaoke-style highlighting, click-to-seek, or other experiences that follow the speech.
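A minimal sketch of the two API-side behaviors described above: finding the word to highlight at the current playback position, and finding the seek position when a word is clicked. It assumes the words have been extracted from the transcript segments into a list with text and start/end times in seconds; the `Word` and `WordHighlighter` names are hypothetical. Each word is treated as active over the half-open interval [start, end), so gaps between words return no highlight.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    # Hypothetical shape: map from the real API response fields
    text: str
    start: float  # seconds
    end: float    # seconds


class WordHighlighter:
    """Look up the word to highlight at a given playback position."""

    def __init__(self, words: list[Word]) -> None:
        # Sort by start time so start times can be binary-searched
        self.words = sorted(words, key=lambda w: w.start)
        self.starts = [w.start for w in self.words]

    def active_index(self, position: float) -> Optional[int]:
        """Index of the word spoken at `position`, or None during silence."""
        # Last word whose start time is <= position
        i = bisect.bisect_right(self.starts, position) - 1
        if i >= 0 and position < self.words[i].end:
            return i
        return None

    def seek_time(self, index: int) -> float:
        """Click-to-seek: media position to jump to when word `index` is clicked."""
        return self.words[index].start
```

A player would call `active_index` on each time-update event and restyle the returned word. The binary search keeps each lookup at O(log n), which matters for long recordings with many thousands of words.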

Supported languages

Arabic dialects: Egyptian, Gulf, Levantine, North African, Iraqi, Yemeni, Modern Standard Arabic
English: US English

Get started

STT Documentation: Complete guide to Speech to Text features and integration
Quickstart: Get started with STT in minutes
Media Platform: Use STT through the web interface
API Reference: Technical API documentation
