Overview
Key features
Arabic dialect recognition
Our STT models are specifically optimized for Arabic speech:
- Automatic dialect detection: No need to specify the dialect - the model detects it automatically
- Multiple dialects: Egyptian, Gulf, Levantine, North African, Iraqi, Yemeni, and Modern Standard Arabic
- Code-switching: Natural handling of mixed Arabic-English speech
- Colloquial expressions: Recognition of dialect-specific idioms and expressions
Advanced transcription features
- Word-level timestamps: Precise timing for each transcribed word
- Speaker diarization: Identification of different speakers in multi-speaker audio
- Automatic punctuation: Natural punctuation and formatting
- High accuracy: Optimized models for Arabic speech patterns
Flexible integration
- REST API for batch transcription
- WebSocket API for real-time streaming
- Web interface via Media Platform
- Multiple audio format support (MP3, WAV, M4A, FLAC, OGG)
Use cases
Media transcription
Transcribe Arabic podcasts, videos, and media content:
- Generate subtitles for videos
- Create searchable transcripts
- Content analysis and indexing
- Accessibility features
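As an illustration of the subtitle use case, the sketch below turns word-level timestamps into SubRip (SRT) text. The input shape (a list of dicts with `word`, `start`, and `end` in seconds) is an assumption made for this example, not the documented Hamsa response format, and the helper names `srt_time` and `words_to_srt` are hypothetical.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group words into cues of at most max_words and render them as SRT.

    `words` is assumed to look like
    [{"word": "...", "start": 0.0, "end": 0.4}, ...] (hypothetical shape).
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        start, end = chunk[0]["start"], chunk[-1]["end"]
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Grouping by a fixed word count keeps the sketch short; a production subtitle generator would more likely break cues at pauses or punctuation.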
Voice agents
Power real-time conversational AI:
- Customer service voice agents
- Live call transcription
- Real-time language understanding
- Conversation analytics
Meeting documentation
Document Arabic meetings and interviews:
- Automatic meeting minutes
- Speaker identification
- Searchable archives
- Compliance and record-keeping
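For meeting minutes, diarized words usually need to be merged into speaker turns. The following is a minimal sketch that assumes each word carries a `speaker` label alongside `word`, `start`, and `end`; that shape and the function name `speaker_turns` are assumptions for illustration, not the documented Hamsa output.

```python
from itertools import groupby


def speaker_turns(words):
    """Merge consecutive words from the same speaker into one turn.

    `words` is assumed to look like
    [{"word": "...", "start": 0.0, "end": 0.4, "speaker": "S1"}, ...]
    (hypothetical shape).
    """
    turns = []
    # groupby only merges adjacent items, which is what a turn requires.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["word"] for w in group),
        })
    return turns
```

Each returned turn can then be written out as one line of a transcript, for example `[00:12] S1: ...`.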
Content accessibility
Make Arabic audio content accessible:
- Closed captions for videos
- Transcripts for audio content
- Search and discovery features
- Translation preparation
Models
Hamsa offers two STT models optimized for different use cases:
STT Standard
Best for accuracy
High-accuracy Arabic speech recognition with speaker diarization and detailed timestamps.
- Up to 60 minutes per file
- Word-level timestamps
- Speaker diarization
- Highest accuracy
- Batch processing
STT Realtime
Best for speed
Ultra-fast streaming transcription for real-time voice agents and live conversations.
- Real-time streaming
- ~150-250ms latency
- Word-level timestamps
- Continuous transcription
- WebSocket support
Supported languages
Arabic Dialects:
- Egyptian Arabic (arz)
- Gulf Arabic (afb) - Saudi, UAE, Kuwait, Bahrain, Qatar, Oman
- Levantine Arabic (apc) - Syrian, Lebanese, Jordanian, Palestinian
- North African Arabic (arq/ary) - Moroccan, Algerian, Tunisian, Libyan
- Iraqi Arabic (acm)
- Yemeni Arabic (ayn)
- Modern Standard Arabic (arb)
Other languages:
- English (US) (eng)
Audio formats
Hamsa STT supports multiple input formats:
- MP3: Standard compressed audio
- WAV: Uncompressed audio (recommended)
- M4A: MPEG-4 audio files
- FLAC: Lossless compression
- OGG: Ogg Vorbis audio
- PCM: Raw audio data (16-bit)
Recommended audio specifications:
- Sample rate: 16kHz or higher
- Bit depth: 16-bit minimum
- Channels: Mono (recommended for best results)
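The sample rate, bit depth, and channel guidance above can be checked locally before a WAV file is uploaded. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the wording of its warnings are illustrative and not part of any Hamsa SDK.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000   # "16kHz or higher"
MIN_SAMPLE_WIDTH_BYTES = 2    # "16-bit minimum"


def check_wav(path):
    """Return a list of warnings for a WAV file.

    An empty list means the file meets the sample rate, bit depth,
    and mono guidance listed above.
    """
    warnings = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate < MIN_SAMPLE_RATE_HZ:
        warnings.append(f"sample rate {rate} Hz is below 16 kHz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        warnings.append(f"bit depth {width * 8}-bit is below 16-bit")
    if channels != 1:
        warnings.append(f"{channels} channels found; mono is recommended")
    return warnings
```

The `wave` module only reads uncompressed WAV, so MP3, M4A, FLAC, and OGG files would need a different inspection tool.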
Getting started
Next steps
- Quickstart Guide: Build your first STT integration
- Real-time STT: Learn about streaming transcription
- Improving Accuracy: Tips for better transcription accuracy
- Supported Languages: Detailed language and dialect information
FAQ
Which model should I use?
Use STT Standard for batch transcription of pre-recorded audio when accuracy is paramount. Use STT Realtime for live transcription in voice agents, real-time calls, or streaming audio applications.
Do I need to specify the Arabic dialect?
No, our models automatically detect and transcribe the specific Arabic dialect. The system identifies whether it’s Egyptian, Gulf, Levantine, or another dialect and transcribes accordingly.
Can the model handle Arabic-English code-switching?
Yes, our models naturally handle speech that switches between Arabic and English, which is common in many Arabic-speaking regions. The transcription accurately captures both languages.
How accurate is speaker diarization?
Speaker diarization works well for up to 4-5 distinct speakers with clear audio. Use STT Standard for best diarization results. Accuracy is highest with good audio quality and minimal speaker overlap.
What audio duration limits exist?
STT Standard supports audio files up to 60 minutes. For longer audio, split into segments. STT Realtime supports continuous streaming with no duration limit.
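Splitting longer audio for STT Standard comes down to computing segment boundaries that each stay within the 60-minute limit. The sketch below only plans the boundaries; cutting the audio itself is left to whatever audio tool is in use, and the function name `plan_segments` is hypothetical.

```python
MAX_SEGMENT_SECONDS = 60 * 60  # STT Standard limit: 60 minutes per file


def plan_segments(duration_seconds, max_seconds=MAX_SEGMENT_SECONDS):
    """Return (start, end) pairs in seconds that cover the whole recording.

    Every segment is at most max_seconds long; the last one may be shorter.
    """
    segments = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + max_seconds, duration_seconds)
        segments.append((start, end))
        start = end
    return segments
```

For a 2.5-hour recording (9000 seconds) this yields three segments: 0 to 3600, 3600 to 7200, and 7200 to 9000. In practice it may help to move each boundary to a nearby pause so that words are not cut in half; the sketch does not attempt that.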
How can I improve transcription accuracy?
Use high-quality audio (16kHz+), minimize background noise, ensure clear speech, and avoid speaker overlap. See our improving accuracy guide for detailed tips.