# Speech to Text

> Transcribe Arabic and English speech into accurate text with Hamsa STT

Hamsa Speech to Text (STT) transcribes Arabic speech across multiple dialects, as well as English speech, into text with word-level timestamps and speaker identification. Whether you're transcribing media content, building voice applications, or documenting conversations, Hamsa STT delivers high-accuracy Arabic speech recognition.

## Overview

* Technical API documentation for developers
* Get started with STT in minutes

## Key features

### Arabic dialect recognition

Hamsa STT is optimized for Arabic speech:

* **Automatic dialect detection**: Set `language` to `ar` and the model detects the dialect automatically
* **Code-switching**: Natural handling of mixed Arabic-English speech
* **Colloquial expressions**: Recognition of dialect-specific idioms and expressions

### Advanced transcription features

* **Word-level timestamps**: Precise timing for each transcribed word; each segment includes the word text plus its start and end times
* **Word highlight during playback**: In the Media Platform, the current word is highlighted in sync with playback; click any word to seek to it
* **Speaker diarization**: Identification of different speakers in multi-speaker audio
* **Automatic punctuation**: Natural punctuation and formatting
* **SRT subtitle export**: Generate formatted subtitles with configurable line and duration options

### Flexible integration

* **Batch API** (`/v1/jobs/transcribe`): Asynchronous transcription from media URLs, with results delivered by webhook
* **Realtime API** (`/v1/realtime/stt`): Synchronous transcription from base64-encoded audio
* **WebSocket** (`/v1/realtime/ws`): Streaming transcription for real-time applications
* **Media Platform**: Web interface to upload, transcribe, and review media

## API endpoints

**Async: `/v1/jobs/transcribe`**

Submit a media
URL for transcription. Results are delivered via webhook.

Parameters: `mediaUrl`, `model`, `language`, `webhookUrl`

**Sync: `/v1/realtime/stt`**

Send base64-encoded audio and get the transcription back directly in the response.

Parameters: `audioBase64`, `language`, `isEosEnabled`

## Models

| Model ID                    | Best for                                               |
| --------------------------- | ------------------------------------------------------ |
| `Hamsa-General-V2.0`        | General-purpose: media, podcasts, pre-recorded content |
| `Hamsa-Conversational-V1.0` | Conversational audio: meetings, calls, dialogues       |

## Supported languages

The API accepts two language codes:

| Code | Language                             |
| ---- | ------------------------------------ |
| `ar` | Arabic (all dialects, auto-detected) |
| `en` | English                              |

Arabic dialect detection is automatic, so you do not need to specify the dialect. Set `language` to `ar` and the model handles Egyptian, Gulf, Levantine, Iraqi, and other dialects.

## Use cases

### Media transcription

Transcribe Arabic podcasts, videos, and media content:

* Generate subtitles for videos (with SRT export)
* Create searchable transcripts
* Content analysis and indexing

### Voice agents

Power real-time conversational AI:

* Customer service voice agents
* Live call transcription
* Conversation analytics

### Meeting documentation

Document Arabic meetings and interviews:

* Automatic meeting minutes with speaker identification
* Searchable archives
* Compliance and record-keeping

### Content accessibility

Make Arabic audio content accessible:

* Closed captions for videos
* Transcripts for audio content
* Translation preparation

## Getting started

Use the [Batch API](/speech-to-text/quickstart#batch-transcription) for pre-recorded media, the [Realtime API](/speech-to-text/quickstart#realtime-transcription-synchronous) for direct transcription, or the [WebSocket API](/websocket/websocket-api) for streaming. Use `Hamsa-General-V2.0` for general transcription or `Hamsa-Conversational-V1.0` for conversational audio.
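The difference between the two request shapes described under API endpoints can be sketched as plain payload builders. This is a minimal sketch, not an official client: it uses only the parameter and model names listed on this page, while the helper names (`batch_job_body`, `realtime_body`) are hypothetical and the field types are assumptions (for example, that `isEosEnabled` is a boolean). Confirm types, the base URL, and authentication in the Quickstart.

```python
import base64

# Model IDs and language codes as listed on this page.
MODELS = {
    "general": "Hamsa-General-V2.0",                # media, podcasts, pre-recorded content
    "conversational": "Hamsa-Conversational-V1.0",  # meetings, calls, dialogues
}
LANGUAGES = {"ar", "en"}  # "ar" covers all Arabic dialects (auto-detected)


def _check_language(language: str) -> None:
    """Reject codes other than the two the API accepts."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")


def batch_job_body(media_url: str, webhook_url: str, *,
                   conversational: bool = False, language: str = "ar") -> dict:
    """Hypothetical helper: payload for /v1/jobs/transcribe (async, results go to webhook_url)."""
    _check_language(language)
    return {
        "mediaUrl": media_url,
        "model": MODELS["conversational" if conversational else "general"],
        "language": language,
        "webhookUrl": webhook_url,
    }


def realtime_body(audio: bytes, *, language: str = "ar",
                  eos_enabled: bool = False) -> dict:
    """Hypothetical helper: payload for /v1/realtime/stt (sync, transcription returned directly)."""
    _check_language(language)
    return {
        # Raw audio bytes encoded as base64 text.
        "audioBase64": base64.b64encode(audio).decode("ascii"),
        "language": language,
        # Assumption: isEosEnabled is a boolean flag; check the API reference.
        "isEosEnabled": eos_enabled,
    }
```

For example, `batch_job_body(url, hook, conversational=True)` selects `Hamsa-Conversational-V1.0` for a recorded call, and `realtime_body(open("clip.wav", "rb").read())` (with a hypothetical local file) produces the base64 body for the synchronous endpoint.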
Provide a media URL (batch) or base64-encoded audio (realtime), and get your transcription with timestamps and speaker information.

## Next steps

* Build your first STT integration
* Real-time streaming transcription
* Tips for better transcription accuracy
* Use STT via the web interface

## FAQ

**What is the difference between the Batch API and the Realtime API?**

The Batch API (`/v1/jobs/transcribe`) is asynchronous: submit a media URL and receive results via webhook. Use it for pre-recorded files. The Realtime API (`/v1/realtime/stt`) accepts base64-encoded audio and returns the transcription directly. For streaming, use the WebSocket API.

**Do I need to specify the Arabic dialect?**

No. Set `language` to `ar` and the model automatically detects the specific dialect (Egyptian, Gulf, Levantine, etc.) and transcribes accordingly.

**Can Hamsa STT handle mixed Arabic and English speech?**

Yes. The models handle speech that switches between Arabic and English, which is common in many Arabic-speaking regions.

**Which model should I use?**

Use `Hamsa-General-V2.0` for general-purpose transcription of media and pre-recorded content. Use `Hamsa-Conversational-V1.0` for conversational audio such as calls and meetings.

**Can I export subtitles in SRT format?**

Yes. Set `returnSrtFormat` to `true` in the Batch API request. You can customize subtitle formatting with `srtOptions`. See the [Quickstart](/speech-to-text/quickstart#srt-options) for details.
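A minimal sketch of preparing an SRT-enabled Batch API request with Python's standard library. The payload fields (`mediaUrl`, `model`, `language`, `webhookUrl`, `returnSrtFormat`) come from this page; the base URL is a placeholder, sending the body as a JSON `POST` is an assumption, and authentication is passed in as a caller-supplied header dict because this page does not document the scheme. `srtOptions` is left out because its fields are described in the Quickstart, not here. The helper name `srt_job_request` is hypothetical.

```python
import json
import urllib.request

# Placeholder: replace with the API base URL from the API reference.
BASE_URL = "https://api.example.com"


def srt_job_request(media_url: str, webhook_url: str, auth_headers: dict,
                    model: str = "Hamsa-General-V2.0",
                    language: str = "ar") -> urllib.request.Request:
    """Build, but do not send, a Batch API request that asks for SRT output."""
    body = {
        "mediaUrl": media_url,
        "model": model,
        "language": language,
        "webhookUrl": webhook_url,
        "returnSrtFormat": True,  # request SRT subtitles along with the transcript
        # "srtOptions": {...},    # optional formatting; fields are in the Quickstart
    }
    # Assumption: the endpoint takes a JSON POST; auth_headers carries whatever
    # authentication header the API reference specifies.
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        BASE_URL + "/v1/jobs/transcribe",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the job. Because the Batch API is asynchronous, the transcription results (presumably including the SRT output) arrive at `webhookUrl` rather than in the HTTP response.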