# Speech to Text

> Transcribe Arabic and English speech into accurate text with Hamsa STT

Hamsa Speech to Text (STT) transcribes Arabic speech across multiple dialects, as well as English speech, into text with word-level timestamps and speaker identification. It is optimized for Arabic and suits media transcription, voice applications, and conversation documentation.

## Overview

<CardGroup cols={2}>
  <Card title="API Reference" icon="code" href="/api-reference/introduction">
    Technical API documentation for developers
  </Card>

  <Card title="Quickstart" icon="rocket" href="/speech-to-text/quickstart">
    Get started with STT in minutes
  </Card>
</CardGroup>

## Key features

### Arabic dialect recognition

Hamsa STT is optimized for Arabic speech:

* **Automatic dialect detection**: Set `language` to `ar` and the model detects the dialect automatically
* **Code-switching**: Natural handling of mixed Arabic-English speech
* **Colloquial expressions**: Recognition of dialect-specific idioms and expressions

### Advanced transcription features

* **Word-level timestamps**: Start and end times for each transcribed word; each segment includes the word text with its timing
* **Word highlight during playback**: In the Media Platform, the current word highlights in sync with playback; click any word to seek
* **Speaker diarization**: Identification of different speakers in multi-speaker audio
* **Automatic punctuation**: Natural punctuation and formatting
* **SRT subtitle export**: Generate formatted subtitles with configurable line/duration options
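
Word-level timestamps are what make features like synchronized word highlighting possible. The sketch below shows one way a client could find the word being spoken at a given playback time. The `Word` shape (`text`, `start`, `end`, with times in seconds) is a hypothetical stand-in for the real response fields, and the list is assumed to be sorted by start time.

```python
from bisect import bisect_right
from typing import Optional, TypedDict


class Word(TypedDict):
    # Hypothetical shape: the real response field names may differ.
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def word_at(words: list[Word], t: float) -> Optional[Word]:
    """Return the word being spoken at time t, or None during a gap."""
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1  # index of the last word starting at or before t
    if i >= 0 and t < words[i]["end"]:
        return words[i]
    return None


# Illustrative data only, not real API output.
words: list[Word] = [
    {"text": "مرحبا", "start": 0.00, "end": 0.42},
    {"text": "بكم", "start": 0.50, "end": 0.81},
]
```

Calling `word_at(words, 0.6)` returns the second word, while a time that falls between two words returns `None`. Binary search keeps the lookup fast enough to run on every playback tick, even for long transcripts.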

### Flexible integration

* **Batch API** (`/v1/jobs/transcribe`): Asynchronous transcription from media URLs with webhook delivery
* **Realtime API** (`/v1/realtime/stt`): Synchronous transcription from base64-encoded audio
* **WebSocket** (`/v1/realtime/ws`): Streaming transcription for real-time applications
* **Media Platform**: Web interface for uploading, transcribing, and reviewing media
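
Because the Batch API delivers results by webhook, your application needs an HTTP endpoint that can accept a POST request. The following is a minimal sketch using only the Python standard library. The payload schema is not described on this page, so the handler only decodes the body as a JSON object and makes no assumption about its fields. The names `parse_webhook_body`, `TranscriptionWebhook`, and `serve` are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw: bytes) -> dict:
    """Decode a webhook body as a JSON object; no payload schema is assumed."""
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


class TranscriptionWebhook(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_webhook_body(self.rfile.read(length))
        except ValueError:  # covers malformed JSON and invalid UTF-8
            self.send_response(400)
            self.end_headers()
            return
        # Hand the payload to your own storage or queue here.
        print("received keys:", sorted(payload))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block and serve webhook requests on the given port."""
    HTTPServer(("", port), TranscriptionWebhook).serve_forever()
```

Call `serve()` to start listening. The URL you pass as `webhookUrl` must be reachable from the public internet, so a production deployment would normally sit behind HTTPS.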

## API endpoints

<CardGroup cols={2}>
  <Card title="Batch API" icon="clock">
    **Async — `/v1/jobs/transcribe`**

    Submit a media URL for transcription. Results are delivered via webhook.

    Parameters: `mediaUrl`, `model`, `language`, `webhookUrl`
  </Card>

  <Card title="Realtime API" icon="bolt">
    **Sync — `/v1/realtime/stt`**

    Send base64-encoded audio and receive the transcription directly in the response.

    Parameters: `audioBase64`, `language`, `isEosEnabled`
  </Card>
</CardGroup>
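
As a rough sketch, the two request bodies could be assembled as shown below. The endpoint paths and parameter names come from the cards above. The base URL, the `Authorization` header format, and the default value for `isEosEnabled` are placeholders chosen for illustration, so check the API Reference for the real values.

```python
import base64
import json
import urllib.request

# Assumptions: placeholder base URL and auth scheme; see the API Reference.
BASE_URL = "https://api.example.invalid"
API_KEY = "YOUR_API_KEY"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {API_KEY}",  # hypothetical header format
            "Content-Type": "application/json",
        },
        method="POST",
    )


def batch_request(media_url: str, webhook_url: str,
                  model: str = "Hamsa-General-V2.0",
                  language: str = "ar") -> urllib.request.Request:
    """Async job: results arrive later at webhook_url."""
    return build_request("/v1/jobs/transcribe", {
        "mediaUrl": media_url,
        "model": model,
        "language": language,
        "webhookUrl": webhook_url,
    })


def realtime_request(audio: bytes, language: str = "ar",
                     is_eos_enabled: bool = False) -> urllib.request.Request:
    """Sync call: the audio bytes are sent base64-encoded in the body."""
    return build_request("/v1/realtime/stt", {
        "audioBase64": base64.b64encode(audio).decode("ascii"),
        "language": language,
        "isEosEnabled": is_eos_enabled,  # default here is arbitrary
    })
```

Pass either request to `urllib.request.urlopen` to send it. Building the request separately from sending it makes the payload easy to inspect and test.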

## Models

| Model ID                    | Best for                                                |
| --------------------------- | ------------------------------------------------------- |
| `Hamsa-General-V2.0`        | General-purpose — media, podcasts, pre-recorded content |
| `Hamsa-Conversational-V1.0` | Conversational audio — meetings, calls, dialogues       |

## Supported languages

The API accepts two language codes:

| Code | Language                              |
| ---- | ------------------------------------- |
| `ar` | Arabic (all dialects — auto-detected) |
| `en` | English                               |

Arabic dialect detection is automatic, so you do not need to specify a dialect. Set `language` to `ar` and the model handles Egyptian, Gulf, Levantine, Iraqi, and other dialects.
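
Since the accepted values are a short fixed list, a client-side check can catch typos before a request is sent. This is an optional convenience sketch, not part of the API: `check_options` is a hypothetical helper, and the allowed values are copied from the tables on this page.

```python
from typing import Optional

# Values copied from the Models and Supported languages tables on this page.
MODELS = {"Hamsa-General-V2.0", "Hamsa-Conversational-V1.0"}
LANGUAGES = {"ar", "en"}


def check_options(language: str, model: Optional[str] = None) -> None:
    """Raise ValueError if a language code or model ID is not listed on this page."""
    if language not in LANGUAGES:
        raise ValueError(
            f"unsupported language {language!r}; expected one of {sorted(LANGUAGES)}"
        )
    if model is not None and model not in MODELS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(MODELS)}"
        )
```

For example, `check_options("ar", "Hamsa-General-V2.0")` passes silently, while a dialect-specific code such as `"ar-EG"` raises a `ValueError` because only `ar` and `en` are listed.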

## Use cases

### Media transcription

Transcribe Arabic podcasts, videos, and media content:

* Generate subtitles for videos (with SRT export)
* Create searchable transcripts
* Analyze and index content

### Voice agents

Power real-time conversational AI:

* Customer service voice agents
* Live call transcription
* Conversation analytics

### Meeting documentation

Document Arabic meetings and interviews:

* Automatic meeting minutes with speaker identification
* Searchable archives
* Compliance and record-keeping

### Content accessibility

Make Arabic audio content accessible:

* Closed captions for videos
* Transcripts for audio content
* Translation preparation

## Getting started

<Steps>
  <Step title="Choose your integration">
    Use the [Batch API](/speech-to-text/quickstart#batch-transcription) for pre-recorded media, the [Realtime API](/speech-to-text/quickstart#realtime-transcription-synchronous) for direct transcription, or the [WebSocket API](/websocket/websocket-api) for streaming.
  </Step>

  <Step title="Select a model">
    Use `Hamsa-General-V2.0` for general transcription or `Hamsa-Conversational-V1.0` for conversational audio.
  </Step>

  <Step title="Submit your audio">
    Provide a media URL (batch) or base64-encoded audio (realtime), and get your transcription with timestamps and speaker information.
  </Step>
</Steps>

## Next steps

<CardGroup cols={2}>
  <Card title="Quickstart Guide" icon="rocket" href="/speech-to-text/quickstart">
    Build your first STT integration
  </Card>

  <Card title="WebSocket API" icon="bolt" href="/websocket/websocket-api">
    Real-time streaming transcription
  </Card>

  <Card title="Improving Accuracy" icon="chart-line" href="/speech-to-text/guides/improving-accuracy">
    Tips for better transcription accuracy
  </Card>

  <Card title="Media Platform" icon="play" href="/media/speech-to-text/overview">
    Use STT via web interface
  </Card>
</CardGroup>

## FAQ

<AccordionGroup>
  <Accordion title="What's the difference between the Batch API and Realtime API?">
    The Batch API (`/v1/jobs/transcribe`) is asynchronous: you submit a media URL and receive results via webhook. Use it for pre-recorded files. The Realtime API (`/v1/realtime/stt`) is synchronous: it accepts base64-encoded audio and returns the transcription directly. For streaming, use the WebSocket API.
  </Accordion>

  <Accordion title="Do I need to specify the Arabic dialect?">
    No. Set `language` to `ar` and the model automatically detects the specific dialect (Egyptian, Gulf, Levantine, etc.) and transcribes accordingly.
  </Accordion>

  <Accordion title="Can the model handle Arabic-English code-switching?">
    Yes, the models handle speech that switches between Arabic and English, which is common in many Arabic-speaking regions.
  </Accordion>

  <Accordion title="Which model should I use?">
    Use `Hamsa-General-V2.0` for general-purpose transcription of media and pre-recorded content. Use `Hamsa-Conversational-V1.0` for conversational audio like calls and meetings.
  </Accordion>

  <Accordion title="Can I get SRT subtitles?">
    Yes. Set `returnSrtFormat` to `true` in the Batch API request. You can customize subtitle formatting with `srtOptions`. See the [Quickstart](/speech-to-text/quickstart#srt-options) for details.
  </Accordion>
</AccordionGroup>
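
To illustrate the SRT answer above, here is a small sketch that adds the SRT flag to an existing Batch API payload. Only `returnSrtFormat` and `srtOptions` come from this page. The helper name `with_srt` is hypothetical, and the keys inside `srtOptions` are deliberately left to the caller because they are documented in the Quickstart rather than here.

```python
from typing import Any, Optional


def with_srt(payload: dict[str, Any],
             srt_options: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return a copy of a Batch API payload that also requests SRT output."""
    out = dict(payload)  # leave the caller's dict untouched
    out["returnSrtFormat"] = True
    if srt_options is not None:
        # Keys are passed through as-is; see the Quickstart for valid options.
        out["srtOptions"] = dict(srt_options)
    return out
```

Returning a copy rather than mutating the input lets the same base payload be reused for requests with and without subtitles.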
