Real-Time WebSocket API

Overview

The Hamsa Real-Time WebSocket API enables bidirectional streaming communication for Text-to-Speech (TTS) and Speech-to-Text (STT) operations. A single persistent connection can handle multiple requests without reconnecting.

Connection

Endpoint

wss://api.tryhamsa.com/v1/realtime/ws

Authentication

Authenticate using your API key via query parameter or header:

wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY
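
As a minimal sketch of the query-parameter option, the URL can be assembled with the standard `URL` class so that unusual characters in a key are percent-encoded. The helper name `buildRealtimeUrl` is hypothetical and not part of the API.

```javascript
// Endpoint as documented above.
const REALTIME_ENDPOINT = 'wss://api.tryhamsa.com/v1/realtime/ws';

// Hypothetical helper: returns the connection URL with the API key
// attached as the api_key query parameter.
function buildRealtimeUrl(apiKey) {
  if (typeof apiKey !== 'string' || apiKey.length === 0) {
    throw new Error('An API key is required');
  }
  const url = new URL(REALTIME_ENDPOINT);
  url.searchParams.set('api_key', apiKey); // percent-encodes the value if needed
  return url.toString();
}
```

The result can be passed directly to `new WebSocket(...)`.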

Connection Response

Upon successful connection, the server sends:

{
  "type": "info",
  "payload": {
    "message": "Connected to realtime WebSocket server"
  }
}

Connections are automatically closed after 60 minutes of inactivity. The server sends ping frames every 30 seconds to keep connections alive.

Message Format

All messages follow this structure:

interface WebSocketMessage {
  type: "tts" | "stt" | "response" | "error" | "info" | "ack" | "end";
  payload?: object;
}

Type	Direction	Description
`tts`	Client → Server	Text-to-Speech request
`stt`	Client → Server	Speech-to-Text request
`ack`	Server → Client	Request acknowledgment
`response`	Server → Client	Response data
`end`	Server → Client	Stream completion
`error`	Server → Client	Error message
`info`	Server → Client	Informational message
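
Because the server can send binary audio, JSON envelopes, and (for STT) plain-text transcriptions, a client usually needs to sort incoming frames before acting on them. The following is a sketch of one way to do that; `classifyFrame` and the `kind` labels are hypothetical names introduced for illustration.

```javascript
// Server-to-client envelope types from the table above.
const SERVER_TYPES = new Set(['ack', 'response', 'end', 'error', 'info']);

// Hypothetical helper: sorts one incoming frame into binary / envelope / text.
function classifyFrame(data) {
  if (typeof data !== 'string') {
    // Blob, ArrayBuffer, or Buffer: treat as a binary audio chunk.
    return { kind: 'binary', data };
  }
  try {
    const parsed = JSON.parse(data);
    if (parsed !== null && typeof parsed === 'object' && SERVER_TYPES.has(parsed.type)) {
      return { kind: 'envelope', type: parsed.type, payload: parsed.payload };
    }
  } catch {
    // Not JSON; fall through to plain text.
  }
  // Plain text, for example an STT transcription.
  return { kind: 'text', text: data };
}
```

Checking for a known `type` after parsing means a transcription that happens to be valid JSON (a string of digits, say) is still treated as text.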

Text-to-Speech (TTS)

Convert text to speech with streaming audio output.

Request

Field	Type	Required	Description
`type`	string	Yes	Must be "tts"
`payload`	object	Yes	Request parameters (see payload properties)

Payload properties

Field	Type	Required	Default	Description
`text`	string	Yes	-	The text to synthesize. Maximum 2000 characters.
`speaker`	string	Yes	-	Speaker ID. Use a UUID for custom cloned voices or a pre-built speaker name.
`dialect`	string	No	-	Dialect identifier (e.g., "modern").
`languageId`	string	No	"ar"	Language code. Defaults to "ar" (Arabic).
`mulaw`	boolean	No	false	Whether to use mu-law audio encoding.

Example Request

{
  "type": "tts",
  "payload": {
    "text": "مرحبا بك في خدمة همسة",
    "speaker": "speaker-1",
    "dialect": "modern",
    "languageId": "ar",
    "mulaw": false
  }
}
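
Validating a request on the client avoids a round trip that would end in `Invalid payload for message type: tts`. The sketch below applies the documented constraints; `buildTtsRequest` is a hypothetical helper, and it assumes the 2000-character limit is counted the same way as JavaScript's `String.length`, which the documentation does not confirm.

```javascript
const MAX_TTS_CHARS = 2000; // limit stated for the text field

// Hypothetical helper: validates input and returns a JSON string ready for ws.send().
function buildTtsRequest({ text, speaker, dialect, languageId = 'ar', mulaw = false }) {
  if (typeof text !== 'string' || text.length === 0) {
    throw new Error('text is required');
  }
  if (text.length > MAX_TTS_CHARS) {
    throw new Error(`text exceeds ${MAX_TTS_CHARS} characters`);
  }
  if (typeof speaker !== 'string' || speaker.length === 0) {
    throw new Error('speaker is required');
  }
  const payload = { text, speaker, languageId, mulaw };
  if (dialect !== undefined) payload.dialect = dialect; // optional field
  return JSON.stringify({ type: 'tts', payload });
}
```

Usage: `ws.send(buildTtsRequest({ text: 'مرحبا بك', speaker: 'speaker-1', dialect: 'modern' }))`.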

Response Flow

Acknowledgment

Server confirms the request was received:

{
  "type": "ack",
  "payload": {
    "message": "Real time text to speach connection establesh"
  }
}

Audio Stream

Server streams raw audio data as binary chunks. Buffer these chunks to reconstruct the complete audio file.

Stream End

Server signals completion:

{
  "type": "end",
  "payload": {
    "message": "End of TTS stream"
  }
}
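
The three stages above can be tracked with a small state holder that buffers chunks until the `end` message arrives. This is a sketch only: `createTtsCollector` is a hypothetical name, and it assumes each binary frame has already been converted to a `Uint8Array`.

```javascript
// Hypothetical collector for one TTS request: ack -> binary chunks -> end.
function createTtsCollector() {
  const chunks = [];
  let state = 'idle'; // idle -> streaming -> done, or failed on error

  return {
    get state() { return state; },
    // Call with the parsed JSON envelope for each text frame.
    onEnvelope(message) {
      if (message.type === 'ack') state = 'streaming';
      else if (message.type === 'end') state = 'done';
      else if (message.type === 'error') state = 'failed';
    },
    // Call with a Uint8Array for each binary frame.
    onChunk(bytes) {
      if (state === 'done' || state === 'failed') return; // ignore late frames
      chunks.push(bytes);
    },
    // Concatenates all buffered chunks into one Uint8Array.
    result() {
      const total = chunks.reduce((n, c) => n + c.length, 0);
      const out = new Uint8Array(total);
      let offset = 0;
      for (const c of chunks) {
        out.set(c, offset);
        offset += c.length;
      }
      return out;
    },
  };
}
```

In a browser, setting `ws.binaryType = 'arraybuffer'` makes binary frames arrive as `ArrayBuffer`, which can be wrapped with `new Uint8Array(event.data)` before calling `onChunk`.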

Code Example

const ws = new WebSocket('wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY');

const audioChunks = [];

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'tts',
    payload: {
      text: 'مرحبا بك',
      speaker: 'speaker-1',
      dialect: 'modern',
      languageId: 'ar',
      mulaw: false
    }
  }));
};

ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // Binary audio chunk
    audioChunks.push(event.data);
  } else {
    const message = JSON.parse(event.data);

    if (message.type === 'end') {
      // Combine all chunks into final audio
      const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });
      const audioUrl = URL.createObjectURL(audioBlob);
      const audio = new Audio(audioUrl);
      audio.play();
    }
  }
};

Speech-to-Text (STT)

Transcribe audio to text.

Request

Field	Type	Required	Description
`type`	string	Yes	Must be "stt"
`payload`	object	Yes	Request parameters (see payload properties)

Payload properties

Field	Type	Required	Default	Description
`audioBase64`	string	Conditional	-	Base64-encoded audio data. Either audioBase64 or audioList is required.
`audioList`	number[]	Conditional	-	Float32 array of PCM audio samples (16 kHz, mono). Either audioBase64 or audioList is required.
`language`	string	No	"ar"	Language code for transcription. Defaults to "ar" (Arabic).
`isEosEnabled`	boolean	No	true	Enable end-of-speech detection.
`eosThreshold`	number	No	-	Threshold for end-of-speech detection (0.0 to 1.0).

Example Request (Base64)

{
  "type": "stt",
  "payload": {
    "audioBase64": "//NExAAAAAANIAcAPABEAEQAQABEAEQARABEA...",
    "language": "ar",
    "isEosEnabled": true,
    "eosThreshold": 0.3
  }
}

Example Request (Float Array)

{
  "type": "stt",
  "payload": {
    "audioList": [0.001, 0.0015, -0.002, 0.003, ...],
    "language": "ar",
    "isEosEnabled": true
  }
}
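
Either request shape can be produced by one builder that enforces the documented rules. The sketch below is illustrative: `buildSttRequest` is a hypothetical helper, and preferring `audioBase64` when both fields are supplied is this sketch's own choice, since the documentation does not say how the server treats that case.

```javascript
// Hypothetical helper: validates input and returns a JSON string ready for ws.send().
function buildSttRequest({ audioBase64, audioList, language = 'ar', isEosEnabled = true, eosThreshold }) {
  const hasBase64 = typeof audioBase64 === 'string' && audioBase64.length > 0;
  const hasList = audioList != null && audioList.length > 0;
  if (!hasBase64 && !hasList) {
    throw new Error('Either audioBase64 or audioList is required');
  }
  const payload = { language, isEosEnabled };
  if (hasBase64) {
    payload.audioBase64 = audioBase64;
  } else {
    // A Float32Array would serialize as an object, so copy it to a plain number[].
    payload.audioList = Array.from(audioList);
  }
  if (eosThreshold !== undefined) {
    if (typeof eosThreshold !== 'number' || eosThreshold < 0 || eosThreshold > 1) {
      throw new RangeError('eosThreshold must be between 0.0 and 1.0');
    }
    payload.eosThreshold = eosThreshold;
  }
  return JSON.stringify({ type: 'stt', payload });
}
```

The `Array.from` step matters when samples come from the Web Audio API, which delivers them as a `Float32Array`.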

Response

The server sends the transcribed text directly as a plain string (not JSON):

مرحبا بك في خدمة همسة

Code Example

const ws = new WebSocket('wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY');

ws.onopen = async () => {
  // Get audio from microphone
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream);
  const chunks = [];

  mediaRecorder.ondataavailable = (e) => chunks.push(e.data);

  mediaRecorder.onstop = async () => {
    const blob = new Blob(chunks);
    const buffer = await blob.arrayBuffer();
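    // Convert in slices: spreading a large array into String.fromCharCode
    // can exceed the engine's argument limit and throw a RangeError
    const bytes = new Uint8Array(buffer);
    let binary = '';
    for (let i = 0; i < bytes.length; i += 0x8000) {
      binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
    }
    const base64 = btoa(binary);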

    ws.send(JSON.stringify({
      type: 'stt',
      payload: {
        audioBase64: base64,
        language: 'ar',
        isEosEnabled: true,
        eosThreshold: 0.3
      }
    }));
  };

  mediaRecorder.start();
  setTimeout(() => mediaRecorder.stop(), 3000); // Record 3 seconds
};

ws.onmessage = (event) => {
  if (typeof event.data === 'string') {
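    let json = null;
    try {
      json = JSON.parse(event.data);
    } catch {
      // Not JSON: leave json as null
    }
    if (json !== null && typeof json === 'object' && json.type) {
      // JSON envelope such as info, ack, or error
      if (json.type === 'error') {
        console.error('Error:', json.payload.message);
      }
    } else {
      // Plain text transcription result
      console.log('Transcription:', event.data);
    }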
  }
};

Error Handling

Error Response Format

{
  "type": "error",
  "payload": {
    "message": "Error description"
  }
}

WebSocket Close Codes

Code	Description
`4001`	Authentication failed - invalid or missing API key
`4003`	Insufficient funds - project wallet balance is depleted
`4500`	Internal authentication error
`1000`	Connection closed due to inactivity (60 min timeout)
`1001`	Server shutting down
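
A `close` handler can use these codes to decide what to do next. The sketch below maps each documented code to a description; the `retry` flags are this sketch's assumptions about sensible client behavior, not documented server guidance, and `describeClose` is a hypothetical name.

```javascript
// Close codes from the table above. The retry flags are assumptions:
// credential and billing failures will not fix themselves, so they are not retried.
const CLOSE_CODES = {
  4001: { reason: 'Authentication failed', retry: false },
  4003: { reason: 'Insufficient funds', retry: false },
  4500: { reason: 'Internal authentication error', retry: true },
  1000: { reason: 'Closed due to inactivity', retry: false }, // reconnect on next request
  1001: { reason: 'Server shutting down', retry: true },
};

// Hypothetical helper: looks up a close code, treating unlisted codes
// (for example 1006, abnormal closure) as retryable.
function describeClose(code) {
  return CLOSE_CODES[code] ?? { reason: `Unrecognized close code ${code}`, retry: true };
}
```

Usage: `ws.onclose = (event) => { const { reason, retry } = describeClose(event.code); /* ... */ };`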

Common Errors

Error	Cause
`Missing API key in headers or query parameters`	No API key provided
`API key is invalid or expired`	Invalid API key
`User account is inactive or not found`	Account issue
`Project is inactive or not found`	Project issue
`Insufficient funds in wallet`	Wallet balance is zero or negative
`Invalid message format: missing type or payload`	Malformed message
`Unsupported message type: [type]`	Unknown message type
`Invalid payload for message type: tts`	TTS validation failed
`Invalid payload for message type: stt`	STT validation failed
`Voice not owned by user`	Attempting to use unauthorized cloned voice

Rate Limiting

Limit: 100 requests per 60 seconds per API key
Exceeding the limit returns the error: `Rate limit exceeded for this API key`
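
A client can stay under the limit by throttling its own sends. The following is a sketch of a sliding-window limiter; `createRateLimiter` is a hypothetical helper, and it assumes the server counts requests over a rolling 60-second window, which the documentation does not specify.

```javascript
// Hypothetical client-side limiter: allows `limit` sends per `windowMs`.
// `now` is injectable so the limiter can be tested without real delays.
function createRateLimiter(limit = 100, windowMs = 60_000, now = Date.now) {
  const timestamps = []; // send times inside the current window, oldest first

  return {
    // Returns true and records the send if under the limit, otherwise false.
    tryAcquire() {
      const t = now();
      // Drop entries that have aged out of the window.
      while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
        timestamps.shift();
      }
      if (timestamps.length >= limit) return false;
      timestamps.push(t);
      return true;
    },
  };
}
```

Usage: call `limiter.tryAcquire()` before each `ws.send(...)` and queue or delay the request when it returns `false`.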

Best Practices

Connection Management

Reuse WebSocket connections for multiple requests
Handle reconnection logic for network interruptions
Listen for close events and reconnect when needed

TTS Optimization

Keep text under 2000 characters per request
For longer content, split into sentences and make sequential requests
Buffer audio chunks before playback for smooth audio
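
Splitting longer content can be sketched as follows. `splitForTts` is a hypothetical helper that packs whole sentences into chunks of at most 2000 characters; its sentence detection is deliberately naive (it breaks after `.`, `!`, `?`, or the Arabic question mark `؟`), and a single sentence longer than the limit is cut at the limit.

```javascript
// Hypothetical helper: splits text into chunks of at most maxChars,
// breaking after sentence-ending punctuation where possible.
function splitForTts(text, maxChars = 2000) {
  // Each match is one sentence with its trailing whitespace,
  // or the unterminated remainder at the end of the text.
  const sentences = text.match(/[^.!?؟]*[.!?؟]+\s*|[^.!?؟]+$/g) ?? [];
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    if (current.length + sentence.length <= maxChars) {
      current += sentence;
      continue;
    }
    if (current) chunks.push(current);
    // Hard-split a sentence that is itself longer than maxChars.
    let rest = sentence;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current) chunks.push(current);

  return chunks.map((c) => c.trim()).filter((c) => c.length > 0);
}
```

Each returned chunk can then be sent as its own `tts` request, waiting for the `end` message before sending the next.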

STT Optimization

Use 16kHz sample rate for best results
Enable end-of-speech detection for real-time transcription
Send audio in mono format

Error Handling

Always handle the error message type
Monitor close codes to distinguish between errors and normal closures
Implement exponential backoff for reconnection attempts
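
Exponential backoff can be sketched as a pure function that returns the delay before each reconnection attempt. The base delay, the cap, and the use of random jitter are assumptions chosen for illustration; `backoffDelay` is a hypothetical name.

```javascript
// Hypothetical helper: delay in milliseconds before reconnection attempt
// number `attempt` (0-based). The ceiling doubles each attempt up to maxMs,
// and the actual delay is a random value below that ceiling ("full jitter")
// so that many clients do not reconnect at the same instant.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30_000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Usage: on each failed attempt, call `setTimeout(reconnect, backoffDelay(attempt++))`, and reset `attempt` to 0 once a connection succeeds.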
