Overview

The Hamsa Real-Time WebSocket API enables bidirectional streaming communication for Text-to-Speech (TTS) and Speech-to-Text (STT) operations. A single persistent connection can handle multiple requests without reconnecting.

Connection

Endpoint

wss://api.tryhamsa.com/v1/realtime/ws

Authentication

Authenticate using your API key via query parameter or header:
wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY
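
As a minimal sketch of the query-parameter form, the connection URL can be built with the standard `URL` API so the key is percent-encoded correctly. The helper name `buildRealtimeUrl` is hypothetical and not part of the Hamsa API. The header form is not shown because this page does not name the header.

```javascript
// Endpoint taken from this page.
const REALTIME_ENDPOINT = 'wss://api.tryhamsa.com/v1/realtime/ws';

// Hypothetical helper: returns the endpoint URL with the API key attached
// as the `api_key` query parameter.
function buildRealtimeUrl(apiKey) {
  if (typeof apiKey !== 'string' || apiKey.trim() === '') {
    throw new Error('API key is required');
  }
  const url = new URL(REALTIME_ENDPOINT);
  // searchParams.set percent-encodes the value if it contains reserved characters.
  url.searchParams.set('api_key', apiKey);
  return url.toString();
}

// Usage (in a browser or any runtime with a global WebSocket):
// const ws = new WebSocket(buildRealtimeUrl('YOUR_API_KEY'));
```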

Connection Response

Upon successful connection, the server sends:
{
  "type": "info",
  "payload": {
    "message": "Connected to realtime WebSocket server"
  }
}
Connections are automatically closed after 60 minutes of inactivity. The server sends ping frames every 30 seconds to keep connections alive.

Message Format

All messages follow this structure:
interface WebSocketMessage {
  type: "tts" | "stt" | "response" | "error" | "info" | "ack" | "end";
  payload?: object;
}

| Type | Direction | Description |
|------|-----------|-------------|
| tts | Client → Server | Text-to-Speech request |
| stt | Client → Server | Speech-to-Text request |
| ack | Server → Client | Request acknowledgment |
| response | Server → Client | Response data |
| end | Server → Client | Stream completion |
| error | Server → Client | Error message |
| info | Server → Client | Informational message |
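
Not every frame a client receives is a JSON envelope: this page also describes binary audio chunks (TTS) and plain-string transcriptions (STT). The sketch below shows one way to sort incoming frames into those three cases before acting on them. The function name `classifyFrame` and its return shape are illustrative assumptions, not part of the API.

```javascript
// Message types the server sends, per the table above.
const SERVER_TYPES = new Set(['ack', 'response', 'end', 'error', 'info']);

// Hypothetical helper: classify the `data` of a WebSocket message event.
function classifyFrame(data) {
  // Anything that is not a string (Blob, ArrayBuffer, Buffer) is treated as audio.
  if (typeof data !== 'string') {
    return { kind: 'binary', data };
  }
  let parsed;
  try {
    parsed = JSON.parse(data);
  } catch {
    // Not JSON at all: treat as plain text (for example, an STT transcription).
    return { kind: 'text', text: data };
  }
  // Only objects carrying a known `type` count as protocol messages. This keeps
  // a transcription such as "123" (valid JSON, but not an envelope) as plain text.
  if (parsed !== null && typeof parsed === 'object' && SERVER_TYPES.has(parsed.type)) {
    return { kind: 'message', type: parsed.type, payload: parsed.payload };
  }
  return { kind: 'text', text: data };
}
```

A handler can then switch on `kind` first and on `type` second, which keeps error handling in a single place.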

Text-to-Speech (TTS)

Convert text to speech with streaming audio output.

Request

type (string, required): Must be "tts"
payload (object, required)

Example Request

{
  "type": "tts",
  "payload": {
    "text": "مرحبا بك في خدمة همسة",
    "speaker": "speaker-1",
    "dialect": "modern",
    "languageId": "ar",
    "mulaw": false
  }
}

Response Flow

1. Acknowledgment

Server confirms the request was received:
{
  "type": "ack",
  "payload": {
    "message": "Real time text to speach connection establesh"
  }
}

2. Audio Stream

Server streams raw audio data as binary chunks. Buffer these chunks to reconstruct the complete audio file.

3. Stream End

Server signals completion:
{
  "type": "end",
  "payload": {
    "message": "End of TTS stream"
  }
}
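
The three-step flow above can be modeled as a small state machine that is independent of the socket, which makes it easy to test. This is a sketch under assumptions: `createTtsCollector` and its state names are hypothetical, and it assumes every string frame during a TTS request is a JSON message.

```javascript
// Hypothetical collector for one TTS request: feed it frames in arrival order.
function createTtsCollector() {
  const chunks = [];
  let state = 'waiting'; // waiting -> streaming -> done | failed

  return {
    get state() {
      return state;
    },
    // Returns the buffered chunks once "end" arrives, otherwise null.
    push(data) {
      if (state === 'done' || state === 'failed') {
        throw new Error('Collector is finished; create a new one per request');
      }
      if (typeof data !== 'string') {
        chunks.push(data); // step 2: binary audio chunk
        return null;
      }
      const message = JSON.parse(data);
      if (message.type === 'ack') {
        state = 'streaming'; // step 1: request acknowledged
        return null;
      }
      if (message.type === 'end') {
        state = 'done'; // step 3: stream complete
        return chunks;
      }
      if (message.type === 'error') {
        state = 'failed';
        throw new Error(message.payload?.message ?? 'Unknown error');
      }
      return null; // other types (for example "info") are ignored here
    },
  };
}
```

Because a single connection can carry several requests, creating a fresh collector per request keeps audio from different requests apart.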

Code Example

const ws = new WebSocket('wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY');

const audioChunks = [];

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'tts',
    payload: {
      text: 'مرحبا بك',
      speaker: 'speaker-1',
      dialect: 'modern',
      languageId: 'ar',
      mulaw: false
    }
  }));
};

ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // Binary audio chunk
    audioChunks.push(event.data);
  } else {
    const message = JSON.parse(event.data);

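    if (message.type === 'end') {
      // Combine all chunks into final audio
      const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });
      audioChunks.length = 0; // Reset so the connection can serve another request
      const audioUrl = URL.createObjectURL(audioBlob);
      const audio = new Audio(audioUrl);
      audio.play();
    } else if (message.type === 'error') {
      // Discard partial audio and surface the server's error message
      audioChunks.length = 0;
      console.error('Error:', message.payload.message);
    }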
  }
};

Speech-to-Text (STT)

Transcribe audio to text.

Request

type (string, required): Must be "stt"
payload (object, required)

Example Request (Base64)

{
  "type": "stt",
  "payload": {
    "audioBase64": "//NExAAAAAANIAcAPABEAEQAQABEAEQARABEA...",
    "language": "ar",
    "isEosEnabled": true,
    "eosThreshold": 0.3
  }
}

Example Request (Float Array)

{
  "type": "stt",
  "payload": {
    "audioList": [0.001, 0.0015, -0.002, 0.003, ...],
    "language": "ar",
    "isEosEnabled": true
  }
}
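
For the float-array form, audio captured as 16-bit PCM has to be converted to floats first. The sketch below assumes `audioList` expects samples normalized to the range -1.0 to 1.0, which the example values suggest but this page does not state. The helper names `int16ToFloatList` and `buildSttRequest` are hypothetical.

```javascript
// Assumption: audioList holds normalized samples in [-1.0, 1.0].
function int16ToFloatList(samples) {
  const out = new Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Dividing by 32768 maps the Int16 minimum (-32768) to exactly -1.0.
    out[i] = samples[i] / 32768;
  }
  return out;
}

// Serializes an STT request using the float-array payload shown above.
function buildSttRequest(floatSamples, language) {
  return JSON.stringify({
    type: 'stt',
    payload: {
      // Array.from also accepts a Float32Array, which JSON.stringify would
      // otherwise serialize as an object rather than an array.
      audioList: Array.from(floatSamples),
      language,
      isEosEnabled: true,
    },
  });
}

// Usage: ws.send(buildSttRequest(int16ToFloatList(pcmSamples), 'ar'));
```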

Response

The server sends the transcribed text directly as a plain string (not JSON):
مرحبا بك في خدمة همسة

Code Example

const ws = new WebSocket('wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY');

ws.onopen = async () => {
  // Get audio from microphone
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream);
  const chunks = [];

  mediaRecorder.ondataavailable = (e) => chunks.push(e.data);

  mediaRecorder.onstop = async () => {
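    const blob = new Blob(chunks);
    const bytes = new Uint8Array(await blob.arrayBuffer());

    // Encode in slices: spreading the whole array into String.fromCharCode
    // can exceed the engine's argument limit on longer recordings.
    let binary = '';
    for (let i = 0; i < bytes.length; i += 0x8000) {
      binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
    }
    const base64 = btoa(binary);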

    ws.send(JSON.stringify({
      type: 'stt',
      payload: {
        audioBase64: base64,
        language: 'ar',
        isEosEnabled: true,
        eosThreshold: 0.3
      }
    }));
  };

  mediaRecorder.start();
  setTimeout(() => mediaRecorder.stop(), 3000); // Record 3 seconds
};

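ws.onmessage = (event) => {
  if (typeof event.data !== 'string') return;

  let json = null;
  try {
    json = JSON.parse(event.data);
  } catch {
    // Not JSON: handled below as a plain text transcription
  }

  // Only objects with a `type` are protocol messages; a transcription such as
  // "123" parses as JSON but must still be treated as text.
  if (json !== null && typeof json === 'object' && json.type) {
    if (json.type === 'error') {
      console.error('Error:', json.payload.message);
    }
  } else {
    console.log('Transcription:', event.data);
  }
};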

Error Handling

Error Response Format

{
  "type": "error",
  "payload": {
    "message": "Error description"
  }
}

WebSocket Close Codes

| Code | Description |
|------|-------------|
| 4001 | Authentication failed - invalid or missing API key |
| 4003 | Insufficient funds - project wallet balance is depleted |
| 4500 | Internal authentication error |
| 1000 | Connection closed due to inactivity (60 min timeout) |
| 1001 | Server shutting down |
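
One way to act on these codes is a lookup that tells the client whether reconnecting is worthwhile. The codes and reasons come from the table above. The `retry` flags are an assumed policy (retrying cannot fix a bad key or an empty wallet), not documented server guidance, and `describeClose` is a hypothetical name.

```javascript
// Close codes from the table above, paired with an assumed retry policy.
const CLOSE_CODES = new Map([
  [1000, { reason: 'inactivity timeout', retry: true }],
  [1001, { reason: 'server shutting down', retry: true }],
  [4001, { reason: 'authentication failed', retry: false }],
  [4003, { reason: 'insufficient funds', retry: false }],
  [4500, { reason: 'internal authentication error', retry: true }],
]);

// Hypothetical helper: maps a close code to a reason and a retry decision.
// Codes not listed on this page are treated as transient.
function describeClose(code) {
  return CLOSE_CODES.get(code) ?? { reason: 'unknown', retry: true };
}

// Usage:
// ws.onclose = (event) => {
//   const { reason, retry } = describeClose(event.code);
//   if (retry) { /* schedule a reconnect */ } else { console.error(reason); }
// };
```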

Common Errors

| Error | Cause |
|-------|-------|
| Missing API key in headers or query parameters | No API key provided |
| API key is invalid or expired | Invalid API key |
| User account is inactive or not found | Account issue |
| Project is inactive or not found | Project issue |
| Insufficient funds in wallet | Wallet balance is zero or negative |
| Invalid message format: missing type or payload | Malformed message |
| Unsupported message type: [type] | Unknown message type |
| Invalid payload for message type: tts | TTS validation failed |
| Invalid payload for message type: stt | STT validation failed |
| Voice not owned by user | Attempting to use unauthorized cloned voice |

Rate Limiting

  • Limit: 100 requests per 60 seconds per API key
  • Exceeding the limit returns: Rate limit exceeded for this API key
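
A client can avoid hitting this limit by throttling itself. The sketch below is a sliding-window limiter using the numbers above as defaults. How the server actually counts the window is not stated on this page, so the sliding window is a conservative assumption, and `createRateLimiter` is a hypothetical name.

```javascript
// Hypothetical client-side limiter: at most `limit` requests in any
// `windowMs` span. `now` is injectable so the logic can be tested.
function createRateLimiter(limit = 100, windowMs = 60_000, now = Date.now) {
  const timestamps = []; // send times of requests still inside the window

  return {
    // Returns true and records the request if it fits, false otherwise.
    tryAcquire() {
      const t = now();
      // Drop timestamps that have aged out of the window.
      while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
        timestamps.shift();
      }
      if (timestamps.length >= limit) {
        return false;
      }
      timestamps.push(t);
      return true;
    },
  };
}

// Usage:
// const limiter = createRateLimiter();
// if (limiter.tryAcquire()) ws.send(request); else { /* queue or delay */ }
```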

Best Practices

  • Reuse WebSocket connections for multiple requests
  • Handle reconnection logic for network interruptions
  • Listen for close events and reconnect when needed
  • TTS: keep text under 2000 characters per request
  • TTS: for longer content, split into sentences and make sequential requests
  • TTS: buffer audio chunks before playback for smooth audio
  • STT: use a 16 kHz sample rate for best results
  • STT: enable end-of-speech detection for real-time transcription
  • STT: send audio in mono format
  • Always handle the error message type
  • Monitor close codes to distinguish between errors and normal closures
  • Implement exponential backoff for reconnection attempts
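
Two of the practices above lend themselves to small helpers: splitting long text into sentence-based parts under the 2000-character limit, and computing exponential backoff delays for reconnects. Both are sketches under assumptions: the sentence regex, the use of JavaScript string length as the character count, and the 1 s base and 30 s cap are illustrative choices, not values from the API.

```javascript
// Hypothetical helper: group sentences into parts of at most maxLength characters.
function splitText(text, maxLength = 2000) {
  // Split after sentence-ending punctuation (including the Arabic question mark).
  const sentences = text.split(/(?<=[.!?؟])\s+/).filter((s) => s.length > 0);
  const parts = [];
  let current = '';
  for (const sentence of sentences) {
    if (sentence.length > maxLength) {
      // A single oversized sentence is cut at fixed offsets as a last resort.
      if (current) {
        parts.push(current);
        current = '';
      }
      for (let i = 0; i < sentence.length; i += maxLength) {
        parts.push(sentence.slice(i, i + maxLength));
      }
      continue;
    }
    const candidate = current ? current + ' ' + sentence : sentence;
    if (candidate.length <= maxLength) {
      current = candidate;
    } else {
      parts.push(current);
      current = sentence;
    }
  }
  if (current) parts.push(current);
  return parts;
}

// Hypothetical helper: delay before reconnect attempt N (0-based), doubling
// each time and capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30_000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Each part returned by `splitText` can be sent as its own `tts` request, waiting for the `end` message before sending the next.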