Overview
The Hamsa Real-Time WebSocket API enables bidirectional streaming communication for Text-to-Speech (TTS) and Speech-to-Text (STT) operations. A single persistent connection can handle multiple requests without reconnecting.
Connection
Endpoint
wss://api.tryhamsa.com/v1/realtime/ws
Authentication
Authenticate using your API key via query parameter or header:
wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY
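As a minimal sketch, the authenticated URL can be built with the standard URL API so that any special characters in the key are percent-encoded. The helper name `buildRealtimeUrl` is hypothetical; the endpoint and the `api_key` parameter are taken from the text above.

```javascript
// Hypothetical helper: appends the API key as the `api_key` query parameter.
function buildRealtimeUrl(apiKey, endpoint = 'wss://api.tryhamsa.com/v1/realtime/ws') {
  if (typeof apiKey !== 'string' || apiKey.trim() === '') {
    throw new Error('An API key is required');
  }
  const url = new URL(endpoint);
  url.searchParams.set('api_key', apiKey); // percent-encodes special characters
  return url.toString();
}
```

The browser WebSocket constructor does not accept custom headers, so the query parameter is the practical option in browser code.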
Connection Response
Upon successful connection, the server sends:
{
  "type": "info",
  "payload": {
    "message": "Connected to realtime WebSocket server"
  }
}
Connections are automatically closed after 60 minutes of inactivity. The server sends ping frames every 30 seconds to keep connections alive.
All messages follow this structure:
interface WebSocketMessage {
  type: "tts" | "stt" | "response" | "error" | "info" | "ack" | "end";
  payload?: object;
}
Type | Direction | Description
tts | Client → Server | Text-to-Speech request
stt | Client → Server | Speech-to-Text request
ack | Server → Client | Request acknowledgment
response | Server → Client | Response data
end | Server → Client | Stream completion
error | Server → Client | Error message
info | Server → Client | Informational message
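Because the server may also send text frames that are not JSON envelopes, it can help to check the envelope shape before dispatching on `type`. The following is a sketch under that assumption; `parseEnvelope` and `KNOWN_TYPES` are hypothetical names, and the set of types is copied from the table above.

```javascript
// Message types listed in the table above.
const KNOWN_TYPES = new Set(['tts', 'stt', 'response', 'error', 'info', 'ack', 'end']);

// Returns the parsed envelope, or null when `raw` is not a JSON message
// of the documented shape (for example, a plain-text result).
function parseEnvelope(raw) {
  if (typeof raw !== 'string') return null;
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  const isObject = parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed);
  if (!isObject || !KNOWN_TYPES.has(parsed.type)) return null;
  // `payload` is optional, but when present it must be an object.
  if ('payload' in parsed && (parsed.payload === null || typeof parsed.payload !== 'object')) {
    return null;
  }
  return parsed;
}
```

Checking the shape, rather than relying on `JSON.parse` failing, avoids misreading plain text that happens to be valid JSON, such as a string made only of digits.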
Text-to-Speech (TTS)
Convert text to speech with streaming audio output.
Request
text: The text to synthesize. Maximum 2000 characters.
speaker: Speaker ID. Use a UUID for custom cloned voices or a pre-built speaker name.
dialect: Dialect identifier (e.g., "modern").
languageId: Language code. Defaults to "ar" (Arabic).
mulaw: Whether to use mu-law audio encoding.
Example Request
{
  "type": "tts",
  "payload": {
    "text": "مرحبا بك في خدمة همسة",
    "speaker": "speaker-1",
    "dialect": "modern",
    "languageId": "ar",
    "mulaw": false
  }
}
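Text longer than the 2000-character limit has to be sent as several requests. The sketch below splits text at sentence boundaries and wraps each piece in a `tts` message. `splitText` and `buildTtsRequests` are hypothetical helpers, the sentence-splitting rule is an assumption, and lengths are counted in UTF-16 code units, which may differ from how the server counts characters.

```javascript
const MAX_TTS_CHARS = 2000; // limit stated for `text`

// Splits text into pieces of at most `limit` characters, preferring sentence boundaries.
function splitText(text, limit = MAX_TTS_CHARS) {
  // A sentence ends in ., !, ? or the Arabic question mark; trailing text without
  // punctuation is kept as a final sentence.
  const sentences = text.match(/[^.!?؟]*[.!?؟]+\s*|[^.!?؟]+$/g) || [];
  const pieces = [];
  let current = '';
  for (let sentence of sentences) {
    // Hard-split any single sentence that is longer than the limit.
    while (sentence.length > limit) {
      if (current) { pieces.push(current.trim()); current = ''; }
      pieces.push(sentence.slice(0, limit));
      sentence = sentence.slice(limit);
    }
    if (current.length + sentence.length > limit) {
      pieces.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) pieces.push(current.trim());
  return pieces.filter((p) => p.length > 0);
}

// Builds one `tts` message per piece, using the payload fields from the example above.
function buildTtsRequests(text, options = {}) {
  const { speaker, dialect, languageId = 'ar', mulaw = false } = options;
  return splitText(text).map((piece) => ({
    type: 'tts',
    payload: { text: piece, speaker, dialect, languageId, mulaw },
  }));
}
```

Each resulting message would be sent only after the previous request has reached its `end` message, so that audio chunks from different requests are not interleaved.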
Response Flow
Acknowledgment
Server confirms the request was received:
{
  "type": "ack",
  "payload": {
    "message": "Real time text to speach connection establesh"
  }
}
Audio Stream
Server streams raw audio data as binary chunks. Buffer these chunks to reconstruct the complete audio file.
Stream End
Server signals completion:
{
  "type": "end",
  "payload": {
    "message": "End of TTS stream"
  }
}
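The three-step flow above can be tracked with a small state machine. This is a sketch that assumes binary chunks arrive only between `ack` and `end`, as the flow describes; `createTtsCollector` is a hypothetical helper, and it works on `Uint8Array` chunks rather than `Blob` objects.

```javascript
// Tracks one TTS exchange: ack -> binary chunks -> end.
function createTtsCollector() {
  const chunks = [];
  let state = 'waiting'; // 'waiting' | 'streaming' | 'done' | 'failed'

  return {
    get state() { return state; },
    // Call with each parsed JSON message ({ type, payload }).
    onMessage(message) {
      if (message.type === 'ack' && state === 'waiting') state = 'streaming';
      else if (message.type === 'end' && state === 'streaming') state = 'done';
      else if (message.type === 'error') state = 'failed';
    },
    // Call with each binary frame as a Uint8Array.
    onChunk(bytes) {
      if (state !== 'streaming') {
        throw new Error(`Unexpected audio chunk in state "${state}"`);
      }
      chunks.push(bytes);
    },
    // Concatenates the buffered chunks once the stream has ended.
    result() {
      if (state !== 'done') throw new Error('Stream has not completed');
      const total = chunks.reduce((n, c) => n + c.length, 0);
      const out = new Uint8Array(total);
      let offset = 0;
      for (const c of chunks) {
        out.set(c, offset);
        offset += c.length;
      }
      return out;
    },
  };
}
```

In a browser, setting `ws.binaryType = 'arraybuffer'` makes binary frames arrive as `ArrayBuffer`, which can be wrapped with `new Uint8Array(event.data)` before calling `onChunk`.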
Code Example
const ws = new WebSocket('wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY');
const audioChunks = [];

// Send the TTS request once the connection is open
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'tts',
    payload: {
      text: 'مرحبا بك',
      speaker: 'speaker-1',
      dialect: 'modern',
      languageId: 'ar',
      mulaw: false
    }
  }));
};

ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // Binary audio chunk
    audioChunks.push(event.data);
  } else {
    const message = JSON.parse(event.data);
    if (message.type === 'end') {
      // Combine all chunks into final audio
      const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });
      const audioUrl = URL.createObjectURL(audioBlob);
      const audio = new Audio(audioUrl);
      audio.play();
    } else if (message.type === 'error') {
      // Report server-side errors instead of silently ignoring them
      console.error('Error:', message.payload.message);
    }
  }
};
Speech-to-Text (STT)
Transcribe audio to text.
Request
audioBase64: Base64-encoded audio data. Either audioBase64 or audioList is required.
audioList: Float32 array of PCM audio samples (16kHz, mono). Either audioBase64 or audioList is required.
language: Language code for transcription. Defaults to "ar" (Arabic).
isEosEnabled: Enable end-of-speech detection.
eosThreshold: Threshold for end-of-speech detection (0.0 to 1.0).
Example Request (Base64)
{
  "type": "stt",
  "payload": {
    "audioBase64": "//NExAAAAAANIAcAPABEAEQAQABEAEQARABEA...",
    "language": "ar",
    "isEosEnabled": true,
    "eosThreshold": 0.3
  }
}
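Validating the payload on the client avoids a round trip that would end in an `Invalid payload for message type: stt` error. The sketch below enforces only the constraints stated above (one audio field present, threshold between 0.0 and 1.0); `buildSttRequest` is a hypothetical helper, and the server may apply further checks.

```javascript
// Builds an `stt` message, checking the documented constraints first.
function buildSttRequest({ audioBase64, audioList, language = 'ar', isEosEnabled, eosThreshold } = {}) {
  const hasBase64 = typeof audioBase64 === 'string' && audioBase64.length > 0;
  const hasList = Array.isArray(audioList) && audioList.length > 0;
  if (!hasBase64 && !hasList) {
    throw new Error('Either audioBase64 or audioList is required');
  }
  if (eosThreshold !== undefined) {
    const inRange = typeof eosThreshold === 'number' && eosThreshold >= 0 && eosThreshold <= 1;
    if (!inRange) throw new RangeError('eosThreshold must be between 0.0 and 1.0');
  }
  const payload = { language };
  if (hasBase64) payload.audioBase64 = audioBase64;
  if (hasList) payload.audioList = audioList;
  // Optional fields are included only when the caller set them.
  if (isEosEnabled !== undefined) payload.isEosEnabled = Boolean(isEosEnabled);
  if (eosThreshold !== undefined) payload.eosThreshold = eosThreshold;
  return { type: 'stt', payload };
}
```

The result can be passed to `ws.send(JSON.stringify(...))` in the same way as the example request.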
Example Request (Float Array)
{
  "type": "stt",
  "payload": {
    "audioList": [0.001, 0.0015, -0.002, 0.003, ...],
    "language": "ar",
    "isEosEnabled": true
  }
}
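Audio is often captured as signed 16-bit PCM rather than floats. The sketch below converts such samples into a plain array for `audioList`, assuming the server expects values normalized to the range [-1, 1); the sample values in the example suggest this, but the range is not stated explicitly. `pcm16ToFloatList` is a hypothetical helper.

```javascript
// Converts signed 16-bit PCM samples (e.g. an Int16Array) to floats in [-1, 1).
// Assumption: `audioList` expects normalized samples at 16 kHz, mono.
function pcm16ToFloatList(int16Samples) {
  return Array.from(int16Samples, (sample) => sample / 32768);
}
```

A plain array is returned because typed arrays do not serialize as JSON arrays with `JSON.stringify`.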
Response
The server sends the transcribed text directly as a plain string (not JSON).
Code Example
const ws = new WebSocket('wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY');

ws.onopen = async () => {
  // Get audio from microphone
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream);
  const chunks = [];

  mediaRecorder.ondataavailable = (e) => chunks.push(e.data);
  mediaRecorder.onstop = async () => {
    stream.getTracks().forEach((track) => track.stop()); // Release the microphone
    const blob = new Blob(chunks);
    const bytes = new Uint8Array(await blob.arrayBuffer());
    // Build the binary string in slices: spreading a large array into
    // String.fromCharCode can exceed the engine's argument limit
    let binary = '';
    for (let i = 0; i < bytes.length; i += 0x8000) {
      binary += String.fromCharCode.apply(null, bytes.subarray(i, i + 0x8000));
    }
    const base64 = btoa(binary);
    ws.send(JSON.stringify({
      type: 'stt',
      payload: {
        audioBase64: base64,
        language: 'ar',
        isEosEnabled: true,
        eosThreshold: 0.3
      }
    }));
  };

  mediaRecorder.start();
  setTimeout(() => mediaRecorder.stop(), 3000); // Record 3 seconds
};

ws.onmessage = (event) => {
  if (typeof event.data === 'string') {
    try {
      const json = JSON.parse(event.data);
      if (json.type === 'error') {
        console.error('Error:', json.payload.message);
      }
    } catch {
      // Plain text transcription result
      console.log('Transcription:', event.data);
    }
  }
};
Error Handling
{
  "type": "error",
  "payload": {
    "message": "Error description"
  }
}
WebSocket Close Codes
Code | Description
4001 | Authentication failed - invalid or missing API key
4003 | Insufficient funds - project wallet balance is depleted
4500 | Internal authentication error
1000 | Connection closed due to inactivity (60 min timeout)
1001 | Server shutting down
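A close handler can use these codes to decide whether reconnecting is worthwhile. In the sketch below the descriptions come from the table above, while the `retry` flags are assumptions about which failures can clear up without user action; `CLOSE_CODES` and `describeClose` are hypothetical names.

```javascript
// Descriptions follow the table above; `retry` values are assumptions.
const CLOSE_CODES = {
  1000: { reason: 'Connection closed due to inactivity', retry: true },
  1001: { reason: 'Server shutting down', retry: true },
  4001: { reason: 'Authentication failed', retry: false }, // needs a valid API key
  4003: { reason: 'Insufficient funds', retry: false }, // needs a wallet top-up
  4500: { reason: 'Internal authentication error', retry: true },
};

// Looks up a close code; unknown codes are treated as retryable.
function describeClose(code) {
  return CLOSE_CODES[code] || { reason: `Unrecognized close code ${code}`, retry: true };
}
```

A typical use is `ws.onclose = (event) => { const info = describeClose(event.code); ... }`, reconnecting only when `info.retry` is true.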
Common Errors
Error | Cause
Missing API key in headers or query parameters | No API key provided
API key is invalid or expired | Invalid API key
User account is inactive or not found | Account issue
Project is inactive or not found | Project issue
Insufficient funds in wallet | Wallet balance is zero or negative
Invalid message format: missing type or payload | Malformed message
Unsupported message type: [type] | Unknown message type
Invalid payload for message type: tts | TTS validation failed
Invalid payload for message type: stt | STT validation failed
Voice not owned by user | Attempting to use unauthorized cloned voice
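Client code sometimes needs to group these messages, for example to separate problems the user must fix from malformed requests. The sketch below matches on the message texts listed above. The category names and the helper `classifyError` are hypothetical, and matching on message text is fragile if the server wording changes.

```javascript
// Maps a server error message to a coarse, client-defined category.
function classifyError(message) {
  // Checked first because the rate-limit message also mentions "API key".
  if (/^Rate limit exceeded/.test(message)) return 'rate_limit';
  if (/API key/i.test(message)) return 'auth';
  if (/^(User account|Project) is inactive or not found/.test(message)) return 'account';
  if (/^Insufficient funds/.test(message)) return 'billing';
  if (/^(Invalid message format|Unsupported message type|Invalid payload for message type)/.test(message)) {
    return 'request';
  }
  if (/^Voice not owned by user/.test(message)) return 'permission';
  return 'unknown';
}
```

Only the `request` category points to a bug in the message being sent; the others usually need a configuration or account change.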
Rate Limiting
Limit: 100 requests per 60 seconds per API key
Exceeding the limit returns: Rate limit exceeded for this API key
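A client can stay under the limit by counting its own requests. The sketch below uses a sliding window with the documented figures as defaults. Whether the server counts with a sliding or a fixed window is not stated, so this is an approximation; `createRateLimiter` is a hypothetical helper, and the `now` parameter exists only to make the clock injectable.

```javascript
// Sliding-window limiter: allows at most `limit` requests per `windowMs`.
function createRateLimiter(limit = 100, windowMs = 60000, now = Date.now) {
  const timestamps = []; // send times of requests still inside the window

  return {
    // Returns true and records the request if it fits in the current window.
    tryAcquire() {
      const t = now();
      // Drop timestamps that have aged out of the window.
      while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
        timestamps.shift();
      }
      if (timestamps.length >= limit) return false;
      timestamps.push(t);
      return true;
    },
  };
}
```

Calling `tryAcquire()` before each `ws.send` lets the caller queue or delay a request instead of triggering the rate-limit error.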
Best Practices
Reuse WebSocket connections for multiple requests
Handle reconnection logic for network interruptions
Listen for close events and reconnect when needed
Keep text under 2000 characters per request
For longer content, split into sentences and make sequential requests
Buffer audio chunks before playback for smooth audio
Use 16kHz sample rate for best results
Enable end-of-speech detection for real-time transcription
Send audio in mono format
Always handle the error message type
Monitor close codes to distinguish between errors and normal closures
Implement exponential backoff for reconnection attempts
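The backoff recommendation can be sketched as a delay calculator with "full jitter", where each wait is a random value between zero and an exponentially growing ceiling. The base delay, the cap, and the helper name `backoffDelay` are assumptions chosen for illustration, not values from the API.

```javascript
// Returns the wait in milliseconds before reconnect attempt number `attempt` (0-based).
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  // The ceiling doubles with each attempt until it reaches the cap.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick a random delay below the ceiling so clients do not retry in lockstep.
  return Math.floor(random() * ceiling);
}
```

A reconnect routine would call `setTimeout(connect, backoffDelay(attempt))` after a retryable close and reset `attempt` to zero once a connection succeeds.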