Version: 2.0.0

Streaming (WebSocket)

Note

The WebSocket API is only available on the Business plan and higher. If you're running into access issues, upgrade to a Business plan or higher on the billing page.

A streaming session has four steps:

  1. Connect to the WebSocket URL.
  2. Send a TTS request with JSON parameters.
  3. Receive audio data in JSON or binary format.
  4. Receive a final termination message.

Synthesis Request

WebSocket URL

wss://websocket.cluster.resemble.ai/stream

Request Params

Send a JSON message to request speech synthesis. The message format is as follows:

{
"voice_uuid": "<voice_uuid>",
"project_uuid": "<project_uuid>",
"data": "<text | ssml>",
"binary_response": false, // Optional, defaults to false for JSON response
"request_id": 0, // Optional, auto-incremented if not specified
// Additional optional parameters as needed
}
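
As a minimal sketch, the request above could be built and serialized in Python as follows. The function name `build_request` is hypothetical; it only assembles the fields documented on this page and leaves out `request_id` when none is given, so the server can auto-increment it. Sending the resulting string as a text frame is left to whichever WebSocket client you use.

```python
import json
from typing import Any, Optional


def build_request(
    voice_uuid: str,
    project_uuid: str,
    data: str,
    binary_response: bool = False,
    request_id: Optional[int] = None,
    **options: Any,
) -> str:
    """Serialize a synthesis request as a JSON string (hypothetical helper)."""
    message: dict[str, Any] = {
        "voice_uuid": voice_uuid,
        "project_uuid": project_uuid,
        "data": data,  # plain text or SSML
        "binary_response": binary_response,
    }
    # Omit request_id entirely so the server auto-increments it.
    if request_id is not None:
        message["request_id"] = request_id
    # Other optional parameters (output_format, sample_rate, ...) pass through as-is.
    message.update(options)
    return json.dumps(message)
```

For example, `build_request("<voice_uuid>", "<project_uuid>", "Hello world", output_format="wav")` produces one text frame ready to send after connecting.
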
| Attribute | Type | Required | Description |
|---|---|---|---|
| voice_uuid | string | Yes | The voice to synthesize the text in. |
| project_uuid | string | Yes | The project to save the data to. |
| data | string | Yes | The text or SSML to synthesize. Maximum length is 3000 characters, not counting SSML markup. |
| request_id | int | No | Numeric identifier for the request, returned with every response to it. If omitted, the server assigns increasing integers starting at 0. |
| binary_response | bool | No | Defaults to false. If true, audio is returned as binary data (MP3 or WAV) suitable for direct playback. If false, audio is returned in JSON frames as base64-encoded data. |
| output_format | string | No | The output format of the produced audio: "wav" or "mp3". |
| sample_rate | int | No | The sample rate of the produced audio in Hz: 8000, 16000, 22050, 32000, or 44100. |
| precision | string | No | The bit depth of the generated audio: PCM_32, PCM_24, PCM_16, or MULAW. Defaults to PCM_32. |
| no_audio_header | bool | No | Defaults to false. If true, the binary WAV response omits the audio header; if false, the header is included. |
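
The constraints in the table can be checked client-side before a request is sent, which avoids a round trip for obviously invalid input. The sketch below is hypothetical (the server remains the authority): the allowed values are copied from the table, and it assumes that "not counting SSML" means markup tags like `<speak>` are stripped before measuring the 3000-character limit.

```python
import re
from typing import Any

# Allowed values copied from the parameter table.
OUTPUT_FORMATS = {"wav", "mp3"}
SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}
PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}
MAX_TEXT_CHARS = 3000

# Assumption: SSML tags do not count toward the length limit.
_SSML_TAG = re.compile(r"<[^>]*>")


def validate_request(message: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a request dict (empty if none)."""
    problems = []
    for field in ("voice_uuid", "project_uuid", "data"):
        if not isinstance(message.get(field), str) or not message[field]:
            problems.append(f"{field} is required and must be a non-empty string")
    data = message.get("data")
    if isinstance(data, str) and len(_SSML_TAG.sub("", data)) > MAX_TEXT_CHARS:
        problems.append(f"data exceeds {MAX_TEXT_CHARS} characters of text")
    if "output_format" in message and message["output_format"] not in OUTPUT_FORMATS:
        problems.append("output_format must be 'wav' or 'mp3'")
    if "sample_rate" in message and message["sample_rate"] not in SAMPLE_RATES:
        problems.append(f"sample_rate must be one of {sorted(SAMPLE_RATES)}")
    if "precision" in message and message["precision"] not in PRECISIONS:
        problems.append(f"precision must be one of {sorted(PRECISIONS)}")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every invalid field at once.
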

Audio Output

JSON Response Format

When binary_response is false, the server sends the audio as a series of JSON objects, one per chunk:

{
"type": "audio",
"audio_content": "<base64_encoded_audio>",
"audio_timestamps": {
"graph_chars": ["H", "e"], // or null
"graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]], // or null
"phon_chars": ["h", "ˈe"], // or null
"phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]] // or null
},
"sample_rate": 32000,
"request_id": 0
}
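
A minimal sketch of decoding one such frame in Python: it base64-decodes `audio_content` and pairs each grapheme in `graph_chars` with its entry in `graph_times`. The helper name is hypothetical, and reading each `graph_times` entry as a `[start, end]` pair in seconds is an assumption this page does not state. Null timestamp fields yield `None`.

```python
import base64
import json
from typing import Any, Optional

Timing = tuple[str, float, float]  # (grapheme, start, end), units assumed to be seconds


def parse_audio_frame(text: str) -> tuple[int, bytes, Optional[list[Timing]]]:
    """Decode a JSON audio frame into (request_id, audio bytes, grapheme timings)."""
    frame: dict[str, Any] = json.loads(text)
    if frame.get("type") != "audio":
        raise ValueError(f"not an audio frame: {frame.get('type')!r}")
    audio = base64.b64decode(frame["audio_content"])
    timings = None
    stamps = frame.get("audio_timestamps") or {}  # tolerate a null object too
    chars, times = stamps.get("graph_chars"), stamps.get("graph_times")
    if chars is not None and times is not None:
        # Assumption: each graph_times entry is [start, end] for the matching character.
        timings = [(c, float(t[0]), float(t[1])) for c, t in zip(chars, times)]
    return frame["request_id"], audio, timings
```

The same pairing would apply to `phon_chars` and `phon_times` if phoneme-level timing is needed.
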

Binary Response Format

When binary_response is true, audio is sent as binary chunks; taken in order, the chunks form the contiguous bytes of a single WAV or MP3 file:

// Binary data stream
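
For WAV output, the binary chunks can be joined in arrival order and inspected with Python's standard `wave` module. This is a sketch with hypothetical helper names. `wrap_pcm16` shows one way to restore a header when `no_audio_header` is true, assuming PCM_16 precision and mono audio (the channel count is not documented here). Note that `wave` only reads integer PCM, so MULAW or MP3 responses need a different decoder.

```python
import io
import wave


def assemble_binary(chunks: list[bytes]) -> bytes:
    """Concatenate binary chunks, in the order received, into one audio file."""
    return b"".join(chunks)


def describe_wav(audio: bytes) -> tuple[int, int, int, float]:
    """Return (channels, bytes per sample, sample rate, duration in seconds)."""
    with wave.open(io.BytesIO(audio), "rb") as wav:
        rate = wav.getframerate()
        return wav.getnchannels(), wav.getsampwidth(), rate, wav.getnframes() / rate


def wrap_pcm16(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Prepend a WAV header to raw 16-bit PCM (e.g. a no_audio_header=true response)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)  # assumption: mono unless told otherwise
        wav.setsampwidth(2)  # PCM_16 -> 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Pass the same `sample_rate` you requested to `wrap_pcm16` so the header matches the audio.
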

Termination Message

Sent after the last audio chunk of a request to indicate that its audio stream is complete:

{
"type": "audio_end",
"request_id": 0
}
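
Steps 2 to 4 of a session can be sketched as a single loop: send one request, accumulate audio until `audio_end` arrives, and stop on an error message. The `Connection` protocol and `synthesize` function are hypothetical stand-ins to adapt to your WebSocket client. The sketch assumes one request in flight at a time (it does not filter by `request_id`), that text frames carry JSON while binary frames carry raw audio, that the termination message arrives as JSON in both modes, and that decoded JSON chunks concatenate into one stream.

```python
import base64
import json
from typing import Protocol, Union


class Connection(Protocol):
    """Hypothetical minimal interface over an already-open WebSocket."""

    def send(self, message: str) -> None: ...

    def recv(self) -> Union[str, bytes]: ...


def synthesize(conn: Connection, request: dict) -> bytes:
    """Send one request and return its audio once audio_end is received."""
    conn.send(json.dumps(request))
    audio = bytearray()
    while True:
        frame = conn.recv()
        if isinstance(frame, bytes):
            audio += frame  # binary_response=true: contiguous file bytes
            continue
        msg = json.loads(frame)
        if msg["type"] == "audio":
            audio += base64.b64decode(msg["audio_content"])
        elif msg["type"] == "audio_end":
            return bytes(audio)
        elif msg["type"] == "error":
            raise RuntimeError(f"{msg.get('error_name')}: {msg.get('message')}")
```

Typing the connection as a `Protocol` keeps the loop independent of any particular WebSocket library and makes it easy to test with a fake connection.
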

Error Handling

Unrecoverable Errors

Errors that occur during the connection handshake cause the connection to fail:

{
"type": "error",
"success": false,
"error_name": "ConnectionFailure",
"message": "Failed to establish a connection.",
"status_code": 401 // Example status code
}

Recoverable Errors

Errors related to an individual synthesis request. These do not close the connection, so you can keep sending requests on it:

{
"type": "error",
"success": false,
"error_name": "BadJSON",
"error_params": {"explanation": "Provide your query to synthesize as text or SSML in the 'data' field"},
"message": "Invalid JSON: Provide your query to synthesize as text or SSML in the 'data' field",
"status_code": 400
}
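
A hedged sketch of turning error messages into a structured value that a client can act on. `StreamError` and `classify_error` are hypothetical names. Because both error kinds share `"type": "error"`, this sketch classifies them by when they arrive, following the handshake versus synthesis-request distinction above, rather than by `error_name`; it also surfaces `error_params.explanation` when present.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamError:
    """Fields copied from an error message; `recoverable` is our own classification."""

    error_name: str
    message: str
    status_code: Optional[int]
    explanation: Optional[str]
    recoverable: bool


def classify_error(text: str, during_handshake: bool) -> Optional[StreamError]:
    """Return a StreamError for an error message, or None for any other message."""
    msg = json.loads(text)
    if msg.get("type") != "error":
        return None
    params = msg.get("error_params") or {}
    return StreamError(
        error_name=msg.get("error_name", "Unknown"),
        message=msg.get("message", ""),
        status_code=msg.get("status_code"),
        explanation=params.get("explanation"),
        # Assumption: handshake errors are unrecoverable; later errors are per-request.
        recoverable=not during_handshake,
    )
```

On an unrecoverable error, a client would close the socket and fix the cause before reconnecting (a 401 status, for instance, means the request was unauthorized). On a recoverable error, it can correct the offending request and resend it on the same connection.
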