Version: 2.0.0

Streaming (Websocket)

Connect to the WebSocket URL.
Send a TTS request with JSON parameters.
Receive audio data in JSON or binary format.
Receive a final termination message.

Synthesis Request

WebSocket URL

wss://websocket.cluster.resemble.ai/stream

Request Params

Send a JSON message to request speech synthesis. The message format is as follows:

{
  "voice_uuid": "<voice_uuid>",
  "project_uuid": "<project_uuid>",
  "data": "<text | ssml>",
  "binary_response": false,  // Optional, defaults to false for JSON response
  "request_id": 0,           // Optional, auto-incremented if not specified
  // Additional optional parameters as needed
}

Attribute	Type	Required	Description
voice_uuid	string	Yes	The voice to synthesize the text in.
project_uuid	string	Yes	The project to save the data to.
data	string	Yes	The text or SSML to synthesize. Maximum length of `3000` characters (not including SSML).
request_id	int	No	Numerical identifier for the request, returned with each response. If omitted, it is auto-incremented, starting at `0`.
binary_response	bool	No	Defaults to `false`. If `true`, returns audio data in binary format (MP3 or WAV), suitable for direct playback. If `false`, returns audio data in JSON frames with base64 encoding.
output_format	string	No	The output format of the produced audio. Either `"wav"` or `"mp3"`.
sample_rate	int	No	The sample rate of the produced audio. One of `8000`, `16000`, `22050`, `32000`, or `44100`.
precision	string	No	The bit depth of the generated audio. One of `PCM_32`, `PCM_24`, `PCM_16`, or `MULAW`. Defaults to `PCM_32`.
no_audio_header	bool	No	Defaults to `false`. If `true`, the WAV audio header is omitted from the binary response. If `false`, the header is included.
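As an illustration of the table above, the following Python sketch assembles a request message and checks the enumerated options before sending. The helper name `build_synthesis_request` and the constant names are hypothetical and not part of the API; only the field names and allowed values come from the table.

```python
import json

# Allowed values copied from the parameter table above.
ALLOWED_FORMATS = {"wav", "mp3"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}
ALLOWED_PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}


def build_synthesis_request(voice_uuid, project_uuid, data, **options):
    """Return the JSON text of a synthesis request (hypothetical helper)."""
    checks = {
        "output_format": ALLOWED_FORMATS,
        "sample_rate": ALLOWED_SAMPLE_RATES,
        "precision": ALLOWED_PRECISIONS,
    }
    # Reject enumerated options whose value is not listed in the table.
    for name, allowed in checks.items():
        if name in options and options[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")

    # Required fields first, then any optional fields passed by the caller.
    payload = {
        "voice_uuid": voice_uuid,
        "project_uuid": project_uuid,
        "data": data,
    }
    payload.update(options)
    return json.dumps(payload)
```

A call such as `build_synthesis_request("<voice_uuid>", "<project_uuid>", "Hello", binary_response=True, output_format="mp3")` yields a string that can be sent as a text frame once the connection is open.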

Audio Output

JSON Response Format

When binary_response is set to false, the server sends multiple audio chunks as JSON objects:

{
    "type": "audio",
    "audio_content": "<base64_encoded_audio>",
    "audio_timestamps": {
        "graph_chars": ["H", "e"] OR null,
        "graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]] OR null,
        "phon_chars": ["h", "ˈe"] OR null,
        "phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]] OR null
    },
    "sample_rate": 32000,
    "request_id": 0
}
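A minimal Python sketch of reading one such frame: it decodes the base64 audio and pairs each entry of `graph_chars` with its entry in `graph_times`. The function name `decode_audio_frame` is hypothetical, and the sketch does not interpret what the two numbers in each time pair mean, since the format above does not say.

```python
import base64
import json


def decode_audio_frame(raw_message):
    """Decode one JSON audio frame into (audio_bytes, char_times)."""
    frame = json.loads(raw_message)
    if frame.get("type") != "audio":
        raise ValueError(f"expected an audio frame, got {frame.get('type')!r}")

    # audio_content carries the chunk as base64 text.
    audio = base64.b64decode(frame["audio_content"])

    # Each timestamp field may be null, so fall back to empty lists.
    timestamps = frame.get("audio_timestamps") or {}
    chars = timestamps.get("graph_chars") or []
    times = timestamps.get("graph_times") or []
    char_times = [(char, tuple(pair)) for char, pair in zip(chars, times)]
    return audio, char_times
```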

Binary Response Format

When binary_response is set to true, audio chunks are sent as contiguous bytes of a WAV or MP3 file:

// Binary data stream

Termination Message

Indicates the completion of the audio stream:

{
  "type": "audio_end",
  "request_id": 0
}
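The receive loop can be sketched in Python as follows. It assumes the WebSocket client delivers binary frames as `bytes` and text frames as `str`, and that the termination message is also sent when `binary_response` is `true`; neither is confirmed above. `collect_audio` is a hypothetical helper that works on any iterable of received messages, so it is independent of the client library. Whether concatenated JSON-mode chunks form a single playable file is likewise an assumption.

```python
import base64
import json


def collect_audio(messages, request_id=0):
    """Concatenate audio chunks for one request until its audio_end message."""
    audio = bytearray()
    for message in messages:
        # Binary mode: raw chunks of the WAV or MP3 file.
        if isinstance(message, (bytes, bytearray)):
            audio.extend(message)
            continue

        # Text frames are JSON; skip those belonging to other requests.
        frame = json.loads(message)
        if frame.get("request_id") != request_id:
            continue
        if frame.get("type") == "audio":
            audio.extend(base64.b64decode(frame["audio_content"]))
        elif frame.get("type") == "audio_end":
            return bytes(audio)

    raise ConnectionError("stream ended before audio_end was received")
```

Keeping the loop separate from the socket makes it easy to exercise with recorded messages before wiring it to a live connection.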

Error Handling

Note: The WebSocket API is only available to Business plan users. If you're running into trouble, upgrade to a Business plan or higher on the billing page.

Unrecoverable Errors

Errors occurring during the connection handshake, leading to connection failure:

{
  "type": "error",
  "success": false,
  "error_name": "ConnectionFailure",
  "message": "Failed to establish a connection.",
  "status_code": 401  // Example status code
}

Recoverable Errors

Errors related to synthesis requests that do not interrupt the ongoing connection:

{
  "type": "error",
  "success": false,
  "error_name": "BadJSON",
  "error_params": {"explanation": "Provide your query to synthesize as text or SSML in the 'data' field"},
  "message": "Invalid JSON: Provide your query to synthesize as text or SSML in the 'data' field",
  "status_code": 400
}
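One way to surface these messages in client code is to turn them into exceptions, sketched below in Python. `SynthesisError` and `raise_if_error` are hypothetical names. The payload itself does not mark an error as recoverable, so the caller decides: after a recoverable error such as `BadJSON`, the connection stays open and a corrected request can be sent on it.

```python
import json


class SynthesisError(Exception):
    """Raised for a message with "type": "error" (hypothetical class)."""

    def __init__(self, frame):
        self.name = frame.get("error_name")
        self.status_code = frame.get("status_code")
        self.params = frame.get("error_params", {})
        super().__init__(f"{self.name} ({self.status_code}): {frame.get('message')}")


def raise_if_error(raw_message):
    """Parse a text frame; raise SynthesisError if it is an error message."""
    frame = json.loads(raw_message)
    if frame.get("type") == "error":
        raise SynthesisError(frame)
    return frame
```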
