Version: 2.0.0

Receiving Audio Data

Note

The WebSocket API is only available to users on the Business plan or higher. If you cannot connect, upgrade to a Business plan or higher on the billing page.

  1. Connect to the WebSocket URL.
  2. Send a TTS request with JSON params.
  3. Receive audio data in JSON or binary format.
  4. Receive a final termination message.
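
The four steps above can be sketched as a single transport-agnostic function. This is a minimal illustration, not an official client: `synthesize`, `send`, and `recv` are hypothetical names, and the sketch assumes that JSON audio chunks can be concatenated in arrival order and that the termination message arrives as a JSON text frame even when binary_response is true.

```python
import base64
import json
from typing import Callable, Union

Frame = Union[str, bytes]


def synthesize(send: Callable[[str], None],
               recv: Callable[[], Frame],
               request: dict) -> bytes:
    """Run one request/response cycle and return the concatenated audio bytes.

    `send` and `recv` are hypothetical callables supplied by the caller,
    for example thin wrappers around an already connected WebSocket (step 1).
    """
    send(json.dumps(request))              # step 2: send the TTS request
    audio = bytearray()
    while True:
        frame = recv()
        if isinstance(frame, bytes):       # step 3, binary_response = true
            audio.extend(frame)
            continue
        message = json.loads(frame)
        if message.get("type") == "audio": # step 3, JSON frame with base64 audio
            audio.extend(base64.b64decode(message["audio_content"]))
        elif message.get("type") == "audio_end":
            return bytes(audio)            # step 4: termination message
        # Any other message type is ignored in this sketch.
```

With a WebSocket client library, `send` and `recv` would typically be the connection's own send and receive methods. Keeping the transport out of the function makes the flow easy to exercise with canned frames.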

Synthesis Request

WebSocket URL

wss://websocket.cluster.resemble.ai/stream

Request Params

Send a JSON message to request speech synthesis. The message format is as follows:

{
  "voice_uuid": "<voice_uuid>",
  "project_uuid": "<project_uuid>",
  "data": "<text | ssml>",
  "binary_response": false, // Optional, defaults to false for JSON response
  "request_id": 0, // Optional, auto-incremented if not specified
  // Additional optional parameters as needed
}
| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| voice_uuid | string | Yes | The voice to synthesize the text in. |
| project_uuid | string | Yes | The project to save the data to. |
| data | string | Yes | The text or SSML to synthesize. |
| request_id | int | No | Numeric identifier for the request, returned with each response. If omitted, it is auto-incremented with increasing integers starting at 0. |
| binary_response | bool | No | Defaults to false. If true, returns audio data in binary format (MP3 or WAV), suitable for direct playback. If false, returns audio data in JSON frames with base64 encoding. |
| output_format | string | No | The output format of the produced audio. Either "wav" or "mp3". |
| sample_rate | int | No | The sample rate of the produced audio. One of 8000, 16000, 22050, 32000, or 44100. |
| precision | string | No | The bit depth of the generated audio. One of PCM_32, PCM_24, PCM_16, or MULAW. Default is PCM_32. |
| no_audio_header | bool | No | Defaults to false. If true, the audio header is omitted from the binary WAV response. If false, the audio header is included. |
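
As an illustration of the parameters above, a request message could be assembled and checked on the client before it is sent. The helper name `build_request` is hypothetical, and the client-side validation is an assumption of this sketch; the allowed values are taken from the parameter list above.

```python
import json

# Allowed values as listed in the request parameters.
ALLOWED_VALUES = {
    "sample_rate": {8000, 16000, 22050, 32000, 44100},
    "output_format": {"wav", "mp3"},
    "precision": {"PCM_32", "PCM_24", "PCM_16", "MULAW"},
}


def build_request(voice_uuid: str, project_uuid: str, data: str, **options) -> str:
    """Return the JSON text for a synthesis request.

    Raises ValueError when an option with a documented set of values
    is given a value outside that set. Other options pass through unchecked.
    """
    for name, allowed in ALLOWED_VALUES.items():
        if name in options and options[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    payload = {"voice_uuid": voice_uuid, "project_uuid": project_uuid, "data": data}
    payload.update(options)
    return json.dumps(payload)
```

For example, `build_request("<voice_uuid>", "<project_uuid>", "Hello", binary_response=True, output_format="mp3")` produces a message that asks for binary MP3 frames.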

Audio Output

JSON Response Format

When binary_response is set to false, the server sends multiple audio chunks as JSON objects:

{
  "type": "audio",
  "audio_content": "<base64_encoded_audio>",
  "audio_timestamps": {
    "graph_chars": ["H", "e"], // or null
    "graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]], // or null
    "phon_chars": ["h", "ˈe"], // or null
    "phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]] // or null
  },
  "sample_rate": 32000,
  "request_id": 0
}
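
A JSON audio frame like the one above could be decoded roughly as follows. This is a sketch under assumptions: `parse_audio_frame` is a hypothetical helper, and each `graph_times` entry is read as a [start, end] pair that lines up with the character at the same index in `graph_chars`, which the example suggests but the format description does not state.

```python
import base64
import json


def parse_audio_frame(text: str):
    """Decode one JSON audio frame.

    Returns (audio_bytes, aligned), where aligned is a list of
    (char, start, end) tuples, or an empty list when timestamps are null.
    """
    message = json.loads(text)
    if message.get("type") != "audio":
        raise ValueError("not an audio frame")
    audio = base64.b64decode(message["audio_content"])
    stamps = message.get("audio_timestamps") or {}
    chars = stamps.get("graph_chars") or []   # null becomes an empty list
    times = stamps.get("graph_times") or []
    # Assumption: times[i] is the [start, end] pair for chars[i].
    aligned = [(char, start, end) for char, (start, end) in zip(chars, times)]
    return audio, aligned
```

The same pairing would apply to `phon_chars` and `phon_times` if phoneme-level timing is needed.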

Binary Response Format

When binary_response is set to true, audio chunks are sent as contiguous bytes of a WAV or MP3 file:

// Binary data stream
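
Because the binary chunks are contiguous bytes of one file, a client can join them in arrival order. The sketch below also guesses the container from the leading bytes, using the standard RIFF/WAVE signature and the usual MP3 markers (an ID3 tag or a frame sync). `join_binary_chunks` is a hypothetical helper, and the sniffing step is an illustrative convenience rather than part of the API.

```python
from typing import Iterable, Tuple


def join_binary_chunks(chunks: Iterable[bytes]) -> Tuple[bytes, str]:
    """Concatenate binary frames and guess the container from its first bytes."""
    audio = b"".join(chunks)
    if audio[:4] == b"RIFF" and audio[8:12] == b"WAVE":
        kind = "wav"
    elif audio[:3] == b"ID3" or (
        len(audio) > 1 and audio[0] == 0xFF and (audio[1] & 0xE0) == 0xE0
    ):
        kind = "mp3"   # ID3 tag or MPEG frame sync (11 set bits)
    else:
        # For example, header-less samples when no_audio_header is true.
        kind = "unknown"
    return audio, kind
```

When no_audio_header is true the result is reported as "unknown" unless the raw samples happen to resemble one of these signatures, so in that case the caller should rely on the requested output_format and precision instead.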

Termination Message

The server sends the following message to indicate the completion of the audio stream:

{
  "type": "audio_end",
  "request_id": 0
}
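
Since the termination message carries the request_id, a client could use it to decide which request has finished. The sketch below assumes that several requests may be outstanding on one connection, which is not stated here; `PendingRequests` is a hypothetical helper.

```python
import json


class PendingRequests:
    """Track request IDs that have been sent but not yet terminated."""

    def __init__(self) -> None:
        self._open = set()

    def sent(self, request_id: int) -> None:
        """Record that a request with this ID was sent."""
        self._open.add(request_id)

    def handle(self, text: str) -> bool:
        """Return True if this JSON message ends one of the tracked requests."""
        message = json.loads(text)
        if message.get("type") != "audio_end":
            return False
        request_id = message.get("request_id")
        if request_id in self._open:
            self._open.discard(request_id)
            return True
        return False

    @property
    def all_done(self) -> bool:
        """True when every tracked request has received its termination message."""
        return not self._open
```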