Receiving Audio Data
Note: The WebSocket API is only available to users on a Business plan or higher. If you're running into trouble connecting, upgrade your plan on the billing page.
- Connect to the WebSocket URL
- Send a TTS request with JSON params
- Receive audio data in JSON or binary format
- Receive a final termination message
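The four steps above can be sketched as a small client. This is a minimal illustration, not a reference implementation: it assumes the third-party `websockets` package is installed, the names `classify_message` and `synthesize` are hypothetical, and authentication is omitted because this section does not describe it.

```python
import json

WS_URL = "wss://websocket.cluster.resemble.ai/stream"


def classify_message(message):
    """Label one received frame as 'binary', 'audio', 'audio_end', or 'other'."""
    if isinstance(message, (bytes, bytearray)):
        return "binary"  # raw audio bytes (binary_response = true)
    payload = json.loads(message)
    if not isinstance(payload, dict):
        return "other"
    kind = payload.get("type")
    return kind if kind in ("audio", "audio_end") else "other"


async def synthesize(voice_uuid, project_uuid, text):
    """Connect, send one request, and gather frames until the termination message."""
    import websockets  # assumption: third-party package, imported lazily

    request = {"voice_uuid": voice_uuid, "project_uuid": project_uuid, "data": text}
    frames = []
    async with websockets.connect(WS_URL) as ws:      # step 1: connect
        await ws.send(json.dumps(request))            # step 2: send TTS request
        async for message in ws:                      # step 3: receive audio
            if classify_message(message) == "audio_end":
                break                                 # step 4: termination message
            frames.append(message)
    return frames
```

A caller would run it with `asyncio.run(synthesize("<voice_uuid>", "<project_uuid>", "Hello"))`.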
Synthesis Request
WebSocket URL
wss://websocket.cluster.resemble.ai/stream
Request Params
Send a JSON message to request speech synthesis. The message format is as follows:
{
"voice_uuid": "<voice_uuid>",
"project_uuid": "<project_uuid>",
"data": "<text | ssml>",
"binary_response": false, // Optional, defaults to false for JSON response
"request_id": 0, // Optional, auto-incremented if not specified
// Additional optional parameters as needed
}
Attribute | Type | Required | Description |
---|---|---|---|
voice_uuid | string | Yes | The voice to synthesize the text in. |
project_uuid | string | Yes | The project to save the data to. |
data | string | Yes | The text or SSML to synthesize. |
request_id | int | No | Numerical identifier for the request, returned with each response. If omitted, it is assigned automatically as an increasing integer starting at 0. |
binary_response | bool | No | Defaults to false. If true, returns audio data in binary format (MP3 or WAV), suitable for direct playback. If false, returns audio data in JSON frames with base64 encoding. |
output_format | string | No | The output format of the produced audio. Either "wav" or "mp3". |
sample_rate | int | No | The sample rate of the produced audio. One of 8000, 16000, 22050, 32000, or 44100. |
precision | string | No | The bit depth of the generated audio. One of PCM_32, PCM_24, PCM_16, or MULAW. Default is PCM_32. |
no_audio_header | bool | No | Defaults to false. If true, the audio header is omitted from the binary WAV response. If false, the audio header is included. |
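As an illustration of the table above, the following sketch builds the request message and rejects values outside the documented sets before anything is sent. The helper name `build_request` and the client-side validation are assumptions made for this example; the server's own validation behavior is not described here.

```python
import json

# Allowed values taken from the parameter table above.
ALLOWED_VALUES = {
    "output_format": {"wav", "mp3"},
    "sample_rate": {8000, 16000, 22050, 32000, 44100},
    "precision": {"PCM_32", "PCM_24", "PCM_16", "MULAW"},
}


def build_request(voice_uuid, project_uuid, data, **options):
    """Return the JSON text for one synthesis request (hypothetical helper)."""
    message = {
        "voice_uuid": voice_uuid,
        "project_uuid": project_uuid,
        "data": data,  # plain text or SSML
    }
    for key, value in options.items():
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed, key=str)}")
        message[key] = value
    return json.dumps(message)
```

For example, `build_request("<voice_uuid>", "<project_uuid>", "Hello", binary_response=True, output_format="mp3")` yields a message ready to send over the socket.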
Audio Output
JSON Response Format
When binary_response is set to false, the server sends multiple audio chunks as JSON objects:
{
"type": "audio",
"audio_content": "<base64_encoded_audio>",
"audio_timestamps": {
"graph_chars": ["H", "e"] OR null,
"graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]] OR null,
"phon_chars": ["h", "ˈe"] OR null,
"phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]] OR null
},
"sample_rate": 32000,
"request_id": 0
}
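A JSON frame of this shape could be unpacked roughly as follows. The function name `decode_audio_frame` is hypothetical, and the sketch only pairs each grapheme with its time pair; how those pairs should be interpreted (presumably start and end in seconds) is an assumption, since it is not stated here.

```python
import base64
import json


def decode_audio_frame(raw):
    """Decode one JSON audio frame into (audio_bytes, [(char, (t0, t1)), ...])."""
    frame = json.loads(raw)
    if frame.get("type") != "audio":
        raise ValueError(f"not an audio frame: {frame.get('type')!r}")

    audio = base64.b64decode(frame["audio_content"])

    # Any of the timestamp fields may be null, so fall back to empty lists.
    timestamps = frame.get("audio_timestamps") or {}
    chars = timestamps.get("graph_chars") or []
    times = timestamps.get("graph_times") or []
    aligned = [(char, tuple(pair)) for char, pair in zip(chars, times)]
    return audio, aligned
```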
Binary Response Format
When binary_response is set to true, audio chunks are sent as contiguous bytes of a WAV or MP3 file:
// Binary data stream
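Because the binary chunks are contiguous bytes of a single file, a client can simply append them in arrival order. The sketch below does that and also guesses the container from the leading bytes; both helper names are hypothetical, and the sniffing relies on the standard RIFF/WAVE and MP3 signatures, so it will report "unknown" when no_audio_header is true and the WAV header is absent.

```python
def join_chunks(chunks):
    """Concatenate binary chunks in the order they were received."""
    return b"".join(chunks)


def sniff_container(data):
    """Guess 'wav', 'mp3', or 'unknown' from the first bytes of the stream."""
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:3] == b"ID3":  # MP3 file that starts with an ID3v2 tag
        return "mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"        # MPEG audio frame sync (11 set bits)
    return "unknown"
```

Writing the result of `join_chunks` to disk with a matching extension is then enough to produce a playable file when the header is included.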
Termination Message
Once all audio for a request has been sent, the server sends a termination message indicating that the audio stream is complete:
{
"type": "audio_end",
"request_id": 0
}
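One way to use the termination message is as the stop condition when gathering JSON frames for a given request. The following is a sketch under the assumption that every frame carries the request_id shown above; `collect_until_end` is a hypothetical helper, and it returns the decoded chunks separately rather than assuming how they should be combined.

```python
import base64
import json


def collect_until_end(messages, request_id=0):
    """Gather decoded audio chunks for one request until its audio_end arrives.

    Returns (chunks, finished); finished is False if the stream ran out
    before a termination message for request_id was seen.
    """
    chunks = []
    for raw in messages:
        frame = json.loads(raw)
        if frame.get("request_id") != request_id:
            continue  # frame belongs to a different request
        if frame.get("type") == "audio_end":
            return chunks, True
        if frame.get("type") == "audio":
            chunks.append(base64.b64decode(frame["audio_content"]))
    return chunks, False
```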