Version: 2.0.0

Low Latency Synthesis (Recommended)

Overview

For time-sensitive content delivery, streaming synthesis gives the lowest time-to-first-sound. For documentation on streaming synthesis, see "Stream a clip".

For some applications, streaming is not an option. In these cases, the lowest time-to-first-sound is achieved by sending synchronous requests directly to our synthesis servers. We call this low-latency synthesis (LLS) or direct synthesis. The steps required to use this API are summarized below:

  • Make a request to the low latency synthesis endpoint. (Note: this is not the usual https://app.resemble.ai/api/... URL.)
  • Decode the base64 “audio_content” attribute sent back in the response.
  • Use the audio data.
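
The three steps above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names `synthesize`, `decode_audio`, and `save_audio` are hypothetical, and `YOUR_SYNTH_ENDPOINT` and `YOUR_API_TOKEN` are the same placeholders used in the HTTP request on this page.

```python
import base64
import json
import urllib.request

# Placeholders copied from this page; substitute your own values.
SYNTH_ENDPOINT = "YOUR_SYNTH_ENDPOINT"
API_TOKEN = "YOUR_API_TOKEN"


def synthesize(payload: dict, endpoint: str = SYNTH_ENDPOINT, token: str = API_TOKEN) -> dict:
    """Step 1: POST the request body as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # This sketch does not request compression, so the body is read as plain JSON.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def decode_audio(response_body: dict) -> bytes:
    """Step 2: decode the base64 "audio_content" attribute into raw audio bytes."""
    return base64.b64decode(response_body["audio_content"])


def save_audio(audio: bytes, path: str) -> None:
    """Step 3: use the audio data; here it is simply written to disk."""
    with open(path, "wb") as handle:
        handle.write(audio)
```

A typical call chain would be `save_audio(decode_audio(synthesize(payload)), "clip.wav")`, where `payload` holds the request body attributes described under Request Body.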

HTTP Request

curl --request POST "YOUR_SYNTH_ENDPOINT" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept-Encoding: gzip, deflate, br" \
  --data '{
    "voice_uuid": <Voice to synthesize in>,
    "project_uuid": <Project to save to>,
    "title": <Title of the clip>,
    "data": <Text to synthesize>,
    "precision": "MULAW|PCM_16|PCM_24|PCM_32 (default)",
    "output_format": "mp3|wav (default)"
  }'

Request Headers

| Header | Value | Description |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_TOKEN | The API token can be obtained by logging into the Resemble web application and navigating to the API section. |
| Accept-Encoding | gzip, deflate, br | Any of gzip, deflate, or br, depending on the decompression algorithms your application supports. Omitting the Accept-Encoding header disables compression. |
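
If your HTTP client does not decompress responses automatically, the body has to be decompressed before parsing. The sketch below assumes the server names the algorithm it used in the standard `Content-Encoding` response header, which this page does not state explicitly. The helper name `decompress_body` is hypothetical. Only gzip and deflate are handled because Python's standard library has no Brotli decoder; a client restricted to the standard library would send `Accept-Encoding: gzip, deflate` and leave out `br`.

```python
import gzip
import zlib
from typing import Optional


def decompress_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Return the decompressed response body for a given Content-Encoding value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("", "identity"):
        # No compression was applied.
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" is normally zlib-wrapped; some servers send a raw stream.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # "br" (Brotli) needs a third-party decoder and is not covered by this sketch.
    raise ValueError(f"Unsupported Content-Encoding: {encoding}")
```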

Request Body

| Attribute | Type | Description |
| --- | --- | --- |
| voice_uuid | string | The voice to synthesize the text in. |
| project_uuid | string | The project to save the data to. |
| title | string | The title of the clip. Optional; by default the clip is named Low Latency Synthesis {some-uuid}. |
| data | string | The text or SSML to synthesize. |
| precision | string | The bit depth of the generated wav file (if using wav as the response type). Either MULAW, PCM_16, PCM_24, or PCM_32 (default). |
| output_format | string | The output format of the produced audio. Either wav (default) or mp3. |
| sample_rate | integer | The sample rate of the produced audio. Either 8000, 16000, 22050, 32000, or 44100. |
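
Checking the enumerated values on the client side avoids a round trip for requests that cannot succeed. The following sketch builds a request body from the attributes listed above. The helper name `build_request_body` is hypothetical, and the sketch assumes that `title`, `precision`, `output_format`, and `sample_rate` may be omitted so that the server applies its defaults; this page states that only for `title` and implies it for the attributes that list a default.

```python
from typing import Optional

# Allowed values as listed in the request body attributes on this page.
ALLOWED_PRECISIONS = {"MULAW", "PCM_16", "PCM_24", "PCM_32"}
ALLOWED_OUTPUT_FORMATS = {"wav", "mp3"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}


def build_request_body(
    voice_uuid: str,
    project_uuid: str,
    data: str,
    title: Optional[str] = None,
    precision: Optional[str] = None,
    output_format: Optional[str] = None,
    sample_rate: Optional[int] = None,
) -> dict:
    """Validate the enumerated attributes and return a JSON-serializable body."""
    if precision is not None and precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"precision must be one of {sorted(ALLOWED_PRECISIONS)}")
    if output_format is not None and output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_OUTPUT_FORMATS)}")
    if sample_rate is not None and sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {sorted(ALLOWED_SAMPLE_RATES)}")

    body = {"voice_uuid": voice_uuid, "project_uuid": project_uuid, "data": data}
    optional = {
        "title": title,
        "precision": precision,
        "output_format": output_format,
        "sample_rate": sample_rate,
    }
    # Assumption: attributes left as None are omitted so server defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```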

HTTP Response

{
  "audio_content": <base64 encoded string of the raw audio bytes>,
  "audio_timestamps": {
    "graph_chars": string[],
    "graph_times": float[][],
    "phon_chars": string[],
    "phon_times": float[][]
  },
  "duration": float,
  "issues": string[],
  "output_format": string,
  "sample_rate": float,
  "success": boolean,
  "synth_duration": float,
  "title": string|null
}

Response Body

| Attribute | Type | Description |
| --- | --- | --- |
| audio_content | string | Base64 encoded string. When decoded, it contains the byte array of the audio. |
| audio_timestamps | object | Object containing grapheme and phoneme timestamp information. See the section below for further information. |
| duration | float | The duration of the produced audio file. Resemble does not bill on this value. |
| issues | string[] | Any issues pertaining to the synthesis response. |
| output_format | string | The output format of the produced audio. Either 'wav' or 'mp3'. |
| sample_rate | integer | The sample rate of the produced audio. Either 8000, 16000, 22050, 32000, or 44100. |
| success | boolean | True if the response was successful, false otherwise. |
| synth_duration | float | The duration of the raw audio file produced, before any post-processing effects are applied (e.g. the 'prosody' tag, which may increase or decrease the duration of the final audio file). Resemble bills on this value. |
| title | string or null | The title of the clip. If no title is provided in the request body, the value is null. |
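
A caller will usually check `success` before touching `audio_content`, and may want to record `synth_duration` because that is the billed value. The sketch below shows one way to do both. The names `SynthesisError`, `require_success`, and `durations` are hypothetical, and the sample values in the usage note are made up for illustration.

```python
class SynthesisError(RuntimeError):
    """Raised when the response reports success as false (hypothetical helper)."""


def require_success(response_body: dict) -> dict:
    """Return the response unchanged, or raise with the reported issues."""
    if response_body.get("success") is not True:
        issues = response_body.get("issues") or ["no issues reported"]
        raise SynthesisError("; ".join(str(issue) for issue in issues))
    return response_body


def durations(response_body: dict) -> dict:
    """Separate the final audio duration from the billed raw-synthesis duration."""
    return {
        "audio_duration": response_body["duration"],         # not billed
        "billed_duration": response_body["synth_duration"],  # billed
    }
```

For example, `durations(require_success({"success": True, "duration": 1.5, "synth_duration": 1.2, "issues": []}))` returns both values, while a response with `"success": false` raises `SynthesisError` carrying the contents of `issues`.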

Audio Timestamps Object

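| Attribute | Type | Description |
| --- | --- | --- |
| graph_chars | string[] | An array of the characters (graphemes) in the synthesized text, mapping 1 to 1 with the graph_times array. |
| graph_times | float[][] | An array of timing entries mapping 1 to 1 with graph_chars. Each index holds the timing, including the end time, in the audio of the character at the same index in the graph_chars array. |
| phon_chars | string[] | An array of the phonemes pertaining to the synthesized audio, mapping 1 to 1 with the phon_times array. |
| phon_times | float[][] | An array of timing entries mapping 1 to 1 with phon_chars. Each index holds the timing, including the end time, in the audio of the phoneme at the same index in the phon_chars array. |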
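
Because the character arrays and the time arrays map 1 to 1, they can be paired by index. The sketch below does only that and makes no assumption about what each timing entry contains. The helper names `align` and `align_timestamps` are hypothetical, the key names follow the attribute table above, and the sample values in the usage note are invented.

```python
def align(chars: list, times: list) -> list:
    """Pair each character with the timing entry at the same index."""
    if len(chars) != len(times):
        raise ValueError("character and time arrays must have the same length")
    return list(zip(chars, times))


def align_timestamps(audio_timestamps: dict) -> dict:
    """Pair graphemes and phonemes with their timing entries."""
    return {
        "graphemes": align(audio_timestamps["graph_chars"], audio_timestamps["graph_times"]),
        "phonemes": align(audio_timestamps["phon_chars"], audio_timestamps["phon_times"]),
    }
```

Given `{"graph_chars": ["H", "i"], "graph_times": [[0.0, 0.1], [0.1, 0.2]], "phon_chars": ["h", "a"], "phon_times": [[0.0, 0.1], [0.1, 0.2]]}`, the `graphemes` entry of the result is `[("H", [0.0, 0.1]), ("i", [0.1, 0.2])]`.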