Version: 2.0.0

SSML Reference

You can use Speech Synthesis Markup Language (SSML) as input to control how Resemble generates speech. Resemble automatically handles normal punctuation, such as pausing after a period or speaking a sentence that ends with a question mark as a question. In some cases, however, you may want additional control over Resemble's synthetic speech, for example to have certain words pronounced in a specific way, to say a word or sentence with excitement, or to spell certain words character by character.

SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech. The specific tags Resemble supports are listed in Supported SSML tags.

Supported SSML tags

These are the SSML elements that Resemble supports. The speak element is required. All other elements are optional.

SSML Element	Required	Summary
speak	Yes	Required root element for the SSML document.
prosody	No	Specifies the pitch, volume, and rate of the enclosed text.
emphasis	No	Applies a pre-defined emphasis to the enclosed text. Emphasis is a pre-set combination of pitch and volume.
say-as	No	Indicates the type of text contained in the element. For example, acronym.
sub	No	Specifies the string of text to pronounce instead of the text contained in the element.
break	No	Inserts a pause in between words.
lang	No	Specifies the language in which to generate the content.
audio	No	Inserts recorded audio files alongside the synthesized speech output.
resemble:convert	No	Applies speech-to-speech given a source audio file.
resemble:fill	No	Modifies an existing recording of speech (audio inpainting).

`<speak>`: Speak tag

The required root element of the SSML document.

Syntax

<speak prompt="string" version="float" xmlns="string" xml:lang="string"></speak>

Attributes

Attribute	Required	Description
version	No	Indicates the version of the SSML specification used to interpret the document markup. Defaults to v1.1.
xml:lang	No	Specifies the language of the root document. The value may contain a lowercase, two-letter language code (for example, en), or the language code and uppercase country/region (for example, en-US). Defaults to en-us.
xmlns	No	Specifies the URI of the document that defines the markup vocabulary. The current URI is http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/synthesis.xsd
temperature	No	Controls the randomness of the generated output. The value ranges from 0.1 to 5. The default value is 0.8.
exaggeration	No	Controls the intensity of emotion. The value ranges from 0.0 to 1.0.
seed	No	A non-negative integer. Initializes the model for deterministic output.
prompt	No	Specifies the prompt for the voice to use. For example, `Say this in an angry tone`.
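The attribute table above can be turned into a small builder that checks the documented ranges before any text is sent for synthesis. This is a minimal sketch in Python: `speak` is a hypothetical helper written for this page, not part of any Resemble SDK, and it only assembles the SSML string.

```python
from xml.sax.saxutils import escape, quoteattr


def speak(text, prompt=None, temperature=None, exaggeration=None, seed=None):
    """Hypothetical helper: wrap plain text in a <speak> root element."""
    attrs = {}
    if prompt is not None:
        attrs["prompt"] = prompt
    if temperature is not None:
        # Documented range for temperature is 0.1 to 5.
        if not 0.1 <= temperature <= 5:
            raise ValueError("temperature must be between 0.1 and 5")
        attrs["temperature"] = str(temperature)
    if exaggeration is not None:
        # Documented range for exaggeration is 0.0 to 1.0.
        if not 0.0 <= exaggeration <= 1.0:
            raise ValueError("exaggeration must be between 0.0 and 1.0")
        attrs["exaggeration"] = str(exaggeration)
    if seed is not None:
        # Documented as a non-negative integer.
        if not isinstance(seed, int) or seed < 0:
            raise ValueError("seed must be a non-negative integer")
        attrs["seed"] = str(seed)
    # quoteattr escapes the value and adds the surrounding quotes.
    rendered = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<speak{rendered}>{escape(text)}</speak>"


ssml = speak("Hello there.", prompt="Say this in an angry tone", temperature=0.8, seed=42)
print(ssml)
```

Validating on the client side like this is a design choice, not a requirement: it surfaces out-of-range values before a request is made.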

`<prosody>` tag

An optional tag used to style the way synthesized speech sounds by specifying the pitch, rate, or volume.

Example

The sample has been generated using this input:

<speak>This part is normal. <prosody pitch="x-high">This part is going to sound high pitched</prosody>. <prosody rate="150%">This part is going to be spoken fast</prosody>. <prosody volume="loud">And this part is loud</prosody>!</speak>

Syntax

<prosody pitch="string" rate="string" volume="string"></prosody>

Attributes

Attribute	Required	Description
pitch	No	The baseline pitch of the synthesized speech. This must be one of the following values: x-low, low, medium, high, x-high.
rate	No	The baseline speed of the synthesized speech. The rate must be a percent value. For example, 100% is normal speed, 50% is half of normal speed, and 200% is double normal speed.
volume	No	Indicates the volume level of the synthesized speech. This must be one of the following values: silent, x-soft, soft, medium, loud, x-loud.
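Because `pitch` and `volume` accept only a fixed set of keywords and `rate` must be a percentage, a small helper can catch typos early. The sketch below is illustrative only: `prosody` is a hypothetical function, the keyword sets are copied from the table above (assumed to be lowercase), and `rate` is passed as a number of percent.

```python
from xml.sax.saxutils import escape

# Allowed values, copied from the attribute table (assumed lowercase).
PITCHES = ("x-low", "low", "medium", "high", "x-high")
VOLUMES = ("silent", "x-soft", "soft", "medium", "loud", "x-loud")


def prosody(text, pitch=None, rate=None, volume=None):
    """Hypothetical helper: wrap text in a <prosody> tag after validating values."""
    attrs = ""
    if pitch is not None:
        if pitch not in PITCHES:
            raise ValueError(f"pitch must be one of {PITCHES}")
        attrs += f' pitch="{pitch}"'
    if rate is not None:
        # rate is a positive percentage of normal speed, e.g. 150 for 150%.
        if rate <= 0:
            raise ValueError("rate must be a positive percentage")
        attrs += f' rate="{rate}%"'
    if volume is not None:
        if volume not in VOLUMES:
            raise ValueError(f"volume must be one of {VOLUMES}")
        attrs += f' volume="{volume}"'
    return f"<prosody{attrs}>{escape(text)}</prosody>"


fast = prosody("This part is going to be spoken fast", rate=150)
print(fast)
```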

`<emphasis>` tag

An optional tag that specifies the emphasis of the synthesized speech. Emphasis makes it easier to apply a pre-defined range of volume and pitch to the synthesized speech.

Example

The sample has been generated using this input:

<speak><emphasis level="reduced">I am more of a shy person really</emphasis>.</speak>

Syntax

<emphasis level="string"></emphasis>

Attributes

Attribute	Required	Description
level	No	Specifies the emphasis to apply to the text within the emphasis tag. The possible levels are: reduced, strong.

`<say-as>` tag

An optional element that indicates the content type. This provides guidance to the speech synthesis AI about how to pronounce the text.

Example

The following sample has been generated using this input:

<speak>This <say-as interpret-as="characters">SSML</say-as> stuff is really cool!</speak>

Syntax

<say-as interpret-as="string"></say-as>

Attributes

Attribute	Required	Description
interpret-as	Yes	Indicates the content type of the element's text. The only type currently supported is characters, which spells out each character of the contained text.
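One way to use the `characters` content type is to wrap acronyms automatically before synthesis. The following sketch assumes an acronym is any run of two or more capital letters, which is a simplification chosen for illustration; `spell_acronyms` is a hypothetical helper, not a Resemble API.

```python
import re
from xml.sax.saxutils import escape

# Assumption: an acronym is a standalone run of two or more capital letters.
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def spell_acronyms(text):
    """Hypothetical helper: wrap each acronym in <say-as interpret-as="characters">."""
    # Escape first so that &, < and > in the input cannot break the markup.
    # The entities escape() produces are lowercase, so the pattern never matches them.
    escaped = escape(text)
    return ACRONYM.sub(
        lambda m: f'<say-as interpret-as="characters">{m.group(0)}</say-as>',
        escaped,
    )


body = spell_acronyms("This SSML stuff is really cool!")
print(f"<speak>{body}</speak>")
```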

`<sub>` tag

An optional element that specifies a string of text that is pronounced in place of the element's text.

Example

The following sample has been generated using this input:

<speak>Hi <sub alias="Joe">Jim</sub>, we are calling today to inform you of your account activation with Resemble.</speak>

Syntax

<sub alias="string"></sub>

Attributes

Attribute	Required	Description
alias	Yes	Specifies the substitute text to speak.
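When the same substitutions recur across many prompts, they can be kept in a dictionary and applied in one pass. This is a rough sketch under stated assumptions: `apply_aliases` and the whole-word matching rule are hypothetical choices made for this example.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text, aliases):
    """Hypothetical helper: wrap each key of `aliases` found in `text` in a <sub> tag."""
    if not aliases:
        return escape(text)
    # Try longer words first so that overlapping keys prefer the longest match.
    words = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    out, last = [], 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches.
        out.append(escape(text[last:match.start()]))
        word = match.group(1)
        out.append(f"<sub alias={quoteattr(aliases[word])}>{escape(word)}</sub>")
        last = match.end()
    out.append(escape(text[last:]))
    return "".join(out)


greeting = apply_aliases("Hi Jim, we are calling today.", {"Jim": "Joe"})
print(f"<speak>{greeting}</speak>")
```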

`<break>` tag

An optional tag used to insert pauses between words.

Example

The following sample has been generated using this input:

<speak>This is going to be a long <break time="2s"/>pause.</speak>

Syntax

<break time="string" />

Attributes

Attribute	Required	Description
time	Yes	Specifies the absolute duration of a pause in seconds. For example, 1s.
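Since `time` is a duration in seconds with an `s` suffix, pauses can be generated from numbers rather than typed by hand. The sketch below uses a hypothetical `pause` helper and checks only that the resulting document is well-formed XML; it does not contact Resemble.

```python
import xml.etree.ElementTree as ET


def pause(seconds):
    """Hypothetical helper: render a <break> tag for a pause of `seconds` seconds."""
    if seconds < 0:
        raise ValueError("pause duration cannot be negative")
    # The 'g' format drops a trailing ".0", so 2 and 2.0 both render as "2s".
    return f'<break time="{seconds:g}s"/>'


ssml = f"<speak>This is going to be a long {pause(2)}pause.</speak>"
ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if not well-formed
print(ssml)
```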

`<lang>` tag

If the voice supports it, this tag switches the language used for the enclosed text.

Example

The following sample has been generated using this input:

<speak>Su vuelo a <lang xml:lang="en-us">Pearson International Airport</lang> partirá en 30 minutos.</speak>

Syntax

<lang xml:lang="string"></lang>

Attributes

Attribute	Required	Description
xml:lang	Yes	Specifies the language in which the text should be generated. Supported languages vary by voice.
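A mixed-language sentence like the example above can be assembled programmatically. This sketch assumes language codes follow the lowercase `language-region` shape used in the Supported Languages table; `in_language` is a hypothetical helper, and the final lines only confirm that the result parses as XML.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed shape of a code: 2-3 lowercase letters, a hyphen, 2 lowercase letters
# (for example en-us, fil-ph, cmn-cn).
LANG_CODE = re.compile(r"[a-z]{2,3}-[a-z]{2}")


def in_language(text, code):
    """Hypothetical helper: wrap text in a <lang> tag for the given xml:lang code."""
    if not LANG_CODE.fullmatch(code):
        raise ValueError(f"unexpected language code: {code!r}")
    return f'<lang xml:lang="{code}">{escape(text)}</lang>'


ssml = (
    "<speak>Su vuelo a "
    + in_language("Pearson International Airport", "en-us")
    + " partirá en 30 minutos.</speak>"
)
root = ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(root.find("lang").text)
```

The regular expression checks only the shape of the code, not whether a particular voice supports that language.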

Supported Languages


Language	xml:lang code
Afrikaans - South Africa	af-za
Amharic - Ethiopia	am-et
Arabic - United Arab Emirates	ar-ae
Arabic - Egypt	ar-eg
Arabic - Iraq	ar-iq
Arabic - Kuwait	ar-kw
Arabic - Morocco	ar-ma
Arabic - Qatar	ar-qa
Arabic - Saudi Arabia	ar-sa
Azerbaijani - Azerbaijan	az-az
Bulgarian - Bulgaria	bg-bg
Bengali - Bangladesh	bn-bd
Bengali - India	bn-in
Bosnian - Bosnia	bs-ba
Catalan - Spain	ca-es
Mandarin - China	cmn-cn
Czech - Czech Republic	cs-cz
Danish - Denmark	da-dk
German - Germany	de-de
Greek - Greece	el-gr
English - Australia	en-au
English - Canada	en-ca
English - United Kingdom	en-gb
English - Hong Kong	en-hk
English - Ireland	en-ie
English - India	en-in
English - Kenya	en-ke
English - New Zealand	en-nz
English - Singapore	en-sg
English - United States	en-us
English - South Africa	en-za
Spanish - Argentina	es-ar
Spanish - Chile	es-cl
Spanish - Colombia	es-co
Spanish - Costa Rica	es-cr
Spanish - Cuba	es-cu
Spanish - Dominican Republic	es-do
Spanish - Ecuador	es-ec
Spanish - Spain	es-es
Spanish - Mexico	es-mx
Spanish - Peru	es-pe
Spanish - Puerto Rico	es-pr
Spanish - Paraguay	es-py
Spanish - United States	es-us
Spanish - Venezuela	es-ve
Estonian - Estonia	et-ee
Basque - Spain	eu-es
Persian - Iran	fa-ir
Finnish - Finland	fi-fi
Filipino - Philippines	fil-ph
French - Belgium	fr-be
French - Canada	fr-ca
French - Switzerland	fr-ch
French - France	fr-fr
Irish - Ireland	ga-ie
Gujarati - India	gu-in
Hebrew - Israel	he-il
Hindi - India	hi-in
Croatian - Croatia	hr-hr
Hungarian - Hungary	hu-hu
Armenian - Armenia	hy-am
Indonesian - Indonesia	id-id
Icelandic - Iceland	is-is
Italian - Italy	it-it
Japanese - Japan	ja-jp
Javanese - Indonesia	jv-id
Kazakh - Kazakhstan	kk-kz
Khmer - Cambodia	km-kh
Kannada - India	kn-in
Korean - South Korea	ko-kr
Lithuanian - Lithuania	lt-lt
Latvian - Latvia	lv-lv
Malayalam - India	ml-in
Mongolian - Mongolia	mn-mn
Marathi - India	mr-in
Malay - Malaysia	ms-my
Maltese - Malta	mt-mt
Burmese - Myanmar	my-mm
Norwegian - Norway	nb-no
Nepali - Nepal	ne-np
Dutch - Belgium	nl-be
Dutch - Netherlands	nl-nl
Punjabi - India	pa-in
Polish - Poland	pl-pl
Pashto - Afghanistan	ps-af
Portuguese - Brazil	pt-br
Portuguese - Portugal	pt-pt
Romanian - Romania	ro-ro
Russian - Russia	ru-ru
Sinhala - Sri Lanka	si-lk
Slovak - Slovakia	sk-sk
Slovenian - Slovenia	sl-si
Somali - Somalia	so-so
Albanian - Albania	sq-al
Serbian - Serbia	sr-rs
Swedish - Sweden	sv-se
Swahili - Kenya	sw-ke
Tamil - India	ta-in
Tamil - Sri Lanka	ta-lk
Tamil - Malaysia	ta-my
Telugu - India	te-in
Thai - Thailand	th-th
Turkish - Turkey	tr-tr
Ukrainian - Ukraine	uk-ua
Urdu - Pakistan	ur-pk
Vietnamese - Vietnam	vi-vn
Chinese - China	zh-cn
Chinese - Hong Kong	zh-hk
Chinese - Taiwan	zh-tw
Cantonese - China	yue-cn
Zulu - South Africa	zu-za

`<resemble:convert>` tag

If the voice supports it, this tag performs speech-to-speech conversion.

Speech-to-speech enables you to transform a recording of one speaker into a recording of another speaker. As with other Resemble features, you can use speech-to-speech in your application through SSML and our API.

⚠️ The maximum allowed file size is 50 MB, and the maximum duration is 300 seconds. A file that exceeds either limit is automatically trimmed.

Example

The following sample has been generated using this input:

<speak><resemble:convert src="https://resemble-data.s3.us-east-2.amazonaws.com/source-s2s.wav"/></speak>

Syntax

<resemble:convert src="string" pitch="float"></resemble:convert>

Attributes

Attribute	Required	Description
src	Yes	A direct URL to the source audio file. Only WAV files are supported. The source audio file should be a recording of a single speaker.
pitch	No	Adjusts the pitch of the generated audio using a float value between -10.0 and 10.0. If the pitch value is set to 0 or not provided, the audio is generated with no pitch adjustment.
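The constraints on `src` and `pitch` can be checked before a request is built. In this sketch, `convert` is a hypothetical helper, and the WAV check simply inspects the file extension in the URL path, which is a heuristic rather than a real format check.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import quoteattr


def convert(src, pitch=0.0):
    """Hypothetical helper: build a speech-to-speech SSML document."""
    # Heuristic: only the extension of the URL path is inspected.
    if not urlparse(src).path.lower().endswith(".wav"):
        raise ValueError("src must point to a WAV file")
    # Documented range for pitch is -10.0 to 10.0.
    if not -10.0 <= pitch <= 10.0:
        raise ValueError("pitch must be between -10.0 and 10.0")
    attrs = f" src={quoteattr(src)}"
    # A pitch of 0 means no adjustment, so the attribute is omitted.
    if pitch != 0:
        attrs += f' pitch="{float(pitch)}"'
    return f"<speak><resemble:convert{attrs}/></speak>"


print(convert("https://resemble-data.s3.us-east-2.amazonaws.com/source-s2s.wav", pitch=2.5))
```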

`<resemble:fill>` tag

Resemble Fill enables you to take existing recordings of speech and modify them seamlessly (audio inpainting).

⚠️ Please see Using Resemble Fill Through The API for detailed instructions.

Example

The following sample has been generated using this input:

<resemble:fill recording_uuid="06ba9935">The Finch gladly accepted the invitation and arrived in good time and with a very good appetite.</resemble:fill>

Syntax

<resemble:fill recording_uuid="string"></resemble:fill>

Attributes

Attribute	Required	Description
recording_uuid	Yes	The recording UUID of the recording you want to modify. See Using Resemble Fill Through The API for detailed instructions.

`<audio>` tag

The <audio> tag allows you to insert recorded audio files alongside the synthesized speech output.

Example

The following sample has been generated using this input:

<audio src="angry_cow.wav">
  <desc>An angry cow</desc>
  Moo!!! (The sound failed to load)
</audio>

Syntax

<audio
    src="string"
    soundLevel="string"
    background="boolean">
    Hello there
</audio>

Attributes

Attribute	Required	Description
`src`	Yes	A URI referring to an audio source. The file must be in `wav` format.
`soundLevel`	No	Changes the volume level of the audio, specified as a percentage.
`background`	No	Plays the audio file either in the background of the spoken text or inline. For example, playing music in the background of a spoken text prompt.
