Version: 2.0.0

Model Versions

Background

Resemble’s artificial intelligence and machine learning research team continually develops new techniques that advance the state of the art in voice cloning, audio synthesis, and voice conversion.

To ensure access to the latest and most powerful models, Resemble’s platform provides multiple generations of model versions. This document gives a high-level overview of the model versions available on the platform, their feature scope, and their availability within customer plans.

Models

Text To Speech

| Version Name | Version Code | Description | Dataset Requirements | Streaming Support | Release Date |
| --- | --- | --- | --- | --- | --- |
| Resemble Legacy TTS | tts-legacy | Resemble’s initial TTS model, offering a balance of speed and quality. | 1+ minutes | Yes ✅ | Q2 2021 |
| Resemble Enhanced TTS V1 | tts-v1 | Resemble’s first generation of enhanced text-to-speech, offering state-of-the-art naturalness. | 10+ minutes | No 🚫 | Q2 2023 |
| Resemble Enhanced TTS V2 | tts-v2 | Resemble’s second generation of enhanced text-to-speech, offering state-of-the-art naturalness and lower time-to-first-sound latency. | 30+ minutes | Yes ✅ | Q3 2023 |
| Resemble Enhanced TTS v3 | tts-v3 | Resemble’s latest text-to-speech offering, which balances low latency with state-of-the-art naturalness of cloned voices. | 10+ minutes | Yes ✅ | Q4 2023 |
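
As a rough illustration of how the table above might be used when choosing a version code, the sketch below encodes the dataset and streaming columns as data and filters them. The `TTSModel` class and the `eligible_tts_models` helper are hypothetical names introduced here, not part of any Resemble SDK, and the sketch assumes that "N+ minutes" means at least N minutes of training audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TTSModel:
    code: str                 # version code from the table
    min_dataset_minutes: int  # assumed reading of "N+ minutes"
    streaming: bool           # streaming support column
    release: str              # release quarter


# Values transcribed from the Text To Speech table, oldest to newest.
TTS_MODELS = [
    TTSModel("tts-legacy", 1, True, "Q2 2021"),
    TTSModel("tts-v1", 10, False, "Q2 2023"),
    TTSModel("tts-v2", 30, True, "Q3 2023"),
    TTSModel("tts-v3", 10, True, "Q4 2023"),
]


def eligible_tts_models(dataset_minutes: float, need_streaming: bool) -> List[str]:
    """Return version codes whose requirements are met, newest first."""
    matches = [
        m for m in TTS_MODELS
        if dataset_minutes >= m.min_dataset_minutes
        and (m.streaming or not need_streaming)
    ]
    # TTS_MODELS is ordered oldest to newest, so reverse it for newest first.
    return [m.code for m in reversed(matches)]
```

For example, with 12 minutes of audio and a streaming requirement, only `tts-v3` and `tts-legacy` qualify under these assumptions, because `tts-v2` needs 30+ minutes and `tts-v1` does not stream.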

The table below provides a detailed breakdown of the performance statistics and limitations associated with each model.

| Version Name | Version Code | Latency / TTFS* | Character Limits | Limitations |
| --- | --- | --- | --- | --- |
| Resemble Legacy TTS | tts-legacy | 170 ms | Maximum 3000 characters. | N/A |
| Resemble Enhanced TTS V1 | tts-v1 | 2000-3000 ms | Maximum 280 characters. | SSML Tags Not Supported: <prosody>, <emotion>, <phonemes>, <substitutions>, <emphasis>, <say-as>. Timestamps not supported. Resemble Fill not supported. |
| Resemble Enhanced TTS V2 | tts-v2 | 580 ms | Maximum 1000 characters. | SSML Tags Not Supported: <prosody>, <emotion>, <phonemes>, <substitutions>, <emphasis>, <say-as>. Timestamps not supported. Resemble Fill not supported. |
| Resemble Enhanced TTS v3 | tts-v3 | 350 ms | Maximum 1000 characters. | SSML Tags Not Supported: <prosody>, <emotion>, <phonemes>, <substitutions>, <emphasis>, <say-as>. Timestamps not supported. Resemble Fill not supported. |

* Time-to-first-sound (TTFS). The reported figures are best-case values; end-user latency also depends on factors such as load times, cold boots, and network latency.
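
The character limits and unsupported SSML tags listed above lend themselves to a client-side pre-check before text is submitted. The following is a minimal sketch only: `unsupported_tags` and `split_text` are hypothetical helpers, not Resemble API calls, the tag names are copied verbatim from the table, and the sketch assumes the limit counts every character in the submitted string (the table does not say whether markup is counted).

```python
import re
from typing import List

# Character limits transcribed from the table above.
CHAR_LIMITS = {"tts-legacy": 3000, "tts-v1": 280, "tts-v2": 1000, "tts-v3": 1000}

# Tags the table lists as unsupported on the enhanced models.
UNSUPPORTED_SSML = {
    code: ("prosody", "emotion", "phonemes", "substitutions", "emphasis", "say-as")
    for code in ("tts-v1", "tts-v2", "tts-v3")
}


def unsupported_tags(model: str, text: str) -> List[str]:
    """Return the unsupported tag names that appear as opening tags in text."""
    tags = UNSUPPORTED_SSML.get(model, ())
    return [t for t in tags if re.search(rf"<\s*{re.escape(t)}\b", text)]


def split_text(model: str, text: str) -> List[str]:
    """Greedily pack sentences into chunks no longer than the model's limit."""
    limit = CHAR_LIMITS[model]
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        pieces = [sentence[i:i + limit] for i in range(0, len(sentence), limit)] or [""]
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece.strip()
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries is a design choice made here so that each chunk can be synthesized without cutting a sentence in half; a sentence longer than the limit is cut at the limit as a fallback.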

Speech To Speech

| Version Name | Version Code | Description | Dataset Requirements | Streaming Support | Release Date |
| --- | --- | --- | --- | --- | --- |
| Resemble Legacy STS | sts-legacy | Resemble’s initial speech-to-speech model, which lets users convert speaker audio from one voice to another. | 10+ minutes | Yes ✅ | Q2 2021 |
| Resemble Core STS V1 | sts-v1 | Resemble’s first generation of core speech-to-speech functionality, offering state-of-the-art speaker audio conversion with greater speed and accuracy. | 10+ minutes | Yes ✅ | Q2 2023 |
| Resemble Core STS V2 | sts-v2 | Resemble’s second generation of core speech-to-speech functionality, offering all the benefits of sts-v1 with improved pitch tracking and 48 kHz support. | 10+ minutes | Yes ✅ | Q4 2023 |

Resemble Fill

| Version Name | Version Code | Description | Dataset Requirements | Release Date |
| --- | --- | --- | --- | --- |
| Resemble Fill (Audio Inpainting) | fill-v1 | Resemble’s flagship audio inpainting model, which lets users inpaint audio recordings with novel audio. | 10+ minutes (initial TTS model training) | Q2 2021 |