Version: 2.0.0

Text-To-Speech

Overview

Convert text into natural-sounding speech with Resemble AI's text-to-speech API. The API supports several synthesis methods for different use cases, from returning a complete audio file in one response to real-time streaming.

Synthesis Methods

Three synthesis methods are available. They differ mainly in latency and in how the audio is delivered:

Synchronous Text-to-Speech

Suited to shorter text inputs where you need the complete audio file before doing anything else. The API processes the entire text and returns the full audio in a single response.

Best for:

  • Voice messages and alerts
  • Short audio clips
  • Applications that need complete audio before proceeding
  • Simple integrations
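
As a rough illustration of the synchronous pattern (send the whole text, receive the whole result in one response), here is a minimal Python sketch. The URL, the JSON field names (`voice_uuid`, `data`), and the bearer-token header are assumptions made for this example and are not taken from this page; the synchronous synthesis guide is the authority on the real endpoint and on how the audio is encoded in the response.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only. Replace it with the URL
# given in the synchronous synthesis guide.
SYNTHESIZE_URL = "https://example.invalid/synthesize"


def build_synthesis_request(token: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) one POST request carrying the full text."""
    # The field names below are assumed, not confirmed by this page.
    body = json.dumps({"voice_uuid": voice_id, "data": text}).encode("utf-8")
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def synthesize(token: str, voice_id: str, text: str, timeout: float = 30.0) -> bytes:
    """Send the request and block until the complete response body arrives."""
    request = build_synthesis_request(token, voice_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The raw body is returned as-is; decoding it into audio depends
        # on the response format documented for the real endpoint.
        return response.read()
```

The caller blocks until `synthesize` returns, which is why this method fits short inputs and workflows that cannot proceed without the finished audio.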

Streaming (HTTP)

Stream audio data as it's generated, reducing latency and enabling real-time playback. Audio is generated sequentially and sent in chunks, allowing you to start playing audio before synthesis is complete.

Best for:

  • Longer text content
  • Real-time applications
  • Applications where perceived latency matters
  • Progressive audio playback
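
The chunked delivery described above can be sketched on the client side as a loop that hands each chunk to a player or file as soon as it arrives. This is a generic sketch that works with any file-like byte stream (for example, the response object returned by `urllib.request.urlopen`); the function names and the 4096-byte chunk size are illustrative choices, and the actual streaming endpoint and audio format are described in the HTTP streaming guide.

```python
import io
from typing import BinaryIO, Callable, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield chunks from a file-like stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        yield chunk


def play_progressively(
    stream: BinaryIO,
    on_chunk: Callable[[bytes], None],
    chunk_size: int = 4096,
) -> int:
    """Pass each chunk to on_chunk as it arrives; return total bytes seen."""
    total = 0
    for chunk in iter_audio_chunks(stream, chunk_size):
        on_chunk(chunk)  # e.g. write to an audio device or append to a file
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Stand-in for a streaming HTTP response body.
    fake_response = io.BytesIO(b"\x00" * 10000)
    received = []
    total = play_progressively(fake_response, received.append)
    print(total, len(received))
```

Because `on_chunk` runs before the stream is finished, playback can begin while synthesis is still in progress.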

Streaming (WebSocket)

The lowest-latency option: audio is streamed in real time over a WebSocket connection. Ideal for interactive applications that need immediate audio feedback.

Best for:

  • Interactive voice applications
  • Real-time conversations
  • Live audio generation
  • Applications requiring minimal latency

Note: WebSocket streaming is available to Business Plan users and higher.
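
To make the trade-offs between the three methods concrete, the sketch below encodes them as a small decision helper. The names (`Method`, `choose_method`) and the order in which the rules are checked are assumptions for illustration; the only constraints taken from this page are that synchronous synthesis returns complete audio, that WebSocket streaming has the lowest latency, and that WebSocket streaming requires a Business Plan or higher.

```python
from enum import Enum


class Method(Enum):
    SYNCHRONOUS = "synchronous"
    HTTP_STREAMING = "http_streaming"
    WEBSOCKET_STREAMING = "websocket_streaming"


def choose_method(
    needs_complete_audio: bool,
    interactive: bool,
    business_plan_or_higher: bool,
) -> Method:
    """Pick a synthesis method from coarse application requirements."""
    # Complete audio before proceeding: only the synchronous method fits.
    if needs_complete_audio:
        return Method.SYNCHRONOUS
    # Interactive use benefits from the lowest latency, but WebSocket
    # streaming is limited to Business Plan users and higher.
    if interactive and business_plan_or_higher:
        return Method.WEBSOCKET_STREAMING
    # Otherwise fall back to chunked delivery over HTTP.
    return Method.HTTP_STREAMING
```

For example, an interactive application on a lower plan would get `Method.HTTP_STREAMING` from this helper, since the WebSocket option is not available to it.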

Getting Started

  1. Choose a synthesis method based on the use cases above
  2. Get your API token from the Resemble AI dashboard
  3. Follow the specific guide for your chosen synthesis method
  4. Integrate the API into your application
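
For step 2, one common way to handle the token is to read it from the environment instead of hardcoding it. The sketch below assumes an environment variable named `RESEMBLE_API_TOKEN` and a bearer-token `Authorization` header; both are illustrative assumptions, so confirm the expected header format in the guide for your chosen method.

```python
import os
from typing import Dict, Mapping, Optional


def load_api_token(
    env_var: str = "RESEMBLE_API_TOKEN",  # hypothetical variable name
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Read the API token from the environment, failing loudly if absent."""
    env = os.environ if environ is None else environ
    token = env.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your API token from the dashboard")
    return token


def auth_headers(token: str) -> Dict[str, str]:
    """Build request headers; the Bearer scheme is an assumption."""
    return {"Authorization": f"Bearer {token}"}
```

Keeping the token out of source code makes it easier to rotate and avoids committing it to version control.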

Ready to start? Pick the synthesis method that best fits your needs!