Text-To-Speech
Overview
Transform any text into natural-sounding speech using Resemble AI's advanced text-to-speech technology. Our API supports multiple synthesis methods optimized for different use cases, from immediate audio generation to real-time streaming applications.
Synthesis Methods
The Resemble AI Text-to-Speech API offers three synthesis methods, each suited to different use cases and performance requirements:
Synchronous Text-to-Speech
Perfect for shorter text inputs where you need the complete audio file immediately. The API processes your entire text and returns the full audio in a single response.
Best for:
- Voice messages and alerts
- Short audio clips
- Applications that need complete audio before proceeding
- Simple integrations
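As a minimal sketch of the synchronous flow described above, the Python below builds one request carrying the entire text and decodes one response containing the full audio. The endpoint URL, the `voice_uuid` and `data` request fields, and the base64-encoded `audio_content` response field are assumptions made for illustration and are not confirmed by this page; the synchronous synthesis guide is the place to check the actual contract.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder; substitute the endpoint from the synchronous guide.
SYNTHESIZE_URL = "https://example.invalid/synthesize"


def build_request(token: str, voice_uuid: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the whole text."""
    # Field names here are assumed for illustration only.
    payload = json.dumps({"voice_uuid": voice_uuid, "data": text}).encode("utf-8")
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def decode_audio(response_body: bytes) -> bytes:
    """Extract audio bytes from a JSON body with an assumed base64 field."""
    body = json.loads(response_body)
    return base64.b64decode(body["audio_content"])


def synthesize(token: str, voice_uuid: str, text: str) -> bytes:
    """Send the request and return the complete audio in a single step."""
    request = build_request(token, voice_uuid, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        return decode_audio(response.read())
```

The caller blocks until `synthesize` returns, which matches the "complete audio before proceeding" use case: nothing is playable until the whole response has arrived.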
Streaming (HTTP)
Stream audio data as it's generated, reducing latency and enabling real-time playback. Audio is generated sequentially and sent in chunks, allowing you to start playing audio before synthesis is complete.
Best for:
- Longer text content
- Real-time applications
- Applications where perceived latency matters
- Progressive audio playback
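The chunked delivery described above can be sketched independently of any particular HTTP client. The helper below takes any iterable of byte chunks and forwards each one to an output as soon as it arrives, recording how long the first chunk took. The names `play_progressively` and `fake_stream` are hypothetical, and the fake stream merely stands in for a real streaming response body.

```python
import io
import time
from typing import BinaryIO, Iterable, Optional, Tuple


def play_progressively(
    chunks: Iterable[bytes], sink: BinaryIO
) -> Tuple[int, Optional[float]]:
    """Write each chunk to `sink` as soon as it arrives.

    Returns (total_bytes_written, seconds_until_first_chunk).
    The second value is None if no audio arrived at all.
    """
    started = time.monotonic()
    first_chunk_after = None
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty chunks
        if first_chunk_after is None:
            # Playback could begin here, before synthesis has finished.
            first_chunk_after = time.monotonic() - started
        sink.write(chunk)
        total += len(chunk)
    return total, first_chunk_after


def fake_stream():
    """Stand-in for a response body that yields audio sequentially."""
    for part in (b"chunk-1|", b"chunk-2|", b"chunk-3"):
        yield part


if __name__ == "__main__":
    buffer = io.BytesIO()
    total, first = play_progressively(fake_stream(), buffer)
    print(total, first, buffer.getvalue())
```

With the `requests` library, for example, the iterator returned by `Response.iter_content` on a request made with `stream=True` could be passed as `chunks`. The time to the first chunk, rather than the time to the last, is what determines perceived latency.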
Streaming (WebSocket)
The lowest-latency option: audio is streamed in real time over a WebSocket connection. Ideal for interactive applications that require immediate audio feedback.
Best for:
- Interactive voice applications
- Real-time conversations
- Live audio generation
- Applications requiring minimal latency
Note: WebSocket streaming is available on the Business plan and higher.
Getting Started
- Choose a synthesis method based on the use cases above
- Get your API token from the Resemble AI dashboard
- Follow the specific guide for your chosen synthesis method
- Integrate the API into your application
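The first two steps can be sketched in code. The selection rule below is one possible reading of the "Best for" lists, not an official recommendation: the 500-character threshold is an arbitrary illustrative value, and the `RESEMBLE_API_TOKEN` environment variable name is hypothetical.

```python
import os
from enum import Enum


class Method(Enum):
    SYNC = "synchronous"
    HTTP_STREAM = "streaming-http"
    WEBSOCKET = "streaming-websocket"


def choose_method(
    text_chars: int,
    interactive: bool,
    business_plan_or_higher: bool,
    short_text_limit: int = 500,  # arbitrary illustrative threshold
) -> Method:
    """Pick a synthesis method from rough characteristics of the use case."""
    # WebSocket streaming requires the Business plan or higher.
    if interactive and business_plan_or_higher:
        return Method.WEBSOCKET
    # Interactive use without WebSocket access, or longer text: stream over HTTP.
    if interactive or text_chars > short_text_limit:
        return Method.HTTP_STREAM
    # Short, non-interactive text: wait for the complete audio file.
    return Method.SYNC


def load_token(env_var: str = "RESEMBLE_API_TOKEN") -> str:
    """Read the API token from the environment instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to the token from the dashboard")
    return token
```

For instance, `choose_method(80, interactive=False, business_plan_or_higher=False)` returns `Method.SYNC`, while the same call with `interactive=True` returns `Method.HTTP_STREAM` because WebSocket access is not available on that plan.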
Ready to start? Pick the synthesis method that best fits your needs!