Python

Learn how to build your own custom voice using individual recordings through the Resemble AI platform with Python.

note

In order to best use this guide ensure that you have:

  • Signed up for a Resemble.ai account and confirmed your e-mail
  • Obtained your API key for use with the API - for additional information see the Authentication page
  • Subscribed to at least a PRO plan on the Resemble platform

Python

First, you will need to initialize your Python project and its virtual environment.

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

With your project set up, we can install the Resemble Python SDK via pip install and freeze our dependency list for later use:

# Install Resemble python SDK
pip install resemble

# Freeze dependencies
pip freeze > requirements.txt

Create a new main.py file which will be used to house our voice cloning example.

We will initialize the script by creating a basic run_example function responsible for running our process. Fill in your main.py with the code snippet below.

# resemble-clone-voice-recording/main.py

# This is the main function that contains the example
def run_example(arguments):
    # TODO: Fill me in!
    pass

arguments = {}

if __name__ == '__main__':
    run_example(arguments)

Import the Resemble class from the resemble SDK and the base64, argparse, and os modules from Python’s standard library, then add a setup function called initialize_resemble_client to initialize the API client:

# resemble-clone-voice-recording/main.py

from resemble import Resemble
import os
import base64
import argparse

# This function sets up the Resemble Python SDK
def initialize_resemble_client():
    try:
        # Attempt to retrieve the value of the environment variable
        resemble_api_key = os.environ['RESEMBLE_API_KEY']

        Resemble.api_key(resemble_api_key)
    except KeyError:
        # If the environment variable is not found, raise an error
        raise EnvironmentError("The 'RESEMBLE_API_KEY' environment variable is not set.")

# This is the main function that contains the example
def run_example(arguments):
    # TODO: Fill me in!
    pass

# -- snipped --

The program expects the environment variable RESEMBLE_API_KEY to be set to the API key needed to initialize the Resemble AI Python client.

We want to build our program in a way that lets the user specify their voice name and the path to the folder containing the recordings using command line options. For example:

RESEMBLE_API_KEY=... python main.py --name <name> --recordings path/to/folder

To do this, we can use Python’s argparse library. Just above the script entry point add the following parsing logic:

# resemble-clone-voice-recording/main.py

from resemble import Resemble
import os
import base64
import argparse

# -- snipped --

def run_example(arguments):
    # TODO: Fill me in!
    pass

# Create an argument parser
parser = argparse.ArgumentParser(description="A script that builds a voice by uploading a set of recordings")

parser.add_argument("--name", required=True, help="The name of the voice to create")
parser.add_argument("--recordings", required=True, help="A path to the folder of recordings in the correct format")

# Parse the command-line arguments
args = parser.parse_args()

# Create a dictionary of arguments
arguments = {
    "voice_name": args.name,
    "recordings_folder": args.recordings,
}

if __name__ == '__main__':
    run_example(arguments)

In the above code snippet, using the argparse library we specify the program arguments and provide them as parameters to the run_example function.
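To see what the parser produces without calling the API, the same parsing logic can be wrapped in a function and fed a list of arguments directly. This is only a minimal sketch: build_arguments is a hypothetical helper name that does not appear in the guide's main.py, and the option names are the ones defined above.

```python
import argparse


def build_arguments(argv=None):
    """Parse command-line options into the dictionary run_example expects.

    build_arguments is a hypothetical helper used for illustration only.
    Passing argv=None makes argparse read sys.argv as usual.
    """
    parser = argparse.ArgumentParser(
        description="Builds a voice by uploading a set of recordings"
    )
    parser.add_argument("--name", required=True, help="The name of the voice to create")
    parser.add_argument("--recordings", required=True, help="Path to the folder of recordings")

    args = parser.parse_args(argv)

    return {
        "voice_name": args.name,
        "recordings_folder": args.recordings,
    }


# Simulates: python main.py --name "My Voice" --recordings ./example-data
example = build_arguments(["--name", "My Voice", "--recordings", "./example-data"])
print(example)
```

Keeping the parsing in a function like this also makes it possible to exercise the option handling in a test without setting RESEMBLE_API_KEY.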

Building a Voice

There are two options for creating a Voice using the Resemble API:

  1. Providing a dataset URL; or
  2. Uploading individual recordings via the API

In this guide, we will use option #2: uploading individual recordings via the API. If you’re interested in building a voice using option #1, you can see this guide.

A recording is a user-uploaded piece of audio content associated with a voice. Therefore, in order to upload recordings, we will need to create a voice via the API first. Next, we have to upload individual recordings via the Create Recording API and associate them with the voice. Once the recordings are uploaded, we can initiate the voice building operation by using the Voice Build API. At a high level we have the following tasks to accomplish:

  1. Create a new voice and obtain its UUID
  2. Read the audio files from the recordings folder and upload the files and their transcripts using the Recording API attaching them to the Voice
  3. Trigger the Voice Build operation

Let’s create four skeleton functions to perform these operations:

  • create_voice
  • upload_recordings
  • read_folder
  • trigger_voice_build

# resemble-clone-voice-recording/main.py

from resemble import Resemble
import os
import base64
import argparse

# Create a new voice using the provided name
def create_voice(voice_name):
    pass

# Upload recordings in the recording folder and attach to provided voice uuid
def upload_recordings(voice_uuid: str, recordings_folder: str):
    pass

# Initiate a voice build for the provided UUID
def trigger_voice_build(voice_uuid: str):
    pass

# Read files from the folder path and build a list
def read_folder(folder_path):
    pass

# -- snipped --

if __name__ == '__main__':
    run_example(arguments)

With some of the skeleton functions in place we can fill in our run_example function to understand how these functions will be used.


# -- snipped --

def run_example(arguments):
    # Initialize the client using the environment variable RESEMBLE_API_KEY set
    initialize_resemble_client()

    # Run the voice creation function to call the Resemble API
    uuid = create_voice(voice_name=arguments['voice_name'])

    if uuid is None:
        print("FAILURE: The process was aborted because the voice was not created")
        exit(1)

    upload_recordings(voice_uuid=uuid, recordings_folder=arguments['recordings_folder'])

    trigger_voice_build(voice_uuid=uuid)

# -- snipped --

Creating a Voice

With the scaffolding in place for our functions, let’s fill in the create_voice function with the business logic to call the Voice Creation API. Using the snippet below fill in your create_voice function.

# resemble-clone-voice-recording/main.py

# -- snipped --

def create_voice(voice_name):
    print(f"Submitting request to Resemble to create a voice: {voice_name}")

    # Make request to the API, note that we do not provide a callback_uri so this
    # request will execute synchronously.
    #
    # This will trigger the voice creation process but not the voice building process
    # we need to trigger that through the voice building API
    #
    # https://docs.app.resemble.ai/docs/create_voices/resource_voice/build/
    #

    base64_consent = ''

    # In order to clone a voice, you MUST provide a base64 encoded consent file
    #
    # https://docs.app.resemble.ai/docs/create_voices/resource_voice/create#voice-consent
    #
    # FIXME: You will need to update this function with the path to your consent file

    with open('FIXME: path/to/consent/file', 'rb') as file:
        file_contents = file.read()

        # Encode the file contents as Base64
        base64_consent = base64.b64encode(file_contents).decode('utf-8')

    response = Resemble.v2.voices.create(name=voice_name, consent=base64_consent)

    # Only read the voice details once we know the request succeeded
    if response['success']:
        voice = response['item']
        voice_status = voice['status']
        voice_uuid = voice['uuid']

        print(f"Response was successful! {voice_name} has been created with UUID {voice_uuid}. The voice is currently {voice_status}.")

        return voice_uuid
    else:
        print("Response was unsuccessful!")

        # In case of an error, print the error to STDOUT
        print(response)

        return None

# -- snipped --

In keeping with Resemble's Ethical Statement, all voice clone requests must be authorized via a consent file. You can read more about this in the Voice Consent section. In the above snippet, the code reads the provided consent file, encodes it as base64, and provides it in the request.
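The encoding step can be tried on its own before pointing the script at a real consent file. The sketch below is illustrative only: encode_file_as_base64 is a hypothetical helper name, and a throwaway temporary file with placeholder bytes stands in for an actual consent recording.

```python
import base64
import os
import tempfile


def encode_file_as_base64(path):
    """Return the contents of the file at `path` as a Base64 text string."""
    with open(path, "rb") as file:
        return base64.b64encode(file.read()).decode("utf-8")


# A throwaway file stands in for a real consent recording (assumption for the demo)
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"placeholder audio bytes")
    tmp_path = tmp.name

try:
    encoded = encode_file_as_base64(tmp_path)
    # Decoding gives back exactly the bytes that were written
    decoded = base64.b64decode(encoded)
finally:
    os.remove(tmp_path)

print(encoded)
```

The helper returns a str rather than bytes because the value is passed as a text field in the request.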

In the above snippet, the function makes a request using the Resemble Python SDK to create a new voice. If the operation is successful, it returns the voice_uuid for later use; if it isn’t, we’ll know about it!

Uploading Recordings

With create_voice filled in, we can work on the second step:

Read the audio files from the recordings folder and upload the recordings.

The wisdom of divide and conquer tactics dictates that we ought to break this down into two responsibilities:

  • read_folder responsible for building a list of metadata and file information for upload; and
  • upload_recordings which makes a request to the API to upload individual recordings created by read_folder

Let’s begin by implementing the read_folder function. For the purposes of this example, we will impose a requirement that the recording data we want to upload is contained in a local folder structured in the following format:

$ tree

example-data/
├── wav-1.txt
├── wav-1.wav
├── wav-2.txt
└── wav-2.wav

... additional files

Here each numbered .wav file represents the audio content for upload, and each .txt file is the corresponding audio file’s transcript in plain text. We will then provide the path to this folder as an argument via the --recordings option mentioned above.

NOTE: For the example to function properly, we will need 20 recordings and transcripts.
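Before uploading anything, it can help to confirm that the folder actually contains enough complete pairs. The following is a rough sketch under the folder layout described above; count_recording_pairs and has_enough_recordings are hypothetical helper names, and the threshold of 20 is taken from the note.

```python
import os
import tempfile

REQUIRED_RECORDINGS = 20  # taken from the note above


def count_recording_pairs(folder_path):
    """Count .wav files that have a .txt transcript with the same base name."""
    pairs = 0
    for filename in os.listdir(folder_path):
        base, extension = os.path.splitext(filename)
        transcript_path = os.path.join(folder_path, base + ".txt")
        if extension == ".wav" and os.path.exists(transcript_path):
            pairs += 1
    return pairs


def has_enough_recordings(folder_path, required=REQUIRED_RECORDINGS):
    """Return True when the folder holds at least `required` complete pairs."""
    return count_recording_pairs(folder_path) >= required


# Build a temporary folder with two complete pairs and one orphan .wav
with tempfile.TemporaryDirectory() as folder:
    for name in ("wav-1.wav", "wav-1.txt", "wav-2.wav", "wav-2.txt", "wav-3.wav"):
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("placeholder")

    pair_count = count_recording_pairs(folder)
    enough = has_enough_recordings(folder)

print(pair_count, enough)
```

A check like this could run at the start of the script so that a voice is not created when the recordings folder is incomplete.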

Fill in the read_folder function definition using the snippet below:

# resemble-clone-voice-recording/main.py

# -- snipped --
def read_folder(folder_path):
    data_list = []

    # Iterate through files in the folder
    for filename in os.listdir(folder_path):
        # for each wav file
        if filename.endswith(".wav"):
            # Check if there is a corresponding .txt file
            txt_filename = filename.replace(".wav", ".txt")
            txt_filepath = os.path.join(folder_path, txt_filename)

            # if the pair exists use it
            if os.path.exists(txt_filepath):
                # Read the text content from the .txt file
                with open(txt_filepath, 'r') as txt_file:
                    text_content = txt_file.read()

                # Create a dictionary and append to the list
                file_dict = {
                    'file': os.path.join(folder_path, filename),
                    'text': text_content,
                    'recording_name': txt_filename
                }

                data_list.append(file_dict)
            else:
                print(f"WARN: Unable to find corresponding transcript txt file for {filename} - SKIPPING")

    return data_list

# -- snipped --

The above function iterates through each file in the directory, pairs each audio (i.e., .wav) file with its transcript (i.e., .txt) file, and builds a dictionary for each recording that is appended to the list it returns. To use the Create Recording API we require the following attributes for each recording:

Attribute   Type      Description
file        File      Recording audio file (ensure that the audio file is not silent and has a duration ranging from 1 to 12 seconds)
emotion     string    Emotion of the recording
is_active   boolean   If false, the recording is not used to train the voice model
name        string    Name of the recording
text        string    Transcript of the recording
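The duration requirement on the file attribute can be checked locally before a recording is uploaded. The sketch below uses Python's standard wave module, which only reads uncompressed PCM .wav files; the helper names are hypothetical, and it checks duration only, not whether the audio is silent. The generated test files are themselves silent and exist purely to exercise the check.

```python
import os
import tempfile
import wave

MIN_SECONDS = 1.0   # lower bound from the attribute list above
MAX_SECONDS = 12.0  # upper bound from the attribute list above


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def has_valid_duration(path):
    """Return True when the file lasts between MIN_SECONDS and MAX_SECONDS."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS


def write_silent_wav(path, seconds, framerate=8000):
    """Write a mono 16-bit test file; used only to exercise the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(framerate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * framerate))


with tempfile.TemporaryDirectory() as folder:
    short_path = os.path.join(folder, "short.wav")
    ok_path = os.path.join(folder, "ok.wav")
    write_silent_wav(short_path, 0.5)
    write_silent_wav(ok_path, 2)

    short_ok = has_valid_duration(short_path)
    normal_ok = has_valid_duration(ok_path)
    normal_duration = wav_duration_seconds(ok_path)

print(short_ok, normal_ok, normal_duration)
```

A check like has_valid_duration could be called from read_folder to skip files that fall outside the range instead of letting the upload request fail.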

To simplify our process, we will assume the emotion is neutral and set the is_active flag to true. Using this information let’s implement the upload_recordings function using the snippet below:

# resemble-clone-voice-recording/main.py

# -- snipped --
def upload_recordings(voice_uuid: str, recordings_folder: str):
    print(f"Beginning recording upload process from folder: {recordings_folder}")

    data_list = read_folder(recordings_folder)

    failures = 0
    success = 0

    for recording in data_list:
        # Open the audio file in a with block so it is closed after the request
        with open(recording['file'], 'rb') as audio_file:
            response = Resemble.v2.recordings.create(
                voice_uuid,
                audio_file,
                recording['recording_name'],
                recording['text'],
                is_active=True,
                emotion="neutral"
            )

        if response['success']:
            uuid = response['item']['uuid']

            print(f"Request to create recording {recording['recording_name']} was successful! Recording uuid is {uuid}")
            success += 1
        else:
            print(f"Request to create recording {recording['recording_name']} was NOT successful!")
            print(response)
            failures += 1

    print(f"Recording upload completed, finished uploading {success} successful and {failures} failures")
# -- snipped --

The above snippet calls the read_folder function that we wrote previously and captures the list of recordings parsed from the folder (i.e., data_list); it then iterates through each recording and uses the Resemble SDK to create a recording via the API using the information contained in each item. Since we’re uploading multiple recordings, we keep a success and a failures counter and note when a recording succeeds or fails to upload.

Building the Voice

Now that the program can read and upload recording data, the Voice is ready for the last stage of initiating the build process. Fill in the trigger_voice_build function using the snippet below:

# resemble-clone-voice-recording/main.py

# -- snipped --
def trigger_voice_build(voice_uuid: str):
    response = Resemble.v2.voices.build(uuid=voice_uuid)

    if response['success']:
        print(f"Request to initiate voice build for voice {voice_uuid} was successful!")
        return True
    else:
        print(f"Request to initiate voice build for voice {voice_uuid} was NOT successful! Response was: ")

        print(response)

        return False

# -- snipped --

Thankfully, the final act for this play is quite simple. The above snippet just calls the Resemble SDK with the voice UUID to indicate the voice is ready for building. If the request fails, the function prints the response and returns False.


With all the components in place, go ahead and initiate the program to test:

RESEMBLE_API_KEY=... python3 main.py --name "My Voice" --recordings "./example-data"

Submitting request to Resemble to create a voice: My Voice

Response was successful! My Voice has been created with UUID 7efec80b. The voice is currently initializing.
Beginning recording upload process from folder: ./example-data

Request to create recording wav-1.txt was successful! Recording uuid is ....
Request to create recording wav-2.txt was successful! Recording uuid is ....

... et cetera

Recording upload completed, finished uploading 20 successful and 0 failures
Request to initiate voice build for voice 7efec80b was successful!

Congratulations, you have successfully created a Custom Voice using the Recording and Voice Building APIs. You can track the build progress using the API or the web application.
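If you want the script itself to wait for the build, one option is a small polling loop. The sketch below is deliberately generic and rests on several assumptions: wait_for_status is a hypothetical helper, the function that fetches the current status is passed in rather than tied to a specific SDK call, and every status name other than "initializing" (which appears in the output above) is made up for the demonstration. Check the API reference for the real status values before relying on it.

```python
import time


def wait_for_status(fetch_status, done_statuses, interval_seconds=30,
                    max_attempts=20, sleep=time.sleep):
    """Call fetch_status() until it returns a value in done_statuses.

    Returns the last status seen, which may not be a done status if
    max_attempts is exhausted first.
    """
    status = None
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in done_statuses:
            break
        # Wait between attempts, but not after the final one
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return status


# Stand-in for an API call; "training", "finished", and "error" are
# hypothetical status names used only for this demo
fake_statuses = iter(["initializing", "training", "finished"])
final_status = wait_for_status(
    fetch_status=lambda: next(fake_statuses),
    done_statuses={"finished", "error"},
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final_status)
```

Passing the fetch function in as a parameter keeps the loop testable without network access; in the real script it would wrap whichever call returns the voice's current status.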