NodeJS
Learn how to build your own custom voice using individual recordings through the Resemble AI platform with NodeJS.
To get the most out of this guide, ensure that you have:
- Signed up for a Resemble.ai account and confirmed your e-mail
- Obtained your API key for use with the API - for additional information see the Authentication page
- Subscribed to at least the PRO plan on the Resemble platform
Guide
First, you will need to initialize your project:
# Create node environment
npm init
With your project set up, install the Resemble SDK using npm install:
# Install Resemble SDK
npm install @resemble/node
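Create a new index.js file, which will be used for the voice cloning example.
Import the Resemble class from the @resemble/node SDK, along with Node's file system (fs) and path (path) packages, then add the setupResembleAI function to initialize the Resemble API client. The snippets in this guide use ES module import syntax, so if Node reports that it cannot use an import statement outside a module, set "type": "module" in your package.json or name the file index.mjs.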
// resemble-clone-voice-recording/index.js
import * as Resemble from '@resemble/node'
import fs from 'fs'
import path from 'path'
const apiKey = process.env.RESEMBLE_API_KEY
if (!apiKey) {
  console.error('Please set the RESEMBLE_API_KEY environment variable.')
  process.exit(1)
}
const setupResembleAI = (apiKey) => {
  console.log('Setting Resemble API Key...')
  Resemble.Resemble.setApiKey(apiKey)
}
setupResembleAI(apiKey)
The program expects the RESEMBLE_API_KEY environment variable to contain the API key needed to initialize the Resemble AI client.
We want to build our program in a way that lets the user specify their voice name and the path to the folder containing the recordings using command line arguments. For example:
node index.js <voice_name> <recordings_folder>
To do this, we can use Node’s process.argv property, which returns an array containing the command line arguments passed when the Node process was run. Append the following code snippet to your index.js file:
// resemble-clone-voice-recording/index.js
import * as Resemble from '@resemble/node'
// -- snipped --
const args = process.argv.slice(2);
// Check if the required number of arguments is provided
if (args.length !== 2) { // ①
  console.error('Usage: node index.js <voice_name> <recordings_folder>');
  process.exit(1);
}
const [voiceName, recordings] = args; // ②
// Command-line arguments
const cloneArgs = { // ③
  voiceName: voiceName,
  recordingsFolder: recordings
};
In the above code snippet, we validate that exactly two arguments were provided ①, then destructure ② and repackage them for use by the rest of the script ③.
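The same validation can also be wrapped in a small function, which makes it easy to test without exiting the process. The sketch below is one possible shape and is not part of the Resemble SDK; the name parseCloneArgs is hypothetical.

```javascript
// Hypothetical helper (not part of the Resemble SDK): validates the raw
// argument vector and returns the repackaged arguments.
function parseCloneArgs(argv) {
  // Drop the node executable and script path, as process.argv.slice(2) does
  const args = argv.slice(2)
  if (args.length !== 2) {
    throw new Error('Usage: node index.js <voice_name> <recordings_folder>')
  }
  const [voiceName, recordingsFolder] = args
  return { voiceName, recordingsFolder }
}

// Example usage with a hand-written argument vector
const exampleArgs = parseCloneArgs(['node', 'index.js', 'My Voice', './example-data'])
console.log(exampleArgs)
```

In index.js, the function would be called with process.argv, and a thrown error could be caught to print the usage message before exiting.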
Building a Voice
There are two options for creating a Voice using the Resemble API:
- Providing a dataset URL; or
- Uploading individual recordings via the API
In this guide, we will use option #2: uploading individual recordings via the API. If you’re interested in building a voice using option #1, you can see this guide.
A recording is a user-uploaded piece of audio content associated with a voice. Therefore, in order to upload recordings, we will need to create a voice via the API first. Next, we have to upload individual recordings via the Create Recording API and associate them with the voice. Once the recordings are uploaded, we can initiate the voice building operation by using the Voice Build API. At a high level we have the following tasks to do:
- Create a new voice and obtain its UUID
- Read the audio files from the recordings folder and upload them with their transcripts using the Recording API, attaching them to the voice
- Trigger the Voice Build operation
Let’s create four skeleton functions to perform these operations:
createVoice
uploadRecordings
readFolder
triggerVoiceBuild
// resemble-clone-voice-recording/index.js
import * as Resemble from '@resemble/node'
import fs from 'fs'
import path from 'path'
async function createVoice(voiceName) {}
async function uploadRecordings(voiceUuid, recordingsFolder) {}
async function triggerVoiceBuild(voiceUuid) {}
function readFolder(folderPath) {}
// snipped
Now, let’s create a runExample function which receives our command-line input arguments and expresses how the above functions should operate together:
// resemble-clone-voice-recording/index.js
// -- snipped --
async function runExample(args) {
  const [uuid, ok] = await createVoice(args.voiceName);
  if (!ok) {
    console.log("FAILURE: The process was aborted because the voice was not created")
    throw new Error("Voice creation failed");
  }
  await uploadRecordings(
    uuid,
    args.recordingsFolder
  )
  let voiceBuildOk = await triggerVoiceBuild(uuid)
  if (!voiceBuildOk) {
    console.log("Failed to trigger voice build")
  }
}
// -- snipped --
const cloneArgs = {
  voiceName: voiceName,
  recordingsFolder: recordings
};
runExample(cloneArgs)
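Because runExample is an async function, the call above returns a promise, and an error thrown inside it (for example, when the voice is not created) surfaces as a rejected promise. One way to handle that is sketched below. The helper runAndReport is hypothetical, and the stand-in function passed to it only simulates a failing runExample.

```javascript
// Hypothetical wrapper: runs an async entry point and converts a rejection
// into an exit code instead of an unhandled promise rejection.
async function runAndReport(run, args) {
  try {
    await run(args)
    return 0
  } catch (error) {
    console.error(`Voice cloning aborted: ${error.message}`)
    return 1
  }
}

// Stand-in for runExample that always fails, used only for illustration
const failingRun = async () => {
  throw new Error('the voice was not created')
}

runAndReport(failingRun, { voiceName: 'My Voice', recordingsFolder: './example-data' })
  .then((code) => {
    // In index.js this could instead be: process.exitCode = code
    console.log(`Exit code would be ${code}`)
  })
```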
Creating a Voice
With the scaffolding in place for our functions, let’s fill in the createVoice function with the business logic to call the Voice Creation API. Fill in your createVoice function using the snippet below.
// resemble-clone-voice-recording/index.js
// -- snipped --
async function createVoice(voiceName) {
  try {
    // In order to clone a voice, you MUST provide a base64 encoded consent file
    //
    // https://docs.app.resemble.ai/docs/create_voices/resource_voice/create#voice-consent
    //
    // FIXME: You will need to update this path to point to your consent file
    //
    // Read the file as a raw Buffer (no encoding argument) so that the bytes
    // are base64 encoded exactly as they are stored on disk.
    const fileContents = fs.readFileSync('FIXME: path/to/consent file')
    const base64Consent = fileContents.toString('base64')
    console.log(`Submitting request to Resemble to create a voice: ${voiceName}`)
    // Make a request to the API, note that we do not provide a callback_uri so this
    // request will execute synchronously.
    //
    // This will trigger the voice creation process but not the voice building process;
    // we need to trigger that through the voice building API
    //
    // https://docs.app.resemble.ai/docs/create_voices/resource_voice/build/
    const response = await Resemble.Resemble.v2.voices.create({ name: voiceName, consent: base64Consent })
    if (response.success) {
      const voice = response.item
      const voiceStatus = voice.status
      const voiceUuid = voice.uuid
      console.log(`Response was successful! ${voiceName} has been created with UUID ${voiceUuid}. The voice is currently ${voiceStatus}.`)
      return [voiceUuid, true]
    } else {
      console.log('Response was unsuccessful!')
      // In case of an error, print the error to console
      console.log(response)
      return [undefined, false]
    }
  } catch (error) {
    console.error('An error occurred:', error)
    return [undefined, false]
  }
}
// -- snipped --
In keeping with Resemble's Ethical Statement, all voice clone requests must be authorized via a consent file. You can read more about this in the Voice Consent section. In the above snippet, the code reads the provided consent file, encodes it to base64, and provides it in the request.
In the above snippet, the function makes a request using the Resemble SDK to create a new voice. If the operation is successful, it returns the voice UUID for use later; if it isn’t, the response or error is printed to the console, so we’ll know about it!
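The consent file has to reach the API as base64 text, so it matters that the raw bytes are encoded rather than a decoded string. The sketch below uses only Node's built-in Buffer to illustrate the difference. The sample bytes are made up, and encodeConsent is a hypothetical helper name.

```javascript
// Hypothetical helper: base64-encodes the raw bytes of a consent file.
// `bytes` is a Buffer, e.g. the result of fs.readFileSync(path) with no encoding.
function encodeConsent(bytes) {
  return bytes.toString('base64')
}

// Made-up sample bytes standing in for binary file content
const sampleBytes = Buffer.from([0xff, 0xd8, 0x00])
const encoded = encodeConsent(sampleBytes)
console.log(encoded) // prints "/9gA"

// Decoding gives back exactly the original bytes
console.log(Buffer.from(encoded, 'base64').equals(sampleBytes)) // prints true

// Going through a 'binary' (latin1) string and back with the default utf8
// encoding changes the bytes, so the base64 output would differ
const viaString = Buffer.from(sampleBytes.toString('binary'))
console.log(viaString.length) // prints 5, not 3
```

This is why reading the file without an encoding argument, and encoding the resulting Buffer directly, is the safer choice for binary consent files.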
Uploading Recordings
With createVoice filled in, we can work on the second step:
Read the audio files from the recordings folder and upload the recordings.
The wisdom of divide and conquer tactics dictates that we ought to break this down into two responsibilities:
- readFolder, responsible for building a list of metadata and file information for upload; and
- uploadRecordings, which makes requests to the API to upload the individual recordings collected by readFolder
Let’s begin by implementing the readFolder function. For the purposes of this example, we require that the recording data to upload is contained in a local folder structured as follows:
$ tree
example-data/
├── wav-1.txt
├── wav-1.wav
├── wav-2.txt
└── wav-2.wav
... additional files
Here, each numbered .wav file represents the audio content for upload and each .txt file is the corresponding audio file’s transcript in plain text. We will then provide the path to this folder via the recordings_folder command line argument mentioned above.
NOTE: For the example to function properly, we will need 20 recordings and transcripts!
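The pairing rule described above (every .wav file needs a .txt file with the same base name) can be checked before anything is uploaded. The following sketch works on a plain list of file names, such as the result of fs.readdirSync. The name pairRecordings is hypothetical, and the default minimum of 20 is taken from the note above.

```javascript
// Hypothetical pre-flight check: pairs each .wav file name with its .txt
// transcript and reports the .wav files that have no transcript.
function pairRecordings(filenames, minimum = 20) {
  const names = new Set(filenames)
  const pairs = []
  const missingTranscripts = []
  for (const filename of filenames) {
    if (!filename.endsWith('.wav')) continue
    // Swap the .wav suffix for .txt to find the expected transcript name
    const transcript = filename.slice(0, -'.wav'.length) + '.txt'
    if (names.has(transcript)) {
      pairs.push({ audio: filename, transcript })
    } else {
      missingTranscripts.push(filename)
    }
  }
  return { pairs, missingTranscripts, enough: pairs.length >= minimum }
}

const check = pairRecordings(['wav-1.txt', 'wav-1.wav', 'wav-2.wav'])
console.log(check.pairs.length)       // prints 1
console.log(check.missingTranscripts) // prints [ 'wav-2.wav' ]
console.log(check.enough)             // prints false
```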
Fill in the readFolder
function definition using the snippet below:
// resemble-clone-voice-recording/index.js
// -- snipped --
function readFolder(folderPath) {
  const dataList = []
  // Iterate through files in the folder
  fs.readdirSync(folderPath).forEach((filename) => {
    // For each wav file
    if (filename.endsWith('.wav')) {
      // Check if there is a corresponding .txt file (same base name)
      const txtFilename = path.basename(filename, '.wav') + '.txt'
      const txtFilePath = path.join(folderPath, txtFilename)
      // If the pair exists, use it
      if (fs.existsSync(txtFilePath)) {
        // Read the text content from the .txt file
        const textContent = fs.readFileSync(txtFilePath, 'utf-8')
        // Create an object and append it to the list
        const fileDict = { // ①
          file: path.join(folderPath, filename),
          text: textContent,
          recordingName: txtFilename,
        }
        dataList.push(fileDict)
      } else {
        console.warn(`WARN: Unable to find corresponding transcript txt file for ${filename} - SKIPPING`)
      }
    }
  })
  return dataList
}
// -- snipped --
The above function iterates through each file in the directory, pairs each audio (i.e., .wav) file with its transcript (i.e., .txt) file, and builds an object for each recording ①, which is appended to the list that the function returns. To use the Create Recording API, we need the following attributes for each recording:
Attribute | Type | Description |
---|---|---|
file | File | Recording audio file (ensure that the audio file is not silent and has a duration between 1 and 12 seconds) |
emotion | string | Emotion of the recording |
is_active | boolean | If false, the recording is not used to train the voice model |
name | string | Name of the recording |
text | string | Transcript of the recording |
To simplify our process, we will assume the emotion is neutral and set the is_active flag to true. Using this information, let’s implement the uploadRecordings function using the snippet below:
// resemble-clone-voice-recording/index.js
// -- snipped --
async function uploadRecordings(voiceUuid, recordingsFolder) {
  console.log(`Beginning recording upload process from folder: ${recordingsFolder}`)
  const dataList = readFolder(recordingsFolder) // ①
  let failures = 0
  let success = 0
  for (const recording of dataList) {
    console.log(`Attempting to upload recording: ${recording.recordingName}`)
    const file = fs.createReadStream(recording.file)
    const fileSize = fs.statSync(recording.file).size
    const response = await Resemble.Resemble.v2.recordings.create( // ②
      voiceUuid,
      {
        emotion: 'neutral',
        is_active: true,
        name: recording.recordingName,
        text: recording.text,
      },
      file,
      fileSize,
    )
    const item = response.item
    if (response.success) {
      const uuid = item.uuid
      console.log(`Request to create recording ${recording.recordingName} was successful! Recording uuid is ${uuid}`)
      success += 1
    } else {
      console.log(`Request to create recording ${recording.recordingName} FAILED!`)
      console.log(response)
      failures += 1
    }
  }
  console.log(`Recording upload completed, finished uploading ${success} successful and ${failures} failures`)
}
// -- snipped --
The above snippet calls the readFolder function ① that we previously wrote and captures the list of recordings parsed from the folder (i.e., dataList); it then iterates through each recording and uses the Resemble SDK to create a recording via the API using the information contained in each item ②. Since we’re uploading multiple recordings, we keep a success and a failures counter and note when a recording succeeds or fails to upload.
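A thrown error (for example, a network failure) would stop the loop above partway through. The sketch below shows one way to keep counting when that happens and to return the totals to the caller. The upload function is passed in as a parameter so the example runs without the SDK. The names uploadAll and fakeCreateRecording are hypothetical, and the fake only mimics the success field used above.

```javascript
// Hypothetical variant of the upload loop: `createRecording` is any async
// function that resolves to an object with a boolean `success` field.
async function uploadAll(dataList, createRecording) {
  let success = 0
  let failures = 0
  for (const recording of dataList) {
    try {
      const response = await createRecording(recording)
      if (response.success) {
        success += 1
      } else {
        failures += 1
      }
    } catch (error) {
      // Count a thrown error as a failure and continue with the next file
      console.error(`Upload of ${recording.recordingName} threw: ${error.message}`)
      failures += 1
    }
  }
  return { success, failures }
}

// Stand-in for the SDK call, used only for illustration
const fakeCreateRecording = async (recording) => {
  if (recording.recordingName === 'wav-2.txt') throw new Error('simulated network error')
  return { success: true }
}

uploadAll(
  [{ recordingName: 'wav-1.txt' }, { recordingName: 'wav-2.txt' }],
  fakeCreateRecording
).then((totals) => console.log(totals))
```

Returning the totals would let the caller decide whether enough recordings were uploaded before triggering the build.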
Building the Voice
Now that the program can read and upload recording data, the voice is ready for the last stage: initiating the build process. Fill in the triggerVoiceBuild function using the snippet below:
// resemble-clone-voice-recording/index.js
// -- snipped --
async function triggerVoiceBuild(voiceUuid) {
  let response = await Resemble.Resemble.v2.voices.build(voiceUuid)
  if (response.success) {
    console.log(`Request to initiate voice build for voice ${voiceUuid} was successful!`)
    return true
  } else {
    console.log(`Request to initiate voice build for voice ${voiceUuid} was NOT successful! Response was: `)
    console.log(response)
    return false
  }
}
// -- snipped --
Thankfully, the final act for this play is quite simple. The above snippet just calls the Resemble SDK with the voice UUID to indicate the voice is ready for building. If the request fails, the function logs the response and returns false.
With all the components in place, go ahead and initiate the program to test:
RESEMBLE_API_KEY=... node index.js "My Voice" "./example-data"
Submitting request to Resemble to create a voice: My Voice
Response was successful! My Voice has been created with UUID 7efec80b. The voice is currently initializing.
Beginning recording upload process from folder: ./example-data
Request to create recording wav-1.txt was successful! Recording uuid is ....
Request to create recording wav-2.txt was successful! Recording uuid is ....
... et cetera
Recording upload completed, finished uploading 20 successful and 0 failures
Request to initiate voice build for voice 7efec80b was successful!
Congratulations, you have successfully created a Custom Voice using the Recording and Voice Building APIs. You can follow the build progress using the API or the web application.
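To follow the build progress from code rather than the web application, the voice can be polled until its status changes. The sketch below is generic: fetchStatus is a caller-supplied function (in practice it might wrap a call that retrieves the voice by UUID), and the status strings used here are assumptions, apart from initializing, which appears in the output above.

```javascript
// Hypothetical polling helper: calls `fetchStatus` until `isDone(status)`
// returns true or the attempt limit is reached.
async function pollStatus(fetchStatus, isDone, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus()
    if (isDone(status)) return status
    // Wait before asking again
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error(`Status did not settle after ${maxAttempts} attempts`)
}

// Stand-in status source; 'training' and 'finished' are assumed status values
const fakeStatuses = ['initializing', 'training', 'finished']
const fakeFetchStatus = async () => fakeStatuses.shift()

pollStatus(fakeFetchStatus, (status) => status === 'finished', { intervalMs: 10 })
  .then((status) => console.log(`Voice build status: ${status}`))
```

The attempt limit keeps the script from waiting forever if the build never reaches the expected status.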