Version: 2.0.0

NodeJS

Learn how to build your own custom voice using individual recordings through the Resemble AI platform with NodeJS.

note

To get the most out of this guide, ensure that you have:

  • Signed up for a Resemble.ai account and confirmed your e-mail
  • Obtained your API key for use with the API - for additional information see the Authentication page
  • Subscribed to at least a PRO plan on the Resemble platform

Guide

First, you will need to initialize your project:

# Create node environment
npm init

With your project set up, install the Resemble SDK using npm:

# Install Resemble SDK
npm install @resemble/node

Create a new index.js file which will be used for the voice cloning example.

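Import the Resemble SDK from @resemble/node along with Node's fs (file system) and path packages, then add the setupResembleAI function to initialize the Resemble API client. Because the snippets use ES module import syntax, you may need to set "type": "module" in your package.json (or name the file index.mjs), depending on your Node version.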

// resemble-clone-voice-recording/index.js

import * as Resemble from '@resemble/node'
import fs from 'fs'
import path from 'path'

const apiKey = process.env.RESEMBLE_API_KEY

if (!apiKey) {
  console.error('Please set the RESEMBLE_API_KEY environment variable.')
  process.exit(1)
}

const setupResembleAI = (apiKey) => {
  console.log('Setting Resemble API Key...')
  Resemble.Resemble.setApiKey(apiKey)
}

setupResembleAI(apiKey)

The program expects the RESEMBLE_API_KEY environment variable to contain the API key needed to initialize the Resemble AI client. If the variable is missing, the program prints an error and exits.

We want to build our program in a way that lets the user specify the voice name and the path to the folder containing the recordings using command line arguments. For example:

node index.js <voice_name> <recordings_folder>

To do this, we can use Node’s process.argv property which returns an array containing the command line arguments passed when the Node process was run. Append the following code snippet to your index.js file:

// resemble-clone-voice-recording/index.js
import * as Resemble from '@resemble/node'

// -- snipped --

// Drop the first two entries (the node binary and the script path)
const args = process.argv.slice(2);

// Check if the required number of arguments is provided
if (args.length !== 2) {
  console.error('Usage: node index.js <voice_name> <recordings_folder>');
  process.exit(1);
}

const [voiceName, recordings] = args;

// Command-line arguments
const cloneArgs = {
  voiceName: voiceName,
  recordingsFolder: recordings
};

In the above code snippet, we validate that exactly two arguments were provided, destructure them into voiceName and recordings, and repackage them into a cloneArgs object for the rest of the script to use.
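
One way to make the argument handling easier to test is to move it into a small pure function. The following is a minimal sketch and not part of the original example: parseCloneArgs is a hypothetical helper name, and it takes the full argv array as a parameter instead of reading process.argv directly.

```javascript
// Hypothetical helper (not part of the Resemble SDK): validates argv and
// returns the same shape as the cloneArgs object used in this guide.
function parseCloneArgs(argv) {
  // argv[0] is the node binary and argv[1] is the script path, so skip both
  const args = argv.slice(2)

  if (args.length !== 2) {
    throw new Error('Usage: node index.js <voice_name> <recordings_folder>')
  }

  const [voiceName, recordingsFolder] = args
  return { voiceName, recordingsFolder }
}

// Example with a hand-built argv array
const exampleArgs = parseCloneArgs(['node', 'index.js', 'My Voice', './example-data'])
console.log(exampleArgs)
```

In index.js this could be called as parseCloneArgs(process.argv), with the thrown error caught at the top level to print the usage message and exit.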

Building a Voice

There are two options for creating a Voice using the Resemble API:

  1. Providing a dataset URL; or
  2. Uploading individual recordings via the API

In this guide, we will use option #2: uploading individual recordings via the API. If you’re interested in building a voice using option #1, you can see this guide.

A recording is a user-uploaded piece of audio content associated with a voice. Therefore, in order to upload recordings, we will need to create a voice via the API first. Next, we have to upload individual recordings via the Create Recording API and associate them with the voice. Once the recordings are uploaded, we can initiate the voice building operation by using the Voice Build API. At a high level we have the following tasks to do:

  1. Create a new voice and obtain its UUID
  2. Read the audio files from the recordings folder and upload them with their transcripts using the Recording API, attaching them to the voice
  3. Trigger the Voice Build operation

Let’s create four skeleton functions to perform these operations:

  • createVoice
  • uploadRecordings
  • readFolder
  • triggerVoiceBuild

// resemble-clone-voice-recording/index.js

import * as Resemble from '@resemble/node'
import fs from 'fs'
import path from 'path'

async function createVoice(voiceName) {}

async function uploadRecordings(voiceUuid, recordingsFolder) {}

async function triggerVoiceBuild(voiceUuid) {}

function readFolder(folderPath) {}

// snipped

Now, let’s create a runExample function which receives our command-line input arguments and expresses how the above functions should operate together:

// resemble-clone-voice-recording/index.js

// -- snipped --

async function runExample(args) {
  // createVoice returns a [uuid, ok] pair
  const [uuid, ok] = await createVoice(args.voiceName);

  if (!ok) {
    console.log("FAILURE: The process was aborted because the voice was not created")
    throw new Error("Voice creation failed");
  }

  await uploadRecordings(
    uuid,
    args.recordingsFolder
  )

  const voiceBuildOk = await triggerVoiceBuild(uuid)

  if (!voiceBuildOk) {
    console.log("Failed to trigger voice build")
  }
}

// -- snipped --

const cloneArgs = {
  voiceName: voiceName,
  recordingsFolder: recordings
};

runExample(cloneArgs)
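
Because runExample is an async function, an error thrown inside it surfaces as a rejected promise. The following is a minimal sketch of one way to handle that at the call site. It is an illustration under assumptions: runExampleStub stands in for the guide's runExample so the sketch runs on its own, and runAndReport is a hypothetical wrapper name.

```javascript
// Stand-in for the guide's runExample so this sketch is self-contained;
// it rejects the same way runExample does when the voice is not created.
async function runExampleStub(args) {
  if (!args.voiceName) {
    throw new Error('voice was not created')
  }
}

// Hypothetical wrapper: awaits the task and maps success or failure to an
// exit code instead of leaving the rejection unhandled.
async function runAndReport(task, args) {
  try {
    await task(args)
    return 0
  } catch (error) {
    console.error('FAILURE:', error.message)
    return 1
  }
}

runAndReport(runExampleStub, { voiceName: '' }).then((code) => {
  // In index.js this value could be assigned to process.exitCode
  console.log(`exit code would be ${code}`)
})
```

Returning the code instead of calling process.exit directly keeps the wrapper easy to test and lets pending output flush before the process ends.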

Creating a Voice

With the scaffolding in place for our functions, let’s fill in the createVoice function with the business logic to call the Voice Creation API. Using the snippet below, fill in your createVoice function.

// resemble-clone-voice-recording/index.js

// -- snipped --

async function createVoice(voiceName) {
  console.log(`Submitting request to Resemble to create a voice: ${voiceName}`)

  try {
    // In order to clone a voice, you MUST provide a base64 encoded consent file
    //
    // https://docs.app.resemble.ai/docs/create_voices/resource_voice/create#voice-consent
    //
    // FIXME: You will need to update this path to point to your consent file
    //
    // Reading without an encoding returns a Buffer, so the raw bytes are
    // preserved when they are converted to base64
    const fileContents = fs.readFileSync('FIXME: path/to/consent file')
    const base64Consent = fileContents.toString('base64')

    // Make a request to the API, note that we do not provide a callback_uri so this
    // request will execute synchronously.
    //
    // This will trigger the voice creation process but not the voice building process;
    // we need to trigger that through the voice building API
    //
    // https://docs.app.resemble.ai/docs/create_voices/resource_voice/build/
    const response = await Resemble.Resemble.v2.voices.create({ name: voiceName, consent: base64Consent })

    if (response.success) {
      const voice = response.item
      const voiceStatus = voice.status
      const voiceUuid = voice.uuid

      console.log(`Response was successful! ${voiceName} has been created with UUID ${voiceUuid}. The voice is currently ${voiceStatus}.`)

      return [voiceUuid, true]
    } else {
      console.log('Response was unsuccessful!')
      // In case of an error, print the response to the console
      console.log(response)

      return [undefined, false]
    }
  } catch (error) {
    console.error('An error occurred:', error)

    return [undefined, false]
  }
}

// -- snipped --

In keeping with Resemble's Ethical Statement, all voice clone requests must be authorized via a consent file. You can read more about this in the Voice Consent section. In the above snippet, the code reads the provided consent file, encodes it to base64, and passes it in the request.
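
To illustrate the encoding step, here is a minimal sketch that uses only Node's built-in Buffer. The byte values are made up for illustration and stand in for the contents of a consent file. It also shows why the bytes are best kept in a Buffer: decoding them to a string and re-encoding that string as UTF-8 changes the data.

```javascript
// Made-up bytes standing in for the contents of a consent file
const consentBytes = Buffer.from([0xff, 0xfb, 0x90, 0x00])

// Encoding a Buffer directly preserves every byte
const base64Consent = consentBytes.toString('base64')

// Decoding the base64 string gives back identical bytes
const roundTrip = Buffer.from(base64Consent, 'base64')
console.log(roundTrip.equals(consentBytes)) // true

// Pitfall: going through a 'binary' (latin1) string and then calling
// Buffer.from(string) re-encodes the characters as UTF-8, which alters
// every byte above 0x7f
const viaString = Buffer.from(consentBytes.toString('binary'))
console.log(viaString.equals(consentBytes)) // false
```

The same reasoning applies to fs.readFileSync: called without an encoding it returns a Buffer, which can be converted with toString('base64') directly.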

In the above snippet, the function uses the Resemble SDK to make a request to create a new voice. If the operation is successful, it returns the voice's UUID for use later; if it isn’t, the response or error is logged to the console so we’ll know about it!

Uploading Recordings

With createVoice filled in, we can work on the second step:

Read the audio files from the recordings folder and upload the recordings.

The wisdom of divide and conquer tactics dictates that we ought to break this down into two responsibilities:

  • readFolder responsible for building a list of metadata and file information for upload; and
  • uploadRecordings which makes a request to the API to upload individual recordings created by readFolder

Let’s begin by implementing the readFolder function. For the purposes of this example, we will impose a requirement that the recording data we want to upload is contained in a local folder structured in the following format:

$ tree

example-data/
├── wav-1.txt
├── wav-1.wav
├── wav-2.txt
└── wav-2.wav

... additional files

Each numbered .wav file holds the audio content for upload, and each .txt file with the same base name holds that audio file’s transcript in plain text. We will then provide the path to this folder via the recordings folder command line argument mentioned above.

NOTE: For the example to function properly, we will need 20 recordings and transcripts!
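
Before uploading anything, it can help to confirm that the folder really contains enough complete pairs. The following is a minimal sketch of such a pre-flight check; checkRecordingPairs is a hypothetical helper name, it works on a plain list of file names so it can be tested without touching the disk, and the default minimum of 20 is taken from the note above.

```javascript
// Hypothetical pre-flight check: given the file names in the recordings
// folder, report which .wav files have a transcript and which do not.
function checkRecordingPairs(filenames, minimumPairs = 20) {
  const names = new Set(filenames)
  const paired = []
  const missingTranscript = []

  for (const filename of filenames) {
    if (!filename.endsWith('.wav')) continue

    // Swap only the trailing .wav extension for .txt
    const txtFilename = filename.slice(0, -'.wav'.length) + '.txt'
    if (names.has(txtFilename)) {
      paired.push(filename)
    } else {
      missingTranscript.push(filename)
    }
  }

  return { paired, missingTranscript, enough: paired.length >= minimumPairs }
}

// wav-2.wav has no matching transcript in this made-up listing
const report = checkRecordingPairs(['wav-1.txt', 'wav-1.wav', 'wav-2.wav'])
console.log(report)
```

In index.js the list of names could come from fs.readdirSync on the recordings folder, and the script could stop early when enough is false.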

Fill in the readFolder function definition using the snippet below:

// resemble-clone-voice-recording/index.js

// -- snipped --

function readFolder(folderPath) {
  const dataList = []

  // Iterate through files in the folder
  fs.readdirSync(folderPath).forEach((filename) => {
    // For each wav file
    if (filename.endsWith('.wav')) {
      // Check if there is a corresponding .txt file
      // (replace only the trailing .wav extension)
      const txtFilename = filename.replace(/\.wav$/, '.txt')
      const txtFilePath = path.join(folderPath, txtFilename)

      // If the pair exists, use it
      if (fs.existsSync(txtFilePath)) {
        // Read the text content from the .txt file
        const textContent = fs.readFileSync(txtFilePath, 'utf-8')

        // Create an object describing the recording and append it to the list
        const fileDict = {
          file: path.join(folderPath, filename),
          text: textContent,
          recordingName: txtFilename,
        }

        dataList.push(fileDict)
      } else {
        console.warn(`WARN: Unable to find corresponding transcript txt file for ${filename} - SKIPPING`)
      }
    }
  })

  return dataList
}

// -- snipped --

The above function iterates through each file in the directory, pairs each audio (i.e., .wav) file with its transcript (i.e., .txt) file, and builds an object for each recording, which is appended to the list that the function returns. Audio files without a transcript are skipped with a warning. To use the Create Recording API we require the following attributes for each recording:

Attribute   Type      Description
file        File      Recording audio file (ensure that the audio file is not silent and has a duration ranging from 1 to 12 seconds)
emotion     string    Emotion of the recording
is_active   boolean   If false, the recording is not used to train the voice model
name        string    Name of the recording
text        string    Transcript of the recording
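
The duration requirement on the file attribute can be checked locally before uploading. The following is a minimal sketch, not part of the Resemble SDK: wavDurationSeconds and hasUploadableDuration are hypothetical helper names, and the code assumes a canonical PCM WAV file with the plain 44-byte header. Files with additional chunks are rejected instead of being guessed at.

```javascript
// Estimate the duration of a canonical PCM WAV file from its first 44 bytes.
// ASSUMPTION: the file uses the plain 44-byte header layout (RIFF, "fmt ",
// then "data"); anything else is rejected.
function wavDurationSeconds(header) {
  if (header.length < 44) {
    throw new Error('Need at least 44 bytes of header')
  }
  const tag = (start, end) => header.toString('ascii', start, end)
  if (tag(0, 4) !== 'RIFF' || tag(8, 12) !== 'WAVE' || tag(36, 40) !== 'data') {
    throw new Error('Not a canonical 44-byte WAV header')
  }

  const byteRate = header.readUInt32LE(28) // bytes of audio per second
  const dataSize = header.readUInt32LE(40) // bytes of audio data
  return dataSize / byteRate
}

// Hypothetical check against the 1 to 12 second range required for uploads
function hasUploadableDuration(header) {
  const seconds = wavDurationSeconds(header)
  return seconds >= 1 && seconds <= 12
}

// Build a synthetic header: 16 kHz, mono, 16-bit => 32000 bytes per second
const header = Buffer.alloc(44)
header.write('RIFF', 0, 'ascii')
header.write('WAVE', 8, 'ascii')
header.write('fmt ', 12, 'ascii')
header.writeUInt32LE(32000, 28)
header.write('data', 36, 'ascii')
header.writeUInt32LE(64000, 40) // 64000 / 32000 = 2 seconds

console.log(wavDurationSeconds(header), hasUploadableDuration(header))
```

In index.js the header could be taken from the first 44 bytes of each .wav file. This check covers duration only; it does not detect silent audio.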

To simplify our process, we will assume the emotion is neutral and set the is_active flag to true. Using this information, let’s implement the uploadRecordings function using the snippet below:

// resemble-clone-voice-recording/index.js

// -- snipped --

async function uploadRecordings(voiceUuid, recordingsFolder) {
  console.log(`Beginning recording upload process from folder: ${recordingsFolder}`)

  const dataList = readFolder(recordingsFolder)

  let failures = 0
  let success = 0

  // Upload the recordings one at a time
  for (const recording of dataList) {
    console.log(`Attempting to upload recording: ${recording.recordingName}`)

    const file = fs.createReadStream(recording.file)
    const fileSize = fs.statSync(recording.file).size

    const response = await Resemble.Resemble.v2.recordings.create(
      voiceUuid,
      {
        emotion: 'neutral',
        is_active: true,
        name: recording.recordingName,
        text: recording.text,
      },
      file,
      fileSize,
    )

    if (response.success) {
      const uuid = response.item.uuid
      console.log(`Request to create recording ${recording.recordingName} was successful! Recording uuid is ${uuid}`)

      success += 1
    } else {
      console.log(`Request to create recording ${recording.recordingName} FAILED!`)
      console.log(response)

      failures += 1
    }
  }

  console.log(`Recording upload completed, finished uploading ${success} successful and ${failures} failures`)
}

// -- snipped --

The above snippet calls the readFolder function that we previously wrote and captures the list of recordings parsed from the folder (i.e., dataList); it then iterates through each recording and uses the Resemble SDK to create a recording via the API using the information contained in each item. Since we’re uploading multiple recordings, we keep success and failures counters and log whether each recording succeeds or fails to upload.

Building the Voice

Now that the program can read and upload recording data, the Voice is ready for the last stage of initiating the build process. Fill in the triggerVoiceBuild function using the snippet below:

// resemble-clone-voice-recording/index.js

// -- snipped --

async function triggerVoiceBuild(voiceUuid) {
  const response = await Resemble.Resemble.v2.voices.build(voiceUuid)

  if (response.success) {
    console.log(`Request to initiate voice build for voice ${voiceUuid} was successful!`)
    return true
  } else {
    console.log(`Request to initiate voice build for voice ${voiceUuid} was NOT successful! Response was: `)
    console.log(response)

    return false
  }
}

// -- snipped --

Thankfully, the final act for this play is quite simple. The above snippet just calls the Resemble SDK with the voice UUID to indicate the voice is ready for building. If the request fails, the response is logged and the function returns false so the caller can react.


With all the components in place, go ahead and initiate the program to test:

RESEMBLE_API_KEY=... node index.js "My Voice" "./example-data"

Submitting request to Resemble to create a voice: My Voice

Response was successful! My Voice has been created with UUID 7efec80b. The voice is currently initializing.
Beginning recording upload process from folder: ./example-data

Request to create recording wav-1.txt was successful! Recording uuid is ....
Request to create recording wav-2.txt was successful! Recording uuid is ....

... et cetera

Recording upload completed, finished uploading 20 successful and 0 failures
Request to initiate voice build for voice 7efec80b was successful!

Congratulations, you have successfully created a Custom Voice using the Recording and Voice Building APIs. You can follow the build's progress using the API or the web application.