Easy local voice cloning in Python with Coqui-TTS

Here we are going to add voice cloning to our Python-based application using the Coqui-TTS project. First we need to clone the repository from GitHub.

git clone https://github.com/idiap/coqui-ai-TTS

This is a fork of the original project that works with newer versions of Python. Now move into the coqui-ai-TTS directory to create our virtual environment and install the package.

cd coqui-ai-TTS
# Create our virtual environment named .venv
python -m venv .venv
# Activate it (Linux/macOS shell syntax)
source .venv/bin/activate
# Install the project in editable mode
pip install -e .
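
To confirm the install landed in the active virtual environment, you can ask Python whether it can locate the package. This is only a minimal sketch using the standard library; the helper name is_installed is made up for illustration, and "TTS" is the import name used later in this post.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "TTS" is the top-level package imported later in this post.
    if is_installed("TTS"):
        print("TTS is importable from this environment")
    else:
        print("TTS not found; is the virtual environment activated?")
```

If the check fails, the usual culprit is running Python from outside the activated .venv.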

Once the install completes, get a voice! Use your own voice sample or download one from any number of sources; https://movie-sounds.org is always a good place for old kung fu movie clips.
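
Before using a clip, it can help to look at its basic properties. The sketch below uses Python's built-in wave module, which only reads uncompressed PCM WAV files and raises wave.Error on other formats. The describe_wav helper is a hypothetical name, not part of Coqui-TTS. As far as I know, the XTTS model card talks about cloning from a clip of roughly six seconds, so treat the duration as a rough guide rather than a hard rule.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read channel count, sample rate and duration from a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```

For example, describe_wav("his-martial-art-skill-is-actually-beyond-any-of-you.wav") returns a dictionary you can print to see how long the sample is.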

Now we can get to writing our Python. I will use a passage from Winnie-the-Pooh as the test text. Create a new file and add:

import torch
from TTS.api import TTS

# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Text to speak. Triple single quotes let the passage contain both " and ' characters.
text = '''Hello there! Here is Edward Bear, coming downstairs now, bump, bump, bump,
on the back of his head, behind Christopher Robin. It is, as far as he knows, the only way of coming downstairs, but sometimes he feels that there really is another way, if only he could stop bumping for a moment and think of it. And then he feels that perhaps there isn't. Anyhow, here he is at the bottom, and ready to be introduced to you, 'Winnie-the-Pooh'.
When I first heard his name I said, just as you are going to say, "But I thought he was a boy?"
"So did I," said Christopher Robin.
"Then you can't call him Winnie?"
"I don't." "But you said--"
"He's Winnie-ther-Pooh. Don't you know what 'ther' means?"'''

# Clone the voice in speaker_wav and write the spoken text to file_path
tts.tts_to_file(
    text=text,
    language="en",
    speaker_wav="his-martial-art-skill-is-actually-beyond-any-of-you.wav",
    file_path="output2.wav",
)

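We set the device to "cuda" when torch.cuda.is_available() returns True and fall back to "cpu" otherwise, then move the model onto it with .to(device). This should work with both CUDA and ROCm, since ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda interface.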
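
If you are curious which GPU stack your PyTorch build actually targets, a small check like the following can tell you. This is a sketch, not something Coqui-TTS requires: pick_device and gpu_backend are illustrative names, and the try/except around the import is only there so the snippet still runs on a machine without PyTorch.

```python
try:
    import torch
except ImportError:  # lets the sketch run even where PyTorch is missing
    torch = None


def pick_device() -> str:
    """Return "cuda" when a GPU is usable through torch.cuda, else "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


def gpu_backend() -> str:
    """Name the GPU stack this PyTorch build was compiled against."""
    if torch is None:
        return "none"
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    if getattr(torch.version, "hip", None):
        return "rocm"
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "none"


if __name__ == "__main__":
    print(f"device={pick_device()} backend={gpu_backend()}")
```

Note that the device string stays "cuda" on a ROCm build; only the backend name differs.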

tts.tts_to_file synthesizes the text and writes the audio to file_path, using our speaker_wav file as the reference voice to clone.
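
If you plan to call this from a larger application, one option is to wrap the call in a small helper that checks the reference clip exists and tidies the text first. The sketch below assumes a tts object created as shown earlier; clone_to_file is a hypothetical helper name, and it passes only the same keyword arguments used above.

```python
from pathlib import Path


def clone_to_file(tts, text: str, speaker_wav: str, file_path: str,
                  language: str = "en") -> str:
    """Validate inputs, tidy the text, then delegate to tts.tts_to_file."""
    if not Path(speaker_wav).is_file():
        raise FileNotFoundError(f"Reference voice not found: {speaker_wav}")
    # Collapse newlines and indentation left over from a triple-quoted string.
    clean_text = " ".join(text.split())
    if not clean_text:
        raise ValueError("text is empty")
    tts.tts_to_file(
        text=clean_text,
        language=language,
        speaker_wav=speaker_wav,
        file_path=file_path,
    )
    return file_path
```

With the model loaded, usage would look like clone_to_file(tts, text, "his-martial-art-skill-is-actually-beyond-any-of-you.wav", "output2.wav"). Failing early on a missing clip avoids waiting for the model to load only to hit a file error.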