Create Your Own Voice Clone!
Welcome to this beginner-friendly tutorial on building a voice cloning pipeline with Tortoise TTS and RVC AI. In this guide, we will walk you through setting up a simplepod.ai cloud instance on which you can train your own voice model. Voice cloning technology has advanced to the point where a personalized voice model can be trained from a modest set of recordings. By combining Tortoise TTS and RVC AI, you can achieve impressive results in voice synthesis.
What You Will Learn
- Understanding Tortoise TTS and RVC AI: Tortoise TTS is an open-source text-to-speech tool that generates speech from text. RVC (Retrieval-based Voice Conversion) is a voice conversion tool; in this pipeline it refines the generated audio so that it sounds closer to the target voice. Together, they form a powerful pipeline for voice cloning.
- Setting Up a Cloud Instance: We will guide you through the steps to set up a cloud environment where you can run the necessary tools and scripts. This setup is crucial for handling the computational requirements of training a voice model.
- Training Your Voice Model: Learn how to gather and prepare audio samples for training. We will cover best practices for recording and processing audio to ensure high-quality input for your model.
Key Points to Consider
- Quality of Input Data: The quality of your voice model depends heavily on the input data. Ensure that your audio samples are clear, free of background noise, and recorded at a consistent sampling rate that matches what the training tool expects.
- Cloud vs. Local Setup: While this tutorial focuses on using a cloud instance, you can also run the tools locally if your hardware supports it. Each setup has its pros and cons.
- Ethical Use: Voice cloning technology should be used responsibly. Always ensure you have permission to use the voice data you are working with, and be mindful of privacy and ethical considerations.
Getting Started
Before diving into the technical details, make sure you have the following prerequisites:
- A basic understanding of Python and cloud computing. This is not required, but it is very helpful. I will guide you as best I can, but the process can be nuanced.
- Access to a simplepod.ai account.
- A collection of audio samples for the voice you wish to clone.
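Before uploading anything, it can help to confirm that your samples share one sampling rate, as mentioned under Key Points. The snippet below is a minimal sketch using only Python's standard `wave` module, so it assumes uncompressed PCM `.wav` files. The function names and the `expected_rate` parameter are my own for illustration; this tutorial does not prescribe a specific rate, so pass whatever your training tool expects.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    duration = (frames / rate) if rate else 0.0
    return channels, rate, duration


def find_mismatched(folder, expected_rate):
    """List (file name, rate) for WAV files whose rate differs from expected_rate.

    expected_rate is an assumption supplied by you, not a value from this tutorial.
    """
    mismatched = []
    for path in sorted(Path(folder).glob("*.wav")):
        _, rate, _ = describe_wav(path)
        if rate != expected_rate:
            mismatched.append((path.name, rate))
    return mismatched
```

Calling `find_mismatched("my_samples", expected_rate)` on your sample folder (the folder name here is a placeholder) returns an empty list when every file matches, and otherwise tells you which files need resampling.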
Step 1. Log in to simplepod.ai and click “find instances”.
Step 2. Select a single RTX 4090. Note: for now, stick with a single GPU. Multi-GPU training typically relies on Distributed Data Parallel, or DDP for short. This can become extremely complex and in some cases is not supported. You can circle back to learn DDP at a later time, but it is beyond the scope of this tutorial.

Step 3. The Docker templates page will open. Scroll down until you find “simplepodai/jarodmica-ai-voice-cloning” and select it.
NOTE: Be aware that the version numbers may change at a later date. This is fine; as long as the template name still refers to Jarod Mica’s ai-voice-cloning, you’re doing the right thing.

Step 4. This will open a side panel on the right. Verify that your settings match mine. You shouldn’t have to change anything.

Step 5. Click “Save & Use”, then click “Run”.


A popup will appear briefly. Click it to go to your instance. If you’re not fast enough, you can click “My Instances” in the left side panel to get to the same place.

Now you have to wait for the instance to load the docker image. It will look something like this:

Step 6. When the instance has finished loading, you’ll start to see buttons appear. These are different ways to access the data and scripts within the instance. In this case, let’s open the WebUI by clicking on the Port 7860 button.
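If the WebUI page does not load right away, the service inside the container may still be starting. As a rough sketch, the helper below polls a TCP port until something answers. It assumes your instance exposes port 7860 at a host name you can reach directly, which may not match how simplepod.ai routes the Port 7860 button; the host name, retry count, and delay are all placeholders of mine.

```python
import socket
import time


def port_is_open(host, port=7860, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def wait_for_port(host, port=7860, attempts=20, delay=15.0):
    """Poll the port up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for_port("your-instance-host")` returns `True` as soon as the port accepts a connection. A successful connection only shows that something is listening, not that the WebUI has fully finished loading.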

You are now within the instance and can begin training your model and/or running inference.
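Since Step 2 asked for a single RTX 4090, you may want to confirm what the instance actually sees before you start a long training run. This is a small sketch that assumes you have terminal or Python access inside the instance and that the `nvidia-smi` command is available in the image; neither is confirmed by this tutorial. The function names are my own.

```python
import subprocess


def parse_gpu_lines(text):
    """Turn nvidia-smi CSV output (one GPU per line) into (name, memory) tuples."""
    gpus = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Query nvidia-smi for each visible GPU (assumes nvidia-smi is installed)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_lines(result.stdout)
```

If `len(list_gpus())` is 1, you are on the single-GPU setup this tutorial assumes.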
For more specifics on how to train a model, take a look at Jarod Mica’s repo, where he goes over the whole process in detail:
https://github.com/JarodMica/ai-voice-cloning
and in two videos he made:
https://youtu.be/WWhNqJEmF9M?si=RhUZhYersAvSZ4wf
https://www.youtube.com/watch?v=7tpWH8_S8es&t=504s