# RVC
RVC stands for Realtime Voice Cloning. This technique transfers the voice characteristics from one audio clip onto another, essentially making the clip speak in a different voice.
Have you ever seen those popular "Presidents Play X" videos? Yes, that was RVC too. You can make your SillyTavern characters speak in any voice (anime, movie, or your own) using the RVC Extras module.
## RVC Setup
PREREQUISITES:

- `sillytavern-extras`: Switch to the `neo` branch
- `sillytavern`: Switch to the `staging` branch
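If you manage both repositories from the command line, a minimal sketch of switching to the required branches looks like this (it assumes both repos are already cloned side by side; adjust the paths to your setup):

```
# SillyTavern-extras: switch to the neo branch
cd SillyTavern-extras
git checkout neo
git pull

# SillyTavern: switch to the staging branch
cd ../SillyTavern
git checkout staging
git pull
```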
- In a file browser, navigate to `\SillyTavern-extras\data\models\rvc`, create a subfolder for your voice, and put the model's `.pth` and `.index` files into that folder (see the example layout after this list)
- Install the requirements with `pip install -r requirements-rvc.txt`
- Run SillyTavern-extras with the RVC module enabled, adding other modules and parameters as needed: `python server.py --enable-modules=rvc` (a fuller launch example follows this list)
- In SillyTavern, go to Extensions --> RVC and enable it
- Set up a Voice map for RVC: read the instructions for TTS above, but instead of TTS voices, use RVC model folder names (see the sketch after this list)
- Select pitch extraction: rmvpe
- Go to Extensions --> TTS and enable it
- Select a TTS Provider: Coqui or another provider you prefer, but not System, since RVC doesn't work with it. ElevenLabs is not recommended because it has its own voice cloning capabilities
- Enable Auto Generation
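For reference, here is a sketch of the expected model folder layout, assuming a single voice model named `MyVoice` (the folder name and both file names are placeholders; the folder name is what you will reference in the voice map):

```
SillyTavern-extras/
└── data/
    └── models/
        └── rvc/
            └── MyVoice/
                ├── MyVoice.pth
                └── MyVoice.index
```

A launch command with both RVC and a Coqui TTS module enabled could look like the line below; `coqui-tts` is an assumed module name based on using Coqui as the TTS provider, so enable whichever modules you actually need:

```
python server.py --enable-modules=rvc,coqui-tts
```

The voice map itself pairs a character name with an RVC model folder name, using the same comma-separated `Character:Voice` format as the TTS voice map; the character names below are purely hypothetical:

```
Seraphina:MyVoice,Aqua:AnotherVoice
```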
## Train your own RVC model
### RVC Easy Menu by Deffcolony (only for Windows)

Automatically installs and launches Mangio-RVC: https://github.com/deffcolony/rvc-easy-menu
- Choose the location where you want to `git clone` the repo, because RVC will be installed in the same location from which you launch the script: `git clone https://github.com/deffcolony/rvc-easy-menu.git`
- Open RVC-Launcher.bat
- Choose 1 to install RVC.
- When the 7-Zip installer pops up, just click Install; it's required for the 7z package that will be extracted automatically.
- After installation, when the menu returns, choose 2 to open the WebUI for voice training.
### Mangio-RVC - Train a voice model
Dataset preparation:
- Put the voice you want to train in the `datasets` folder (see the sketch after this list).
- Make sure there is NO BACKGROUND NOISE in the audio file; only raw voice!
- The longer the audio, the better the output quality.
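A plausible dataset layout is sketched below; it assumes you group each voice's audio clips into their own subfolder under `datasets`, and every name shown is a placeholder:

```
Mangio-RVC/
└── datasets/
    └── my-epic-voice-model/
        ├── clip_01.wav
        ├── clip_02.wav
        └── clip_03.wav
```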
In the WebUI:
- Click on the training tab
- Enter the experiment name, for example, `my-epic-voice-model`
- Set version to v2
- Click on "Process data"
- Click on "Feature extraction"
- Set "Save frequency" to 50
- Set "Total training epochs" to 300
- Click on "Train feature index"
- Click on "Train model"